Everything posted by ShaunR

  1. It was a long shot and I didn't really have any expectation it would help, but it was worth asking the question. As to what I want to do: you will need to download the code in the Queues vs Circular Buffer thread. The demo is fixed to using doubles for the buffer. I need to be able to allow wiring of any data type, so I thought, naively, I'd just adapt-to-type to a variant. It seems to work great as long as you don't de-allocate the pointer. Maybe I don't need to deallocate it, but it "feels" wrong, probably leaks memory like a sieve, and there's a high chance that a gremlin is sitting there just waiting to raise his GPF flag.
  2. I thought holidays were for not doing what you do at work. I'll still be at this when you get back, so just enjoy your holiday. Most definitely. Yes, the timer uses the Windows hi-res timer (on Windows, of course) but falls back to the ms timer on other platforms. The benchmarks are just to ascertain the execution time of the read and write VIs themselves and what jitter there is from execution to execution. The theory is that because there are only memory barriers and no synchronisation locking between a read and a write, the "spread" should be much more predictable since, with a sizable buffer, access contention is virtually eliminated. In that respect, the queue benchmark isn't really a "use case" comparison, just an examination of the execution time of the primitives vs the DLL calls with logic. Don't forget, also, that the buffer has inherent feedback in that the writer can slow down if the readers can't keep up. If we were being pedantic, we would also enforce a fixed-size buffer so that the write "stalls" when one or more readers can't keep up. I don't think that would be useful, however. I'm not sure what you mean here by "buffer solution in a loop"; can you elaborate? Indeed. I'm trying to keep away from external code... at least for now. There is still a lot to do to make it an API. I'm still investigating its behaviour ATM, so this thread is really just documenting that journey. If you remember the SQLite API for LabVIEW, that's how that started. So right now we have some of the building blocks, but a lot needs to be added/removed/modified before it's ready for prime time. The issue is going to be: how do we make it robust without enforcing access synchronisation?
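For anyone curious, a minimal C sketch of the timing approach described above: use the Windows high-resolution performance counter where available and fall back to a coarser clock elsewhere. This is illustrative only, not the internals of the Elapsed Time.vi; the function name elapsed_seconds and the clock_gettime fallback are my own choices.

```c
#ifdef _WIN32
#include <windows.h>

/* Elapsed seconds from an arbitrary origin, via the hi-res performance counter. */
static double elapsed_seconds(void)
{
    LARGE_INTEGER freq, now;
    QueryPerformanceFrequency(&freq);   /* counts per second */
    QueryPerformanceCounter(&now);      /* current tick count */
    return (double)now.QuadPart / (double)freq.QuadPart;
}
#else
#include <time.h>

/* Fallback for other platforms: a monotonic clock read. */
static double elapsed_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
#endif

#include <stdio.h>

int main(void)
{
    double t0 = elapsed_seconds();
    /* ... code under test ... */
    double t1 = elapsed_seconds();
    printf("dt = %.9f s\n", t1 - t0);
    return 0;
}
```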
  3. Interestingly, if you try using the GetValueByPointer XNode, it accepts a variant but produces broken code. Another damn fine plan hits the dust... lol. Is the lvimptsl dynamic library documented anywhere?
  4. Now you are showing off. If it's really that hard (and undocumented), I'll do something else.
  5. Hey! I'm not the one that can speak every Euro language as well as several computer languages.
  6. Fifteen dollars is expensive; better to do it yourself.
  7. I don't really see why variants need a ref-count, since you can disassemble a variant as bytes and reconstitute it, which would mess up that sort of thing. I read (somewhere) that for handles you have to query the "master" handle table to get hold of a handle to a particular variable, so it is a two-step process. So I was thinking perhaps the issue is that I'm actually operating on a master table rather than the real handle. I was kind-a hoping it would just be a mis-setting in one or more of the CLFNs. Since you haven't proffered a solution or things to try with pointers, I guess it's not trivial and I'll just have to jump down that rabbit hole to see what Alice is up to.
  8. Maybe that's the difference between working with pointers and working with handles (DSHandToHand etc.)? Could be that variants are relocatable.
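On the pointers-vs-handles tangent, here is a small illustrative C sketch of why a handle (a pointer to a master pointer) survives relocation while a cached raw pointer does not. This is a generic illustration of the concept, not LabVIEW's actual handle or master-table implementation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef void **Handle;   /* double indirection: handle -> master pointer -> data */

int main(void)
{
    /* "Allocate" a handle: one slot acting as the master pointer, plus the data block. */
    Handle h = malloc(sizeof(void *));
    *h = malloc(16);
    strcpy((char *)*h, "hello");

    /* The manager may relocate the data block; holders of h are unaffected
     * because only the master pointer needs updating. */
    *h = realloc(*h, 64);               /* block may move...                */
    printf("%s\n", (char *)*h);         /* ...but **h still finds the data  */

    /* A raw pointer cached before the realloc (char *p = *h) could now dangle. */
    free(*h);
    free(h);
    return 0;
}
```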
  9. The queue primitives are actually faster (as can be expected), but even a while loop with only the Elapsed Time.vi in it, and nothing else, yields ~900 ns on my machine. If it were one producer and one consumer, queues would be better, but the "spread" or "jitter" is far greater for queues. If you run the tests without the Elapsed Time.vi, you will also see a significant improvement in the buffer (not so much with the queues), so I'm interested to see what AQ comes up with for tests. I think we will also see the true advantage when we scale up to, say, 5-10 in parallel (a more realistic application scenario), especially on quad-core machines where thread use can be maximised. But I'll get the variants working first, as then it's actually usable. The next step after that is to scale up, then add events to the read to see if we can maintain throughput with the better application link.
  10. Yes, as Rolf pointed out, for x32 you need to change the 8 to a 4. But it is the third one that causes the problem sooner or later.
  11. I know. I'm running x64 at the moment, so that's why it's 8. Maybe I should have put a conditional disable in for people on 32-bit, but it was late at night and I had already pulled most of my hair out. One doc I found states: "LabVIEW stores variants as handles to a LabVIEW internal data structure. Variant data is made up of 4 bytes." And a handle is a **myHandle, i.e. a pointer to a pointer. I don't really care what the data structure is, only that I can point to the handle, so it "should" be just pointer copying. It works fine for the write, and once for the read, but second time around the deallocation kills LV. So. Suggestions?
  12. I'm probably missing something fundamental about how variants are stored in memory. I want to create an array of pointers to variants. I seem to be able to write quite happily, but reading causes a problem. The following VI demonstrates what I am trying to do. It runs through once quite happily, but on the second execution it fails on the DSDisposePointer (even though the variant has been written and read correctly). If you disable the read, it doesn't crash, so it must be something to do with the way I'm retrieving the data. Any help appreciated. varpointers.vi
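For reference, here is a rough C sketch of the pattern varpointers.vi appears to be attempting with its CLFNs: a block of pointer-sized slots, with the variant's handle copied in on write and back out on read via MoveBlock. The memory manager calls (DSNewPClr, MoveBlock, DSDisposePtr) are the real extcode.h functions, but LVVariantHandle, the helper names, and the ownership comments are my assumptions; LabVIEW's actual variant layout is opaque.

```c
#include "extcode.h"

typedef void *LVVariantHandle;     /* assumption: treat the variant purely as an opaque handle value */

static UPtr   slots      = 0;      /* one flat block of pointer-sized slots */
static size_t slot_count = 0;

MgErr buffer_init(size_t n)
{
    slots = DSNewPClr(n * sizeof(LVVariantHandle));   /* zero-initialised block */
    if (!slots) return mFullErr;
    slot_count = n;
    return noErr;
}

/* Write: copy the pointer-sized handle value into slot i. */
MgErr buffer_write(size_t i, LVVariantHandle v)
{
    if (i >= slot_count) return mgArgErr;
    MoveBlock(&v, (char *)slots + i * sizeof(v), sizeof(v));
    return noErr;
}

/* Read: copy the handle value back out. Only the pointer is copied; the
 * variant's data is still owned by LabVIEW, which is presumably why disposing
 * it (or letting LabVIEW reuse it) on a later iteration can crash. */
MgErr buffer_read(size_t i, LVVariantHandle *v)
{
    if (i >= slot_count) return mgArgErr;
    MoveBlock((char *)slots + i * sizeof(*v), v, sizeof(*v));
    return noErr;
}

MgErr buffer_free(void)
{
    MgErr err = DSDisposePtr(slots);   /* dispose the slot block only, never the variants */
    slots = 0;
    slot_count = 0;
    return err;
}
```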
  13. +1. It's not conclusive between resolution and loop jitter if you check with the Elapsed Time.vi (that is inside the loop), since it is difficult to separate them (occasionally you see 10s of nsec or even psec readings). But as the Tick Count2+ uses the same timing method (which yields ps resolution), I think we can assume it is mainly loop jitter that causes the spacing. I think also that whether the Elapsed Time.vi gets executed in the same slice or the next (or the next +1 etc., depending on the scheduler) is the reason we see several lines at x*loop jitter.
  14. It's a good job MJE noticed the bug where the dequeues aren't reading everything, then. Still, it's rather counter-intuitive. Well remembered. That also explains the sudden increase (the curved lines around 1 ms) in the old 1M data point plots. By the way peeps, feel free to write your own benchmarks/test software or abuse-ware, especially if it shows a particular problem. The ones I supplied are really just test harnesses so that I could write the API and see some info on what was going wrong (you can probably see by the latest images that the ones I am using now are evolving as I probe different aspects). Not much time has been spent on them, so I expect there are lots of issues. Eventually, once all fingers have been inserted in all orifices, they will become usage examples and regression tests (if anyone uses the SQLite API for LabVIEW you will know what this means), so if you do knock something up, please post it, big or small. I think I should also request at this point, to avoid any confusion, that you should only post code to this thread if you are happy for your submission to be public domain. By all means make your own proprietary stuff, but please post it somewhere else so those that come here can be sure that anything downloaded from this thread is covered by the previous public domain declaration. Eventually it will go in the CR and it will not be an issue, but until then please be aware that any submissions to this thread are public domain. Sorry to have to state it, but a little bit of garlic now will save the blood being sucked later.
  15. Indeed. See my post in reply to GregSands about processor hogging. For a practical implementation, then yes, I agree: it should yield. Adding the Wait (ms) and changing to Normal priority degrades the performance on my machine by about 30%, and as I wanted to see what it "could do" and explore its behaviour rather than LabVIEW's, I wasn't particularly concerned about other software or other parts of the same software. It makes it quite hard to optimise if the benchmarks are cluttered with task/thread switches et al. Better (to begin with) to run it as fast as possible with as much resource as required to eke out the last few drops; then you can make informed decisions as to what you are prepared to trade for how much performance. Of course, in 2011+ you can "inline"; in 2009 you cannot, so subroutine is the only way. Ooooh, I've just thought of a new signature: "I don't write code fast. I just write fast code."
  16. OK. I've managed to replicate this on a Core 2 Duo (the other PC is an i7). The periodic large deviations coincide with the buffer length. If you look at your 10K @ 1K buffer, each point is 1K apart. If you were to set the #iterations to 100 and, say, the buffer length to 10, then the separation would be every 10 and you would see 9 points [(#iterations / buffer length) - 1]. Similarly, if you set the buffer length to 20, you would see 4. My initial thought (as seems to be the norm) was a collision, or maybe a task switch on the modulo operator. But it smells more like processor hogging, as you can get rid of it by placing a Wait with 0 ms in the write and read while loops. You have to set the read and write VIs to "Normal" instead of "subroutine" to be able to use the Wait, and therefore you lose a lot of the performance, so it's not ideal, but it does seem to cure it. I'm not sure of the exact mechanism; I'll have to chew it over. But it seems CPU architecture dependent. Look forward to it. Care to elaborate on For Loop vs While Loop? (Don't speed up the queues too much, eh?)
  17. Calculating the MD5 Message-Digest of a String or File
  18. Addendum. I've just modified the test to a) ensure the timers always execute at pre-determined points in the while loops (connected the error terminals) and b) pre-allocate the arrays. So it looks like a lot of what I was surmising is true. There is still one allocation in the 258-iteration image, which might be for the shift registers. But everything is a lot more stable and the STD and mean are now meaningful (if you'll excuse the pun). Does anyone want to put forward a theory why we get discrete lines exactly 0.493 µs apart? (Maybe a different number on your machine, but the separation is always constant.)
  19. @GregSands: Whilst a median of 4..6 microseconds isn't fantastic, it's still not bad (slightly faster than the queues). In your images I am looking at the median and the Max Count-Exec Times peak; the reason will follow. I'm not (at the moment) sure why you get the 300 ms spikes (thread starvation?) but most of the other stuff can be explained (I think). I've been playing a bit more and have some theories to fit the facts. The following images are a little contrived, since I ran the tests multiple times and chose the ones that show the effects without extra spurious points, but the data is always there in each run; you just get more of them. The following is with 258 data points. We can clearly see two distinct levels (lines) for the write time and, believe me, each data point in each line is identical. There are a couple of data points above these two (5 in fact). These anomalous points intriguingly occur at 1, 33, 65, 129 and 257, or (2^n)+1. OK, so 17 is missing; you'll just have to take my word for it that it does sometimes show up. We can also notice that these points occur in the reader as well, at exactly the same locations and with approximately the same magnitude. That is just too convenient. OK, so maybe we are getting collisions between the read and write. The following is again with 258 iterations but with a buffer size of 2 (the minimum). That will definitely cause collisions if any are to be had. Nope. Exactly the same positions and "roughly" the same magnitudes. I would expect something to change, at least, if that were the issue. So if they really do occur at (2^n)+1, then if we increase further we should see another appear at 513. Bingo! Indeed, there it is. Here is what I think is going on. The LabVIEW memory manager uses an exponential allocation method for growing memory (perhaps AQ can confirm). So, every time the Build Array on the edge of the while loop needs more space, we see the allocation take place, which causes these artifacts. The more data points we build, the more of an impact the LabVIEW memory manager has on the results. The "real" results for the buffer itself are the straight lines in the plots, which are predictable and discrete. The spurious data points above these are LabVIEW messing with the test so that we can get pretty graphs. We can exacerbate the effect by going to a very high number. We can still clearly see our two lines, and you will notice throughout all the images that the median and the Max Count-Exec Times have remained constant (scroll back up and check), which implies that the vast majority of the data points are the same. The results for the mean and STD, and to our eyes, are "confused" by the number of spurious points. So I am postulating that most, if not everything, above those two lines in the images is due to the Build Arrays on the border of the while loops. Of course, I cannot prove this, since we are suffering from the "observer effect": if I remove the Build Arrays on the border, we won't have any pretty graphs. Even running the profiler will cause massive increases in benchmark times and spread the data. I think we need a better benchmarking method (pre-allocated arrays?). Of course, it raises the question: why don't the queues exhibit this behaviour with the queue benchmark? Well, I think they do, but it is less obvious because, for small numbers of iterations, there is greater variance in the execution times and it is buried in the natural jitter. It only sticks out like a sore thumb with the buffer because it is so predictable that these are the only anomalies.
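Whether Build Array really grows its buffer this way is the conjecture above, not something I can confirm, but a simple capacity-doubling sketch in C shows why such a policy would produce spikes only on the first iteration past each power of two:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t capacity = 0, count = 0;
    double *data = NULL;

    /* With doubling from 1, the grow branch fires at iterations
     * 1, 2, 3, 5, 9, 17, 33, 65, 129, 257, 513, ... i.e. (2^n)+1 style indices,
     * matching the anomalous points in the plots. Every other iteration is the
     * cheap, constant-time append. */
    for (size_t i = 1; i <= 600; i++) {
        if (count == capacity) {                       /* out of room: grow */
            capacity = capacity ? capacity * 2 : 1;    /* double the capacity */
            double *grown = realloc(data, capacity * sizeof *data);
            if (!grown) { free(data); return 1; }
            data = grown;
            printf("iteration %zu: reallocated to capacity %zu\n", i, capacity);
        }
        data[count++] = (double)i;                     /* common case: no allocation */
    }
    free(data);
    return 0;
}
```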
  20. Sweet! I've replaced the global variable with LabVIEW memory manager functions and, as surmised, the buffer size no longer affects performance, so you can make it as big as you want. You can grab version 2 here: Prototype Circular Buffer Benchmark V2. I've been asked to clarify/state licensing, so here goes.
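To make the "memory manager block instead of a global" idea concrete, here is a minimal single-writer/single-reader circular buffer sketch in C: one flat, pre-allocated block with modulo indexing, so every read and write is O(1) regardless of the buffer size. The struct and function names are mine, and it omits the multiple-reader bookkeeping and memory barriers of the real prototype.

```c
#include <stdlib.h>

typedef struct {
    double *data;        /* pre-allocated block, never resized while in use */
    size_t  size;        /* number of elements in the ring                  */
    size_t  write_idx;   /* monotonically increasing write count            */
    size_t  read_idx;    /* monotonically increasing read count             */
} CircBuf;

int cb_init(CircBuf *cb, size_t size)
{
    cb->data = calloc(size, sizeof *cb->data);
    if (!cb->data) return -1;
    cb->size = size;
    cb->write_idx = cb->read_idx = 0;
    return 0;
}

/* O(1) regardless of cb->size; the writer stalls only if it would lap the reader. */
int cb_write(CircBuf *cb, double value)
{
    if (cb->write_idx - cb->read_idx >= cb->size) return -1;  /* reader too slow */
    cb->data[cb->write_idx % cb->size] = value;
    cb->write_idx++;
    return 0;
}

int cb_read(CircBuf *cb, double *value)
{
    if (cb->read_idx == cb->write_idx) return -1;             /* nothing new yet */
    *value = cb->data[cb->read_idx % cb->size];
    cb->read_idx++;
    return 0;
}

void cb_free(CircBuf *cb)
{
    free(cb->data);
    cb->data = NULL;
}
```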
  21. That it's contingent on buffer multiples is a little perplexing. But that it disappears when the buffer is bigger than the number of points suggests it is due to the write waiting for the readers to catch up and letting LabVIEW get involved in scheduling (readers "should" be faster than the writer because they are simpler, but the global array access is slow for the reasons MJE and AQ mentioned; I think I have a fix for that). Can you post the image of it not misbehaving? The graphs tell me a lot about what is happening. (BTW, I like the presentation as log graphs better than my +3 STDs; I will modify the tests.)
  22. Interesting. You can see a lot of data points at about 300 ms dragging the average up, whereas most are clearly sub-10 µs. If you set the buffer size to 101 and do 100 data points, does it improve? (Don't run the profiler at the same time; it will skew the results.)
  23. OK, I've tidied up the benchmark so you can play, scorn and abuse. If we get something viable, then I will stick it in the CR with the loosest licence (public domain, BSD or something). You can grab it here: Prototype Circular Buffer Benchmark. It'll be interesting to see the performance on different platforms/versions/bitnesses. (The above image was on Windows 7 x64 using LV2009 x64.) Compared alongside each other, this is what we see.
  24. It doesn't tell me when I upload (it just says uploading is not allowed, with an upload error). When it first happened, I deleted 2 of my code repository submissions just in case that was the problem (deleted about 1.2 MB to upload a 46 KB image). It didn't make a difference. There used to be a page in the profile that allowed you to view all the attachments and uploads and monitor your usage. That seems to have disappeared, so I can't be sure that deleting more from the CR will allow uploading.
  25. The images are just inserted images that have to reside on another server (using the insert image button in the bar). Usually I upload the images to lavag and insert them. Obviously I cannot do that at the moment so this way is a work-around. I can do something similar for the files I want to upload, but then they won't appear inline in the posts as attachments (and presumably I cannot put stuff in the code repository for people). You will have to be redirected to my download page to get them (not desirable).