Everything posted by ShaunR

  1. +1. Plenty for me to play with here. Ta very muchly. (Did you mean 4 or 8 bytes for the pointer size rather than 8 or 16?) If the variant claims the right of destruction, what happens to the copies in the buffer? Is that why it crashes on deallocation?
  2. I look at it very simplistically. The private data cluster is just a variable that is locked so you can only access it via public methods. An FGV is also a variable that is locked so you can only access it via the typedef'd methods. In my world, there is no difference between a class and an FGV as a storage mechanism. I've got no idea what LabVIEW thinks (that's why your input is invaluable); I just want it to do what I need. If LabVIEW requires me to make a copy of an element so it is a data copy rather than shared, I'm OK with that (docs for LVCopyVariant, LVVariantGetContents, LVVariantSetContents et al. would be useful). But I'm not OK with a copy of an entire array for one element, which causes buffer-size-dependent performance. The only reason I have gone down this path is that a global copies an entire array just so I can get at one element (see the sketch below for the element-wise access I'm after). If you have another way that doesn't use the LVMM functions, I'm all ears; I just don't see one (and I've tried quite a few). Throw me a bone here, eh?
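
     In C terms, the element-wise access I'm after amounts to something like this (a minimal sketch of what the CLFN calls do; DSNewPtr, MoveBlock and DSDisposePtr are the real memory manager exports, the wrapper names are mine, and the element type is assumed to be a float64):

        /* A buffer allocated outside LabVIEW's dataflow, accessed one
           element at a time. No whole-array copy ever happens. */
        #include "extcode.h"

        static UPtr buffer = NULL;

        void buf_init(int32 elements)
        {
            buffer = DSNewPtr(elements * sizeof(float64));
        }

        float64 buf_read(int32 index)   /* copies ONE element, not the array */
        {
            float64 value;
            MoveBlock(buffer + index * sizeof(float64), (UPtr)&value, sizeof(float64));
            return value;
        }

        void buf_write(int32 index, float64 value)
        {
            MoveBlock((UPtr)&value, buffer + index * sizeof(float64), sizeof(float64));
        }

        void buf_free(void)
        {
            DSDisposePtr(buffer);
            buffer = NULL;
        }
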
  3. Indeed. But it seems to lock the entire array while it does it; when I experimented, it was one of the slowest methods. Ahhhh. Those were the days [misty, wavy lines], when GPFs were what happened to C programmers and memory management was remembering which meetings to avoid. Where is this documented about SwapBlock (or anything, for that matter)? I couldn't even find what arguments to use. I also found a load of variant functions (e.g. LVVariantCopy) but have no idea how to call them.
  4. Now that you have switched to byte arrays, you don't need a terminating string. But we still need to get rid of the flatten and unflatten, which are too slow (a sketch of the direct copy I have in mind follows).
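
     For illustration, the direct copy amounts to this (a sketch only; the array handle layout of an int32 count followed by the data is the one from extcode.h, and the function name is mine):

        /* Copy a byte array's contents straight into the pointer buffer
           with MoveBlock instead of flattening/unflattening it. */
        #include "extcode.h"

        typedef struct { int32 cnt; uInt8 str[1]; } **ByteArrayHdl;

        void write_bytes(UPtr dest, ByteArrayHdl h)
        {
            MoveBlock((UPtr)(*h)->str, dest, (*h)->cnt);  /* raw copy, no flatten */
        }
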
  5. Well, you answered that in a previous post. If you remember, there was a significant change in times for different sizes of buffer: the bigger it got, the worse it was. As you pointed out, that was because the global variable array forced a copy of the entire array to access a single element. So although you are correct that native LabVIEW code can "get" a single element, in reality a copy of the entire array is required to get it (the only scenario where this isn't the case is when LabVIEW uses the "sub-array" wire). The latest version allows a buffer size as large as you want with no impact. So if you want a practical visualisation, load up the first example and set the buffer to 10,000, then compare it to the second example with a buffer of 10,000.

     Agreed. Shame, though. You can try this yourself: set all the read and write polymorphic instances to non-reentrant (the write/read double, write/read index, etc. inside the read and write VIs).

     You've hit an important point here, though: scalability. This scales really, really well. Make the number of readers, say, 10 and there is a marginal increase in processing time; for queues, the increase is fairly linear.

     Oh, I don't know. Now that the buffer doesn't need DVRs, globals, LV2 globals or singletons, it can be put into a class. Basically, the incrementing counter (the feedback node in the reads) just needs to be in your private data cluster (the counter logic is sketched below). Then you would be in a good position to figure out how to manage the reader registration (the ID parameter). That sort of stuff is only just on the edge of my radar ATM, but nothing stops you coming up with the class hierarchy for an API, since you don't have to worry about locking/contention at that point.
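
     For anyone following along, here is the counter arithmetic in rough C form (a sketch of the pattern as I understand it, not the LabVIEW implementation; the names, the 10-reader limit and the busy-waits are illustrative):

        /* Disruptor-style counters: one writer cursor, one cursor per
           reader. The writer may only claim a slot once every reader
           has moved past the slot it would overwrite. */
        #include <stdint.h>

        #define BUFFER_SIZE 1024   /* power of two, so index = count & (size-1) */

        typedef struct {
            volatile int64_t write_count;      /* total elements written */
            volatile int64_t read_count[10];   /* per-reader totals      */
            int              num_readers;
            double           slots[BUFFER_SIZE];
        } ring_t;

        static int64_t slowest_reader(const ring_t *r)
        {
            int64_t min = r->read_count[0];
            for (int i = 1; i < r->num_readers; i++)
                if (r->read_count[i] < min) min = r->read_count[i];
            return min;                        /* the writer's "array min" */
        }

        static void ring_write(ring_t *r, double value)
        {
            while (r->write_count - slowest_reader(r) >= BUFFER_SIZE)
                ;                              /* oldest slot not consumed yet */
            r->slots[r->write_count & (BUFFER_SIZE - 1)] = value;
            r->write_count++;                  /* publish; a barrier belongs here */
        }

        static double ring_read(ring_t *r, int id)
        {
            while (r->read_count[id] >= r->write_count)
                ;                              /* nothing new yet */
            double value = r->slots[r->read_count[id] & (BUFFER_SIZE - 1)];
            r->read_count[id]++;
            return value;
        }
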
  6. This is exactly what is being achieved with the LV memory manager functions, although we cannot get reference access, only a "copy from", since to get back into LabVIEW we need "by value".
  7. Indeed it is. But it is what makes it usable. Bingo! No, we don't need protected read/modify/store IF we are using the LV memory manager functions. The reason is that we can pick and choose individual elements of arrays to update; you cannot do this in LabVIEW without reading the whole array, modifying it and then writing the whole array back.

     This is why I'm not worried about registration. Originally I had an array for the indexes in the global variable, but the obvious race conditions (read/modify/write) caused me to revert to single indexes fixed to the number of readers I was testing, so I could get on and benchmark. It is/was an interim solution since, at the time, it wasn't clear whether it was worth spending the time working out a registration scheme if the performance was atrocious. Now that I'm not using a global, this is a no-brainer and doesn't require additional logic, just a pointer to an array of indexes. The writer only needs to know how many are registered and can then do an array min on all of them. I'm thinking of two arrays, one for the readers' counts and one for the readers' R indexes, as the two calls are probably more efficient than one call to a 2D flat array with all the gubbins required to extract the dimensions (that's just a gut feeling). This part could, of course, also be done in native LabVIEW. It's the readers that are the problem.

     Because the writer only reads the readers' indexes and the readers only write to them, AND the only important value is the lowest, if it is possible to update a single location in the array without affecting the others then no locking is required at all (memory barriers are sufficient) and no read/modify/write issues arise. MoveBlock enables this, but any native LabVIEW array manipulation will not (AFAIK). cmpexch (CAS) is an optimised CPU instruction, so the processor needs to support it (on Intel it's a given; on PPC I'm not so sure). I was hoping this was what SwapBlock used, as it could potentially be more efficient than MoveBlock, but the only reference I could find to it was on a German forum and you provided the answer. I can't see any other reason for SwapBlock than to wrap cmpexch; the only people who can answer that are NI, really. For my purposes I don't really need the compare aspect, but an in-place atomic exchange would be useful for the indexes (see the sketch below).
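
     To make the cmpexch point concrete, the index update in C would look something like this (a sketch using the Windows and GCC atomics directly; nothing here is LabVIEW-specific and the function names are mine):

        /* Lock-free update of one reader's index. Each reader owns its
           slot and the writer only reads it, so an atomic store with a
           barrier suffices; CAS is shown for the compare-exchange case. */
        #include <stdint.h>

        #ifdef _WIN32
        #include <windows.h>
        static void index_store(volatile LONG64 *slot, LONG64 value)
        {
            InterlockedExchange64(slot, value);   /* atomic store + full barrier */
        }
        static LONG64 index_cas(volatile LONG64 *slot, LONG64 expected, LONG64 desired)
        {
            return InterlockedCompareExchange64(slot, desired, expected);
        }
        #else
        static void index_store(volatile int64_t *slot, int64_t value)
        {
            __atomic_store_n(slot, value, __ATOMIC_RELEASE);
        }
        static int64_t index_cas(volatile int64_t *slot, int64_t expected, int64_t desired)
        {
            __atomic_compare_exchange_n(slot, &expected, desired, 0,
                                        __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
            return expected;                      /* the previous value */
        }
        #endif
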
  8. Registering etc. isn't the problem. The problem is access to the indexes and the buffer without locking (mutexes etc.), which is where the pattern derives its performance gains. My first choice was a class, since the buffer and/or the reader/writer indexes can be held in each instance. However, the locking overhead around the private cluster, coupled with the atomicity of the private data cluster, means the performance degrades significantly (we are talking ms rather than us). You need the indexes and buffer controls to be accessible independently (i.e. you cannot have them all in one cluster), and you need to break access restrictions so that a reader can read data at the same time as a writer is writing, without waiting (which you cannot do if you wrap anything in a non-reentrant VI). You end up in a kind of no-man's-land where you need global storage (i.e. something like a DVR or LV2 global) with no locking mechanisms (i.e. reentrant-like access behaviour). The closest native LabVIEW object that fulfils that is the global variable. If someone has an alternative, I'm all ears.
  9. It's just too slow to serialise (~10x slower).
  10. PXI comes in Windows or RT flavours. RT (aka Pharlap ETS) is a cut-down Windows kernel, so for your purpose it makes no difference. VxWorks is for PowerPC platforms, so you will only see it on [some] cRIOs or FieldPoint units. I wouldn't worry about it. I was just pointing out that byte-alignment padding is not the same across platforms, and assuming clusters are byte-aligned can yield unexpected results on them.
  11. Only on Windows. Mac and Linux use 4-byte boundaries, and VxWorks uses 8-byte boundaries (see the sketch below).
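
     To illustrate with a hypothetical cluster of a U8 followed by a DBL (the sizes follow from the alignment rules above; this is a sketch, not measured on every target):

        #include <stdint.h>

        #pragma pack(1)   /* Windows: LabVIEW packs clusters to 1-byte alignment */
        typedef struct { uint8_t flag; double value; } cluster_win;  /*  9 bytes */
        #pragma pack()

        #pragma pack(4)   /* Mac/Linux: 4-byte boundaries */
        typedef struct { uint8_t flag; double value; } cluster_nix;  /* 12 bytes, 3 pad */
        #pragma pack()

        #pragma pack(8)   /* VxWorks: 8-byte boundaries */
        typedef struct { uint8_t flag; double value; } cluster_vxw;  /* 16 bytes, 7 pad */
        #pragma pack()
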
  12. Nope. The first release used a global variable and an array index. That was after looking at classes, DVRs and LV2 globals. Most work OK until you get to the writer needing to know the indexes of all the readers. Do you have something specific in mind?
  13. Well. It's cheap to wire anything to it. What about your "Generic" terminals?
  14. Good get-out clause. Yes, variants I think are the prime candidate for LV. I don't think we need to "allow" it at all. They have certain implementation aspects that address atomicity, but we don't suffer from those because of LabVIEW, so we can ignore them as long as performance isn't affected. What we do need is to be able to read single or multiple elements from an array without copying the entire array, hence the move to the LV memory manager functions.

     Again, I don't think a new primitive is needed, although a DVR would have been the first choice had it not come with all the baggage; it is as close to pointer referencing as we get, and that is what we need. A global variable has most or all of the features you are describing here, without all the external signalling. It just comes with the baggage of copying entire arrays: works great for <500 elements, but too much copy overhead for larger buffers. Don't forget that a writer can't write while a reader is reading by design (the counters), not because of the storage locking mechanism. I think we can do everything with a bit of lateral thinking with what we have, rather than requiring specialised semantics. See my previous paragraph. Unnecessary, I think.

     I think the idea that the buffer "contains" the data object is perhaps misleading. I am expecting the buffer to contain pointers to the objects, and the reader just de-references (if we can get away without a copy, even better). So you create your variants and then stick the addresses in the buffer (sketched below). The writer can change the variants by the normal LabVIEW methods; the reader can read via the normal LV methods (after de-referencing). The counters make sure the writer doesn't update a variant that hasn't been read yet. All is happy in R&D because nothing ever gets stomped on to require management.

     No idea what she means there. It looks to me to be the very heart of the disruptor pattern, unless she is using the name to reference the resource access and boundary contracts. But those wouldn't work well with anything but a circular buffer, as far as I can tell. I see the elegance of the pattern in the way the writer and readers are able to circumvent lock contention (on a circular buffer) with simple counters and memory barriers. The usual method is to dollop large quantities of mutexes around.
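
     In C terms, the "buffer of pointers" idea looks something like this (purely illustrative; the addresses would be those of LabVIEW-owned variants, the dereference is still a copy, and the names are mine):

        /* The ring holds addresses of externally owned data, not the data
           itself, so slots are a fixed 4/8 bytes regardless of payload. */
        #include <stdint.h>
        #include "extcode.h"

        static uintptr_t slot_addrs[1024];

        void slot_publish(int32 index, UPtr variant_addr)
        {
            uintptr_t addr = (uintptr_t)variant_addr;
            MoveBlock((UPtr)&addr, (UPtr)&slot_addrs[index], sizeof addr);
        }

        UPtr slot_resolve(int32 index)
        {
            uintptr_t addr;
            MoveBlock((UPtr)&slot_addrs[index], (UPtr)&addr, sizeof addr);
            return (UPtr)addr;   /* caller then reads the variant by normal LV means */
        }
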
  15. Hmm. I still don't think you've quite grasped it, but I think we are nearly there. You are "averaging" a total time. That's not what we want to do, as it doesn't tell us anything meaningful. With the default values of your benchmarks we see some straight lines until the buffer is full, in both. But what are the values you see periodically in the buffer at about 1 usec that do not appear in the queue example? What is going on there? Data being processed? (You betcha.)

     To demonstrate with your new benchmark, set the buffer size to 2 (this scenario is more representative of, say, an FPGA or RT target where you don't have GB of memory for your queue; fixed-size array ring a bell?). You will notice there is not much difference in your "average times", and the queue write gets into its stall state much more quickly, but everything else is pretty much the same. Yet look at the graphs and the median of the buffer: they are definitely not equivalent, even though the "average times" say they are (a sketch of the mean-vs-median point follows). You have not been kind to queues in this example, since the write will block until the largest process time has executed (once the buffer is full) before moving to the next; the buffer doesn't suffer as much from this limitation, and my benchmark was designed so that queues didn't suffer from it either.

     You asked me previously what time I was trying to optimise (which, remiss of me, I failed to answer; apologies). Well, the time I'm trying to get below is 1 usec for the execution of a VI: the buffer access time. I am, at the moment, comparing the execution time of the VIs I've written with the queue primitives (which are compiled code). The next thing I am looking at is the "jitter", and then trying to identify any blocking modes which may indicate resource-contention locking. This was the reason for moving from the global to the pointers, since a read of the global array locks the writer (albeit transparently) and forces a copy of the entire array. Once I can figure out variants (IF I can figure out variants), then I'll start looking at 5-10 readers with one writer (that's going to be a tedious weekend).
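
     For the mean-vs-median point, in C it is just this (a trivial sketch; per-call times are assumed to be collected individually, which is exactly what averaging a total throws away):

        #include <stdio.h>
        #include <stdlib.h>

        static int cmp_dbl(const void *a, const void *b)
        {
            double d = *(const double *)a - *(const double *)b;
            return (d > 0) - (d < 0);
        }

        /* The mean hides outliers; the median and max expose the jitter. */
        void report(double *times_us, int n)
        {
            double sum = 0;
            for (int i = 0; i < n; i++) sum += times_us[i];
            qsort(times_us, n, sizeof(double), cmp_dbl);
            printf("mean %.3f us, median %.3f us, max %.3f us\n",
                   sum / n, times_us[n / 2], times_us[n - 1]);
        }
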
  16. It was a long shot and I didn't really have any expectation it would help, but it was worth asking the question. As to what I want to do: you will need to download the code in the "Queues vs Circular Buffer" thread. It's fixed in the demo, using doubles for the buffer. I need to be able to allow wiring of any data type, so I thought, naively, I'd just adapt-to-type to a variant. It seems to work great as long as you don't de-allocate the pointer. Maybe I don't need to deallocate it, but it "feels" wrong and probably leaks memory like a sieve, and there's a high chance a gremlin is sitting there just waiting to raise his GPF flag.
  17. I thought holidays were for not doing what you do at work. I'll still be at this when you get back, so just enjoy your holiday. Most definitely.

     Yes. The timer uses the Windows hi-res timer (on Windows, of course) but falls back to the ms timer on other platforms (a sketch of that fallback follows). The benchmarks are just to ascertain the execution time of the read and write VIs themselves, and what jitter there is from execution to execution. The theory is that because there are only memory barriers and no synchronisation locking between a read and a write, the "spread" should be much more predictable, since with a sizeable buffer access contention is virtually eliminated. In that respect the queue benchmark isn't really a "use case" comparison, just an examination of the execution time of the primitives vs the DLL calls with logic. Don't forget, also, that the buffer has inherent feedback, in that the writer can slow down if the readers can't keep up. If we were being pedantic, we would also enforce a fixed-size buffer so that the write "stalls" when one or more readers can't keep up; I don't think that would be useful, however. I'm not sure what you mean here by "buffer solution in a loop". Can you elaborate?

     Indeed. I'm trying to keep away from external code, at least for now. There is still a lot to do to make it an API. I'm still investigating its behaviour ATM, so this thread is really just documenting that journey. If you remember the SQLite API for LabVIEW, that's how that started. So right now we have some of the building blocks, but a lot needs to be added/removed/modified before it's ready for prime time. The issue is going to be: how do we make it robust without enforcing access synchronisation?
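
     The fallback logic amounts to this (a sketch: QueryPerformanceCounter/QueryPerformanceFrequency are the real Windows calls; the non-Windows branch here uses the POSIX monotonic clock as a stand-in for LabVIEW's ms tick):

        #include <stdint.h>

        #ifdef _WIN32
        #include <windows.h>
        static double now_seconds(void)
        {
            LARGE_INTEGER freq, count;
            QueryPerformanceFrequency(&freq);   /* ticks per second */
            QueryPerformanceCounter(&count);
            return (double)count.QuadPart / (double)freq.QuadPart;
        }
        #else
        #include <time.h>
        static double now_seconds(void)         /* ms-class resolution fallback */
        {
            struct timespec ts;
            clock_gettime(CLOCK_MONOTONIC, &ts);
            return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
        }
        #endif
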
  18. Interestingly, if you try using the GetValueByPointer XNode, it accepts a variant but produces broken code. Another damn fine plan hits the dust... lol. Is the lvimptsl dynamic library documented anywhere?
  19. Now you are showing off. If it's really that hard (and undocumented), I'll do something else.
  20. Hey! I'm not the one who can speak every Euro language as well as several computer languages.
  21. Fifteen dollars is expensive; better to do it yourself.
  22. I don't really see why variants need a ref-count, since you can disassemble a variant as bytes and reconstitute it, which would mess up that sort of thing. I read (somewhere) that for handles you have to query the "master" handle table to get hold of a handle to a particular variable, so it is a two-step process. So I was thinking perhaps the issue is that I'm actually operating on a master table rather than the real handle. I was kinda hoping it would just be a mis-setting in one or more of the CLFNs. Since you haven't proffered a solution or things to try with pointers, I guess it's not trivial and I'll just have to jump down that rabbit hole to see what Alice is up to.
  23. Maybe that's the difference between working with pointers and working with handles (DSHandToHand etc.)? Could be that variants are relocatable.
  24. The queue primitives are actually faster (as can be expected). But even a while loop containing only Elapsed Time.vi and nothing else yields ~900 ns on my machine. If it were one producer and one consumer, queues would be better, but the "spread" or "jitter" is far greater for queues. If you run the tests without Elapsed Time.vi, you will also see a significant improvement in the buffer (not so much with the queues), so I'm interested to see what AQ comes up with for tests. I think we will also see the true advantage when we scale up to, say, 5-10 readers in parallel (a more realistic application scenario), especially on quad-core machines where thread use can be maximised. But I'll get the variants working first, as then it's actually usable. The next step after that is to scale up, then add events to the read to see if we can maintain throughput with the better application link.
  25. Yes. As Rolf pointed out, for x32 you need to change the 8 to a 4. But it is the third one that causes the problem sooner or later.