
ShaunR


Posts posted by ShaunR

  1. You Init. Then you Uninit. Then you take that same already-uninitialized pointer block and wire it into a Read or a Write. How does the Read or Write know that the pointer block has been deallocated and it should return an error?

    [Attachment: Screen shot 2013-07-12 at 10.37.50 PM.png]

    As it stands. Yes. The pointer cluster is "on-the-wire" and when we deinit we just leave the cluster alone. But it doesn't have to be (just makes it easier to debug). If I were to start on a "Classic LabVIEW" API, I would probably also shove the pointer cluster into memory and you wouldn't need a "reference" wire at all, hell, even a global variable to contain them (no "not allocated" issues then). With the former, I might use the "CheckPointer" call if it wasn't too expensive and it does what I think it does as belt and braces with a check for null pointer on the cluster (if the cluster doesn't exist then neither do the others and vice versa). But if you are looking at classes I thought you would want to manage all that in the class.....somehow. If I put everything into memory and handle all the scenarios, there isn't much point in a class at all apart from making people feel warm and fuzzy about OOP.

     

    I think the issue you are probably running into is that you are finding the best means to achieve what you need in a class is a DVR. But you can't use them because of the locking overhead. If you find you are looking to a DVR, then the structure needs to be via the memory manager (MM) functions, as they are basically just a method to provide the same function but without the DVR's locking overhead. Anything else can "probably" be in the class.

     

     

    Nope... there's no registered Booleans to check if the pointers have been deallocated. So to implement this solution, we would have to say that there's an Init but once allocated, the pointer at least to the Booleans needs to stay allocated until the program finishes running. Otherwise the first operation that tries to check the Booleans will crash. Are you ok with a "once allocated always allocated" approach?

     

    I would be "happy for now". I don't think the unallocated pointers are an insurmountable issue and at worst we just need some state flags. I would come back to it on another iteration to see what needs to change and what the performance impact is of a million flags all over the place.

     

    There's still the problem of setting those Booleans. We'll need a test-and-set atomic instruction for "reader is active" -- I don't know of any way to implement that with the current APIs that LabVIEW exposes.

     

    I don't think we do (need a test 'n set). The premise of the pattern is mutual exclusion through memory barriers only. As long as you only have one writer to any location then no test and set is necessary. As we have solved the issue of accessing individual elements in arrays without affecting others or locking the rest of the array, all we need to ensure is that write responsibility is well defined (only one writer to a single location or block). The only time a test 'n set would be required is if we couldn't guarantee atomic reads and writes of the individual bits (PPC?).

    As an aside: anecdotally, it seems that writing all contents to a cluster is an atomic operation in LabVIEW and incurs only a marginal overhead compared with accessing the memory locations independently. The overhead is minuscule in comparison to the extra library calls required to achieve the latter. If this can be confirmed, then it is the fastest way to add atomicity to entire blocks of locations when needed. This is one reason why I use a cluster for the elements in the index array.

  2. Looks good to me.  Any other changes before I make a new VIPM package?

     

    One thing that I haven't gotten round to yet: if you are operating on a large JSON stream, you cannot process any other JSON streams, as it seems to be blocking. I think it just needs some of the subVIs set to re-entrant but, like I said, I haven't gotten round to looking yet.

  3. A more complete rendition of what's in my head...

     

    You currently have a cluster of pointers. We can move those into a class. We can then make the class look like a reference type (because that's exactly what it is). That's good. And that's all we need to do IF we can solve the "references are not valid any more" problem. The ONLY way I know to do that is with some sort of scheme to check a list to see if the pointers are still valid coupled with a way to prevent a recently deallocated number from coming back into use. That's what LabVIEW's refnum scheme provides. Without such a scheme, a deallocated memory pointer comes right back into play -- often immediately because the allocation blocks are the same size, so those are given preference by memory allocation systems. Thus any scheme like this in my view has to add a refnum layer between it and the actual pointer blocks. The list of currently valid pointers is guarded with a mutex -- every read and write operation at its start locks the list, increments the op count if it is still on the list, releases the mutex, does its work, then acquires the mutex again to lower the op count. The delete acquires the mutex, sets a flag that says "no one can raise the op count again", then releases the mutex and waits for the opcount to hit zero then throws away the pointer block.
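
    A minimal sketch of the refnum layer described above, written in Python rather than LabVIEW purely to make the locking sequence explicit. The class and method names (`RefnumTable`, `begin_op`, `end_op`, `destroy`) are hypothetical, and a `bytearray` stands in for the pointer block. Refnums are never reused here, which is an assumption about how "prevent a recently deallocated number from coming back into use" would be met.

```python
import threading


class RefnumTable:
    """Hypothetical refnum layer guarding pointer blocks with a mutex and an op count."""

    def __init__(self):
        # One mutex guards the list of valid refnums; the condition shares that mutex.
        self._cond = threading.Condition(threading.Lock())
        self._entries = {}     # refnum -> {"block": ..., "ops": int, "closing": bool}
        self._next_refnum = 1  # only ever increases, so a stale refnum never matches again

    def create(self, block):
        with self._cond:
            refnum = self._next_refnum
            self._next_refnum += 1
            self._entries[refnum] = {"block": block, "ops": 0, "closing": False}
            return refnum

    def begin_op(self, refnum):
        """Start of every read/write: lock the list, raise the op count if still valid."""
        with self._cond:
            entry = self._entries.get(refnum)
            if entry is None or entry["closing"]:
                raise KeyError(f"refnum {refnum} is not valid")
            entry["ops"] += 1
            return entry["block"]

    def end_op(self, refnum):
        """End of every read/write: re-acquire the mutex and lower the op count."""
        with self._cond:
            entry = self._entries[refnum]
            entry["ops"] -= 1
            if entry["ops"] == 0:
                self._cond.notify_all()

    def destroy(self, refnum):
        """Forbid new operations, wait for the op count to hit zero, then drop the block."""
        with self._cond:
            entry = self._entries.get(refnum)
            if entry is None or entry["closing"]:
                raise KeyError(f"refnum {refnum} is not valid")
            entry["closing"] = True      # "no one can raise the op count again"
            while entry["ops"] > 0:
                self._cond.wait()        # releases the mutex while waiting
            del self._entries[refnum]    # the pointer block would be freed here
```

    Every read and write pays for two mutex acquisitions in this sketch, which is exactly the cost the reply below objects to.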

     

     

     

    That's my argument why we need a refnum layer. There might be a way to implement this without that layer, but I do not know what that would be.

     

    As you know. Any mutexes and we will be back to square one.

     

    I'm still not quite getting it. Why do we need an "op count"? All pointers are considered valid until we unregister all readers and writers. Unregistering a single reader amongst multiple readers doesn't mean we need to free any pointers, as there are only three (the buffer, the reader index array and the Cursor). The only time we deallocate anything is when there are no longer any readers or writers, at which point we deallocate all pointers.

     

    Now. We already have a list of readers (the booleans set to true in the index array) and we know how many (reader count). The adding and removing of readers, i.e. the manipulation of the booleans and the count, is "locked" by the registration manager (non-reentrant VI boundary). So I see the "issue" as: how do we know that any asynchronous readers on the block diagram have read whatever flags and exited, and therefore that the writer can now exit and the pointers can be deallocated? (The writer only exits when there are no readers. Race condition on startup? Will have to play......)

     

    The writer won't read any reader booleans since by this time the registration manager has set the reader count to zero (this order will change with my proposal below since it will exit before it is zero). So it goes into a NOP state and exits. It can set a boolean that says "I have no readers and have exited". The registration manager already knows there are no readers, so it just needs to know the writer has exited before deallocating pointers.

     

    The readers only need to watch their flag to see if the registration manager has changed it, then go into a NOP state and exit. At this point I can see there might be a scenario whereby the reader has read its flag (which is still OK) and by the time it gets to reading buffers and writing indexes the writer has said "I have no readers and have exited". Well. Let's put another flag in the reader index array that says "I'm active", which is just a copy of the registration manager's boolean but is only written once all memory reads have completed, and which causes an exit. Now we have a situation where deallocation can only occur if the booleans controlled by the registration manager are all false (or true depending on which sense is safer) AND the "I am active" booleans controlled by the individual readers are all false (ditto sense) AND the writer has said "I have no readers and have exited". This decomposes into just the writer saying "I have no readers and have exited", as the writer can read both fields whilst it is doing its thing (it has to read the entire block anyway), AND them together, exit when they are all false (sense again) and set the "I have no readers and have exited" flag.

     

    So in the end-game. The registration manager unsets all the registered booleans, waits for the "I have no readers and have exited" boolean then deallocates the pointers. Does this seem reasonable?
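
    The handshake proposed above can be sketched as follows. This is an illustration in Python under stated assumptions, not the LabVIEW implementation: the names (`SharedFlags`, `reader_step`, `writer_step`, `manager_shutdown`, `may_deallocate`) are hypothetical, each flag has exactly one writer as the pattern requires, and "false" is chosen as the idle sense.

```python
class SharedFlags:
    """Hypothetical flag block; every field has a single designated writer."""

    def __init__(self, n_readers):
        self.registered = [True] * n_readers  # written only by the registration manager
        self.active = [True] * n_readers      # active[i] is written only by reader i
        self.writer_exited = False            # written only by the writer


def reader_step(flags, i):
    """One loop iteration of reader i. Returns False when the reader must exit."""
    if not flags.registered[i]:
        # All of this reader's memory reads are finished, so publish that last.
        flags.active[i] = False
        return False
    # ... read the buffer and update this reader's own index here ...
    return True


def writer_step(flags):
    """One loop iteration of the writer. Returns False when the writer must exit."""
    # The writer already scans the whole index block, so it can AND both fields.
    if not any(flags.registered) and not any(flags.active):
        flags.writer_exited = True  # "I have no readers and have exited"
        return False
    # ... write the next value here ...
    return True


def manager_shutdown(flags):
    """The registration manager unsets all the registered booleans."""
    for i in range(len(flags.registered)):
        flags.registered[i] = False


def may_deallocate(flags):
    """The manager frees the pointers only once the writer has signed off."""
    return flags.writer_exited
```

    In this sketch the manager would poll `may_deallocate` after `manager_shutdown`; how long that wait may take when a reader is stalled is left open, as it is in the discussion.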

     

    PS: Even if we don't need a refnum layer and we find a way to do this with just the pointers stored in a class' private data, when the wire is cyan and single pixel, many people will still refer to it as a refnum (often myself included) because references are refnums in LabVIEW. The substitution is easy to make. Just sayin'. :-)

     

    Being colour-blind (colour confused is a better term). I really have no opinion on this :)

     

    We offer you the option of crashing because if your code is well written, it is faster to execute than for us to guard against the crash. You're free to choose the high performance route. ;-)

     

    Well. Get rid of the "Check For Errors" page then :D (I've never had an error out via this route since about LV 7.1)

  4. I mean two calls to Read that both use the same number for the ID input.
     

    Ah. Yes. IC.

    Here's what I think we could do.......

    The registration manager knows how many and which ones are allocated. It will pass back an available ID index to the reader as part of the registration process (the ID is just an offset from the base pointer in # blocks). When there is a registration request, it scans the booleans in the reader index array (no locks required) and passes back the first F it comes across. If all are in use, it would increase the size of the array (setting all new values to F and indexes to the current cursor position) then give the next ID to the reader. The reader then uses that ID and starts its counter at the set cursor position. At this point the manager sets the boolean field for the newly registered reader to T and increments the reader count by 1. The next scan by the writer will now iterate through the newly increased size and pick up that the new reader's bool is T.
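
    A rough sketch of that registration step, in Python for illustration only. The names (`ReaderSlot`, `RegistrationManager`, `grow_by`) are hypothetical, a Python list stands in for the reader index array, and growing in fixed-size blocks is an assumption rather than something settled above.

```python
class ReaderSlot:
    """One element of the reader index array: an in-use flag plus the reader's index."""

    def __init__(self, index):
        self.in_use = False  # the F/T boolean scanned by the manager and the writer
        self.index = index   # the reader's position, starting at the cursor


class RegistrationManager:
    """Hypothetical manager; in LabVIEW this would sit behind a non-reentrant VI."""

    def __init__(self, initial_slots=2, grow_by=2):
        self.cursor = 0  # current write cursor (advanced by the writer elsewhere)
        self.slots = [ReaderSlot(0) for _ in range(initial_slots)]
        self.reader_count = 0
        self.grow_by = grow_by

    def register(self):
        """Return the ID (slot offset) for a new reader, growing the array if full."""
        for reader_id, slot in enumerate(self.slots):
            if not slot.in_use:
                break  # first F found
        else:
            # All slots in use: append new slots, flags F, indexes at the cursor.
            reader_id = len(self.slots)
            self.slots.extend(ReaderSlot(self.cursor) for _ in range(self.grow_by))
        slot = self.slots[reader_id]
        slot.index = self.cursor  # the reader starts counting from the cursor
        slot.in_use = True        # set last, so the writer only sees a ready slot
        self.reader_count += 1
        return reader_id

    def unregister(self, reader_id):
        self.slots[reader_id].in_use = False
        self.reader_count -= 1
```

    Unregistering only clears the flag, so the slot is handed out again by the next scan; nothing is ever moved, which keeps existing reader IDs stable.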

     

     

    I assume that this block of pointers that we're creating would be encapsulated as a single reference datatype that, from the point of view of a user of the API, would look like a single refnum. Put a class wrapper around that cluster of pointers, make the wire 1 pixel cyan. Refnum is simply a shorthand for "cluster of pointer values".

     

    But then there's the question of adding an actual refnum lookup layer, because of this...

    Why can't the pointer cluster actually be the private data of a class? I don't see any reason for refnums if it's going in a class. Instantiating the class creates all the pointers (well, there is no constructor in LV, so until we get my atomic reads and writes, they will probably call an init). If the block of pointers you are talking about is to do with the ID, then we don't need any, since the ID number is sufficient and the user will have no idea which IDs are being used as that's internal (as described previously).

     

    How do you keep this VI from crashing?

    [Attachment: Screen shot 2013-07-12 at 8.22.08 AM.png]

    How do you know that a Read is not happening at the same time that Deinit is called? Deinit can't unregister the readers if a read operation is in progress because if it destroys any of the pointer blocks, you'll crash the Read. How does a Read know to return an error if it starts working on a refnum that has been unregistered? If all you do is pass in a block of pointers, those pointers could all be deallocated and, again, you'll crash when you try to use them. This is why LV structures like this go through a refnum layer where we guarantee that a refnum that has been destroyed is not going to come back into use so that a read knows to return an error on a deallocated refnum. I don't see any way to have a separate Deinit function without an actual refnum layer guarding your allocated pointer blocks.

     

    Nothing that I have read about the Disruptor guards against such abusive usages. They assume that you just wouldn't code a call to Deinit while a Read is still running. That's not something that a general API can rationally assume. Now, in some languages, crashing might be a perfectly acceptable answer, with documentation that if you get this crash, you coded something wrong, but we try to avoid that in LabVIEW.

    Well. I'm not part of the "Crashing is an acceptable behaviour" club. They are idiots (although I think you have a couple over there at NI when it comes to CLFNs :D )

    At worst, we could unregister all the readers (set all the booleans to false) then wait 2 weeks to make sure everything has had a chance to read the booleans and exit before we finally crowbar the memory. But I'm sure we can come up with something better than that ;) You're thinking a bit ahead of me at the moment. I tend to think in chunks with a bit of prototyping for feasibility (iterative development). I'm only just starting to formulate details about the registration, let alone about the API itself.

    What do you suggest?

  5.  

    I couldn't look at your snippet because Lavag seems to be stripping the metadata again and I can't import it as code. But based on your quoted comment........

     

    The varpointers.vi I attached in the first post uses MoveBlock and it works once but dies when deallocating the pointer the second time around because LabVIEW kills the variant at some point or, more probably, I've killed it when I shouldn't have.

    Hm... the Uninit is going to cause a race condition problem. If we just deallocate the pointers, everything crashes if there's a read still in progress. Either we add a mutex guard for delete -- which blows the whole point of this exercise -- or delete is something that can only execute when Write has written some "I'm done" sentinel value and the Reads have all acknowledged it.

     

    And we have a second problem of guarding against two writes on the same refnum. They can't be proceeding in parallel but we have no guards against that.

     

     

    Any proposals for how to prevent two writes from happening at the same time on the same refnum? (Or two reads on the same refnum?)

    I don't understand what you are getting at here. Refnums?

    The disruptor pattern can cope with multiple writers (ours currently can't). They use a two stage process where a writer "claims" a slot before writing to it. What's the problem with having multiple readers? (isn't that the point!)

     

    We effectively have a reference counter (the number of readers) and we have a method of skipping in the writer (the bool field of the pointer indexes, which the readers could also use so they don't go ahead and read data; the manager only manipulates that field), so we would only deallocate the pointer when the last reader is unregistered. The deinit would effectively iterate through unregistering all the readers (it becomes an "unregister all readers, then kill pointer" rather than just a pointer killer). I think we just need a NOP for the readers (they need to not read until the registration manager has given them an index slot and they know the current cursor position). Don't know. Just thinking on my feet at the moment, but it "seems" doable and I think it might give us the same sort of feature as the queues when the handle is destroyed (error out).

     

     

    Is there a maximum amount that the indexes can be negative and he can use even more negative numbers? Or some mathematical game like that?

     

    Not really. I have visions of fixing the i64 cursor wrap-around problem mathematically by using the fact that it goes negative but still increases (gets less negative, minus + minus = plus, sort of thing). As I haven't fixed it yet, you could use a negative number as you will hit the same problem at the same point and everything starts at zero.
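
    One well-known way to make "goes negative but still increases" work is to compare cursors by their signed difference instead of by magnitude, in the style of serial number arithmetic. The sketch below is an illustration of that general technique in Python, not the fix used in the posted VIs (which had not been written at this point). The helper names are hypothetical, and it assumes the true gap between two cursors always stays below 2**63.

```python
I64_MAX = (1 << 63) - 1
I64_MIN = -(1 << 63)
_MASK = (1 << 64) - 1


def wrap_i64(x):
    """Reduce a Python int to the range of a signed 64-bit integer."""
    x &= _MASK
    return x - (1 << 64) if x >= (1 << 63) else x


def increment(cursor):
    """Advance a cursor by one, wrapping from I64_MAX to I64_MIN like an I64 would."""
    return wrap_i64(cursor + 1)


def distance(ahead, behind):
    """Signed number of steps from `behind` to `ahead`, correct across the wrap."""
    return wrap_i64(ahead - behind)


def is_ahead(writer, reader):
    """True if the writer cursor is ahead of the reader cursor, even after wrapping."""
    return distance(writer, reader) > 0


# Usage: a plain comparison fails at the wrap, the difference-based one does not.
reader = I64_MAX
writer = increment(reader)          # wraps to I64_MIN
naive_says_ahead = writer > reader  # False: the naive check thinks the writer is behind
wrap_safe_says_ahead = is_ahead(writer, reader)
```

    Because only differences are used, a reserved value such as -1 still cannot double as an "unregistered" marker, which matches the objection raised elsewhere in the thread.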

     

     

    But you would need to acquire the mutex in every read and every write just in case a resize tried to happen.

     

    The only time when you know only one operation is proceeding is when the buffer is empty (all current readers have read all enqueued items) and the write operation is called (assuming we figure out a way to prevent multiple parallel writers on the same refnum, which is solvable if we can find some way to expose the atomic compare-and-swap instruction, which might require changes for LV 2014 runtime engine). That's the only bottleneck operation in this system. In that moment, the write operation could resize the read buffer and then top-swap a new handle for the read indexes into a single atomic instruction into the block of pointers. So if you have the Write VI able to have a mode for Resize (and likely for Uninit, as I mentioned in my previous post) then you could resize.

     

    I don't see any other way to handle the resize of the number of readers. Anyone else attempting to swap out the pointer blocks faces the race condition of someone trying to increment a block and then having their increment overwritten by a parallel operation.

     

    Well. I think we need to look at the Disruptor code again to see how they handle it (they probably have the same problem). If we can't think of a way to be able to reorganise, we can at least dynamically increase as that doesn't overwrite existing indexes; just adds new ones and disables old ones (haven't seen a realloc but have seen functions to resize handles......somewhere). So we could create, say 5 at a time and increase in blocks. We are back to the MJE kind of approach which I first outlined with the flags then. (I did say IF we were smart :D )

     

    I think this is where we need to revisit the disruptor pattern details. They talk about consumer barriers, claiming slots and getting the "highest" index rather than just seeing if it is greater (as we currently do). I think we have the circular buffer nailed. It's the management and interfaces we are beginning to brainstorm now and they must have solved these already.

  7. With Version 3 my dual-core machine gives the following result when the Buffer size is smaller than the number of iterations:

    [Attachment: CB Test BUFFER_FP.png]

     

    There is still a long wait at an interval determined by the buffer size, but it doesn't usually occur from the start of the run.  Both the iteration time and its variability seem to reduce once the long waits start.

     

    In the writer there is a disable structure. If you disable the currently enabled (and enable the other) it will use the previous index reading method. Does this affect the result you see?

    So I have been trying to get the OpenG zip library working on a cRIO (VxWorks) and I think I have finally got it to work, but not before finding a nasty little bug. If any of the file path controls use an uppercase 'C' for the drive letter (like "C:ni-rtsystem") the zip functions will fail to find the files in question (the open/create returns an error code of 7). Now maybe I am missing something, but this took me about an hour to diagnose, so I thought I would share it in case anyone else runs into this problem. I have no idea if the bug is a problem on any other systems (Windows or Pharlap) but I suspect it is not. Hopefully it is an easy fix in the code.  Thanks for getting zip functionality to work on the cRIO!

     

    Stephen

     

    VxWorks paths are case-sensitive (Windows paths aren't by default).

    I have a half-implemented version that does something similar. It works by pre-allocating an array of reader cursors representing the positions of each reader. The array size represents the max number of registered readers. All the cursors start as negative, indicating nothing is registered for that index. The writer init returns with no registered readers. Reader initialization/registration involves finding an available cursor (negative) and setting it to a valid starting value (either zero or some offset depending on where in the lifecycle the writer is). This has forced init/release mechanisms to use a lock to ensure atomic operations on the cursor array to avoid race conditions while doing the read-write sequence, but looping on the cursors during enqueue/dequeue operations can happen without a lock as in Shaun's examples. Releasing a reader involves setting its cursor negative, allowing that cursor index to be reused if necessary.

    You can't use a negative number with this because the indexes can be negative (strange, I know, but it is to do with wrap-around).

     

    I'm thinking that we could have a boolean with the R and Cnt (a block is converted to a cluster for processing-see the CB GetIndexes3.vi) which tells the index iterator (in the write) whether that index should be considered in the "lowest" check. This would have very little impact on performance (read a few more bytes in the block at a time and 2 AND operators.).
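
    The extra boolean in the "lowest" check could look roughly like this. It is a Python sketch, not a transcription of CB GetIndexes3.vi: the field names are hypothetical, and reading R as the reader's index and Cnt as a count is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexEntry:
    """One block of the reader index array, viewed as a cluster."""
    active: bool  # the proposed flag: should this slot count in the "lowest" check?
    r: int        # assumed meaning: the reader's current index
    cnt: int      # assumed meaning: a per-reader count; unused by this check


def lowest_active_index(entries: List[IndexEntry]) -> Optional[int]:
    """Return the lowest reader index among active slots, or None if none are active."""
    lowest = None
    for entry in entries:
        # Inactive slots are skipped, so a stale index cannot hold the writer back.
        if entry.active and (lowest is None or entry.r < lowest):
            lowest = entry.r
    return lowest


# Usage: the inactive slot with index 1 is ignored, so the writer is gated by 3.
entries = [IndexEntry(True, 5, 0), IndexEntry(False, 1, 0), IndexEntry(True, 3, 0)]
gate = lowest_active_index(entries)
```

    The plain `<` comparison here ignores the wrap-around issue mentioned above; it only shows where the flag enters the scan.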

     

    Then you would need a "registration manager". That would be responsible for finding out how many are currently registered, updating the registered count and choosing a slot which has an F flag to give to the reader that wants to register (not dissimilar to what you are currently doing). The additional difference would be that it is also capable of resizing the array if there are not enough index slots. It could even shrink the array and reorganise, if we wanted to be really smart, so that the index iterator doesn't process any inactive slots, doing away with the choosing of an inactive slot altogether. This only ever needs to happen when something registers/unregisters, so even if this is locking, it will not have much of an impact during the general running (now we are starting to get to the Disruptor pattern ;) )

  10. Still need to propose a solution for handling abort. The DLL nodes have cleanup call backs that you can register, but that's not going to help us here. If you're going to do it in pure G, I think you have to launch some sort of async VI that monitors for the first VI finishing its execution and then cleans up the references if and only if that async VI was not already terminated by the uninit VI. Either that or the whole system has to move to a C DLL, which means recompiling the code for each target. Undesirable. Anyone have a better proposal for not leaking memory when the user hits the abort button?

     

    Yeah. Don't want to go to a DLL. Been there, done that, torn holes in all my T-shirts. Is there no way we can utilise the callbacks (I thought that was what the InstanceDataPointer was for).

     

    I do.

     

    When you do Init, wire in an integer of the number of readers. Have the output be a single writer refnum and N reader refnums. If you make the modification to get rid of the First Call node and the feedback nodes and carry that state in the reader refnums, then you can't have an increase in the number of readers.

     

    That wouldn't work well for my use cases since I don't know how many I need up front (could be 10, could be 30 and may change during the program lifetime as modules are loaded and unloaded->plugins).

    The rest (about first call etc.) is valid. It really needs to be in the next layer up, but as that doesn't exist yet, it's in the current reads and writes. What we (or more specifically, I) don't want is for them to be reliant on shift registers in the user's program to maintain the state info. How that is handled will depend on whether the next layer is class based or classic LabVIEW based, which is why I haven't broached it yet. At that layer, I don't envisage a user-selectable "Init". More that the init is invoked on a read or write depending on what gets there first (self-initialising). I ultimately want to get to a point where you just slap a write VI down and slap read VIs wherever you need them (even in different diagrams) without having to connect wires (see my queue wrapper VI for how this works, although I'm not sure how it will in this case, ATM, since the queue wrapper uses names for multiple instances).

    Well, dynamic registration, unless you forbid a once-registered reader to unregister, makes everything quite a bit more complex. Since then you can get holes in the index array that would block the writer at some point, or you have to add an intermediate refnum/index translator that translates the static refnum index that a reader gets when registering into a correct index into the potentially changing index array. I'm not sure this is worth the hassle, as it may well destroy any time benefits you have achieved with the other ingenious code. :D

     

    Indeed. I know the issues and have some ideas (the easiest of which is to have a boolean in the index array cluster alongside the R and Cnt). Do you have a suggestion/method/example?

  12. ShaunR: I'd like you to try a test to see how it impacts performance. I did some digging and interviews of compiler team. In the Read and the Write VIs, there are While Loops. Please add a Wait Ms primitive in each of those While Loops and wire it with a zero constant. Add it in a Flat Sequence Structure AFTER all the contents of the loop have executed, and do it unconditionally. See if that has any impact on throughput.

    Done. MJE mentioned this previously and, until now, I have resisted since I was more concerned with absolute speed performance. But it has nasty side effects on thread-constrained systems (dual cores etc.) unless they can yield. You lose the performance of using subroutine priority on constrained CPUs, and the spread obviously increases as we give LabVIEW the chance to interfere. But nowhere near as much as with the queue or event primitives. So you see an increase in mean and STD, but the median and quartiles (which I think are better indicators as they are robust to spurious values) remain pretty much unchanged.

     

    I've also now modified the index reader in the write to be able to cope with an arbitrary number of multiple readers. We just need to think how we store and manage registration (a reader jumps on board when the writer is halfway through the buffer, for example).

     

    Also added the event version of the test harnesses.

     

    Version 3 is here. After this version I will be switching to a normal versioning number system (x.x.x style) as I think we are near the end of prototyping.

  13. Ok... the hang when restarting your code is coming from the First Call? and the Feedback Nodes. I'm adding allocation to your cluster to store those values. It needs a "First Call?" boolean and N "last read index" where N is the number of reader processes.

     

    The first call (in the write) is because when everything starts up all indexes are zero. The readers only continue when the write index is greater, so they sit there until the cursor index is 1. The writer has to check that the lowest reader index isn't the same as its current write. At start-up, when everything is zero, it is the same; therefore, if you don't have the First Call, it will hang on start. Once the writer and reader indexes get out of sync, everything is fine (until the I64 wraps around, of course; that needs to be addressed). If you have a situation where you reset all the indexes and the cursor back to zero AND it is not the first call, it will hang as neither the readers nor the writer can proceed.

    I was going to suggest using a User Event, which I'd always thought of as a One-to-Many framework.  Just tried it, and this "works" with a time of 500ms, and with proper parallel execution, but Profiling shows that the data is copied on one of the event structure readers.  It looks as though multiple registrations set up separate event "queues", and Generate User Event then sends the data independently to each one.  I had never realised it behaved like this, though thinking about it now, I guess it's not surprising.

     

    Indeed. Events are a more natural topological fit (as I think I mentioned many posts ago). Initially, I was afraid that the OS would interfere more with events than queues (not knowing exactly how they work under the hood, but I did know they each had their own queue). For completeness, I'll add an event test harness so we can explore all the different options.

     

    [Image: bve.png]

    The processes are parallel -- the computation done on each value is done in parallel with the computation done on the other value. The only reason you can make the argument that they aren't parallel is because you have contrived that one operates on the odd values and the other operates on the even values. If you have a situation like that, just stick the odd values in one queue and the even values in another queue at the point of Enqueue. Now, I realize you can make the system "both loops execute on 2/3rds of the items, with some overlap" or any other fractional division... as you move that closer to 100%, the balance moves toward the copy being the overhead. My argument has been -- and remains -- that for any real system that is really acting on every value, the single queue wins.

     

    If you have two processes that have dissimilar execution, then just arrange the first one first in a cascade, like this (which does run at 500 ms): [Attachment: AQ1_CascadeQueue.vi]

    I disagree. The processes are most certainly not parallel, even though your computations are (as is seen from the time).

    In your second example you are attempting to "pipeline" and are now using 2 queues (I say attempting because it doesn't quite work in this instance).

     

    You are

     

    a) only processing 100 values instead of all 200 (the true case in the bottom loop never gets executed)

    b) lucky there is nothing in the other cases, because pipelining is "hybrid" serial (they have something to say about that in the video)

    c) lucky that the shortest is last (try swapping the 5 with the -1 so that -1 is on top, with a) fixed, and you are back to 600ms)

    d) no different to my test harness (effectively) if you place the second enqueue straight after the bottom loop's dequeue instead of after the case statement (which fixes c).

  16. I think two issues are being conflated here. Everything AQ posted with respect to objects is of course correct. However regardless of the nature of an object's structure the data needs to be brought into some scope for it to be operated on. It's that scope that's causing the problem.

     

    Using a global variable will demand copying the whole object structure before you can operate to get your local copy of the single buffer element.

     

    Using a DVR or a FGV demands an implicit lock, either via the IPE required to operate on the DVR or the nature of a non-reentrant VI for the FGV. So while a class does not have any built in reference or exclusion mechanics, pulling that data into some useful scope such that it can be operated on does.

     

    The same issue is what has prevented me from posting an example of how to pull a single element from a buffer without demanding a copy of the entire buffer or running through some sort of mutual exclusion lock. Short of using the memory manager functions as Shaun has already demonstrated I don't see how to do it. I know there are flaws with the memory manager method, I just don't see an alternative without inducing copies or locks.

     

    Well. You have managed to concisely put into a paragraph or two what I have been unsuccessfully trying to get across in about 3 pages of posts :yes: . (+1).

  17. I'm still not sure this buys us anything in terms of your MoveBlock() (or SwapBlock()) attempt since the variant obviously needs to be constructed somewhere. 

    Indeed. However, a variant is constructed (a constant on the diagram) but, from what AQ was saying about descriptors, a simple copy is not adequate since the "void" variant has the wrong ones. I'm now wondering about LVVariantGetContents and LVVariantSetContents, which may be the correct way to modify a variant's data and cause the descriptor to be updated appropriately.

  18. Yes. Doh!

     

     

    That's why you do a top-swap instead of a move block. There's one allocated inside the variant in the first place and you swap the [pointers/top-level data structure which may or may not contain pointers depending upon the data type under discussion] so that the buffer still contains one.

    Hmmmm. Not sure what this "top swap" actually is. Is that just swapping handles/pointers so basically the pointer in the buffer then points to the empty variant for a read? That would be fine for a write, but for a read the variant needs to stay in the buffer.

    Can you demonstrate with my original example?
