LVOOP with DVRs to reduce memory copies - sanity check


Posted

[cross posted to forums.ni.com]

 

Greetings LVOOP masters. I realise this is not a simple question but I would really appreciate your opinion please.

 

I'm trying to make some data handling improvements to a test sequencer application that I've written.
User defined tables determine how much data is acquired. Long tests can easily generate hundreds of MB of data.
Collecting, analyzing, reporting and saving that data tends to create copies and we can easily run out of memory.
It's currently written without any LVOOP methodology but that is about to change. I'm new to OOP but have been reading a lot.

I have 2 objectives in refactoring the data handling VIs:
1) Enable different measurement types to be collected, analyzed and saved according to their unique properties.
Presently any measurement type that isn't a dbl waveform is shoehorned into one and treated exactly the same. XY style data is resampled losing resolution and digital data is converted to dbl. This results in a lot of extra data being generated for no good reason other than to fit in a waveform.

2) Reduce data copies. Because the measurements are shuffled around from one module to another the copies become a problem. I'm hoping to use DVRs to the objects and pass around the references instead, reducing copies.

I'm quite comfortable with objective 1, it's number 2 I'm not sure about.
After reading this excellent whitepaper here I feel I need to get some expert advice on whether my approach is correct. In particular these two statements have raised questions in my mind:

"Be aware: Reference types should be used extremely rarely in your VIs. Based on lots of looking at customer VIs, we in R&D estimate that of all the LabVIEW classes you ever create, less than 5% should require reference mechanisms."

"LabVIEW does not have "spatial lifetime". LabVIEW has "temporal lifetime". A piece of data exists as long as it is needed and is gone when it is not needed any more."

Q1. With that in mind, will all my effort to use DVRs actually reduce data copies significantly in my case?
I plan to have a single FGV VI where the objects are created and stored in an array. Then when other modules need access to the data, a DVR is created and handed out. Once these modules are finished with the captured data, they flag it as deletable and the measurement FGV deletes the DVR and removes the object from the array.

Q2. Is that the right way to go about it or am I missing the point of LVOOP and DVRs?


Below is a simplified mockup of the way measurement data is currently used in the application.

LVOOP Experts: Your feedback/suggestions are much appreciated.

 

LVOOP DVR Measurement Optimization.zip

post-11537-0-70468800-1379028328_thumb.p

Posted

Well, to prevent copies you must serialize access to the data.  A DVR does that, but if you have already designed a structure that serializes access (Collect—>Analyze—>Graph—>Store) then you won’t gain anything by adding a DVR.  

 

BTW, an easy way to reduce copies in your example code is to put the analysis logic inside the Functional Global (an “analyze" step rather than “get”).  You can also serialize the graphing and saving.
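
As a rough text analogy (Python standing in for a LabVIEW diagram; `MeasurementStore` and its methods are made-up names, not anything from the attached VIs), an "analyze" action lets the computation run where the data lives, so only the small result leaves the store:

```python
import copy
import threading


class MeasurementStore:
    """Hypothetical stand-in for a Functional Global holding all samples."""

    def __init__(self):
        self._lock = threading.Lock()  # like an FGV, callers are serialized
        self._samples = []

    def add(self, values):
        with self._lock:
            self._samples.extend(values)

    def get(self):
        # "get" action: the caller receives its own copy of the whole data set
        with self._lock:
            return copy.copy(self._samples)

    def analyze(self):
        # "analyze" action: work on the stored data, return only a summary
        with self._lock:
            if not self._samples:
                return None
            return {
                "count": len(self._samples),
                "min": min(self._samples),
                "max": max(self._samples),
                "mean": sum(self._samples) / len(self._samples),
            }


store = MeasurementStore()
store.add([1.0, 2.0, 3.0, 6.0])
summary = store.analyze()
```

The trade-off is that a long analysis holds the lock, so other callers wait while it runs.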

 

A more major change is to not deal with all the data at once.  Stream data to disk, and only read, analyze and present a subset at a time.
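
A minimal sketch of that idea, assuming a flat binary file of doubles as a stand-in for TDMS (the file name, chunk size and helper names are invented for illustration): each chunk is appended as it is acquired, and later only the requested window is read back.

```python
import os
import tempfile
from array import array

ITEM_SIZE = array("d").itemsize  # bytes per float64 sample


def append_chunk(path, samples):
    """Append one acquired chunk to the file; nothing accumulates in memory."""
    with open(path, "ab") as f:
        array("d", samples).tofile(f)


def read_subset(path, start, count):
    """Read `count` samples starting at sample index `start`."""
    out = array("d")
    with open(path, "rb") as f:
        f.seek(start * ITEM_SIZE)
        out.fromfile(f, count)
    return out


path = os.path.join(tempfile.mkdtemp(), "measurement.bin")
for chunk_index in range(10):
    # each chunk holds 1000 samples: 0..999, 1000..1999, and so on
    append_chunk(path, (float(chunk_index * 1000 + i) for i in range(1000)))

# only five samples are loaded for analysis, not the full 10000
window = read_subset(path, start=2500, count=5)
```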

Posted (edited)

I would turn your acquisition on its head.

Stream directly to the TDMS DB and just query it when you want to do analysis. You would then only have the data in memory that you need and no copies.

Edited by ShaunR
Posted

I know you are asking about DVRs and LVOOP, but my first reaction was: why the nested while loops? Why a functional global (and, if used, why not put the analysis and report functionality within it? At least that would eliminate some data copies). Why build the measurement array one element at a time (and if each measurement can be hundreds of MB, perhaps it would be more efficient to put the temporary data on disk instead of in memory)? And why not use for-loops instead, with auto-indexing?

 

If you really can wait for all the measurements to be done before doing the analysis, you can skip the functional global and instead pass an array by wire. Use for loops with auto-indexing to make the indexing and memory allocation automatic.

 

Building arrays one element at a time in a loop is costly, both in memory and speed (every execution of the build function triggers a data copy operation that just becomes larger and larger the bigger the array gets). In your case it is *very* costly due to the sizes involved. The most memory efficient way to build an array is to pre-initialize it to its final size and then fill in the results by replacing elements, not inserting them. If the data type has a fixed footprint (i.e. it is not, for example, an array of variable length) LabVIEW can do it for you (and with the best performance) if you use a for-loop with auto-indexing on. If the footprint is unknown, but you at least have an idea of its upper bound, initialize an array to the maximum size, then scale it down after filling in the measurements (or up again by a factor if it is found to be too small somewhere in the process). Alternatively you can write the measurements to disk (unless that's too slow), then pass a list of paths or file references to the analysing function.
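
The preallocate, replace and trim pattern can be sketched in text form as follows. This is Python rather than LabVIEW, and `collect` and `upper_bound` are hypothetical names, so treat it as an illustration of the pattern rather than of LabVIEW's memory manager:

```python
from array import array


def collect(source, upper_bound):
    """Fill a preallocated buffer, then trim it to the number of samples used."""
    buffer = array("d", [0.0]) * upper_bound  # allocate once at the estimated maximum
    used = 0
    for sample in source:
        if used == len(buffer):
            # estimate was too small: grow by a factor, not by one element
            buffer.extend(array("d", [0.0]) * max(1, len(buffer)))
        buffer[used] = sample  # replace in place instead of inserting
        used += 1
    del buffer[used:]  # scale down to what was actually filled
    return buffer


# ten samples arrive although only four were expected
result = collect((float(i) for i in range(10)), upper_bound=4)
```

Growing by a factor keeps the number of reallocations small even when the estimate is badly off.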

 

If the analysis can and *has* to run in parallel with the measurements (to avoid halting the measurements), use the same model as has been done here to pass report data; use a queue (producer-consumer model). Perhaps the report loop can be merged with the analysis loop, or is there a need to analyse any quicker than you can report?

 

All of this applies even if you make a measurements class and have different subclasses for each type of measurement (or type of output from a measurement...).  

Posted
I would turn your acquisition on its head.

Stream directly to the TDMS DB and just query it when you want to do analysis. You would then only have the data in memory that you need and no copies.

 

+ 1

 

Stream your test data to disk.  That way you never have to worry about running out of memory.  

 

There are many file types to choose from, each with their own benefits and drawbacks:  TDMS, datalog, CSV, databases, etc.  TDMS is great for concurrent read/write access.  Datalogs make great object based config files.

Posted (edited)

Thank you all for your valuable input. I'm glad I asked here. So far no replies on forums.ni.com.

put the analysis logic inside the Functional Global...
why not put the analysis and report functionality within it,

Good suggestion. We are already doing the analysis part for one measurement type within the FGV. The reporting function writes xml, saves graph images, saves the tdms file and then zips it. This is done in a separate low priority thread so the next test can be started without having to wait. The report target folder could be on a network share slowing down the process. That is why I kept it separate. If I use a temporary storage TDMS file it will have to be on a local drive to keep it fast.

 

Stream data to disk, and only read, analyze and present a subset at a time.
Stream directly to the TDMS DB and just query it when you want to do analysis.
Stream your test data to disk.

How can I ignore this suggestion!
Currently I only store data to the TDMS file depending on user settings: All, Failures only, or none.
I guess I can store it all > extract out the ones I need to analyse > delete the ones I don't need to keep > zip it > move it to the report folder.

 

why the nested while loops, why a functional global... why build the measurement array one element at a time... and why not use for-loops instead, with auto-indexing...

The application is a lot more complex than the example I posted. I was attempting to demonstrate how I believe the data copies were occurring.
In the example, nested while loops indicate the way that the application allows a user to load multiple test sequences and how it can cycle through multiple devices under test. Also how certain conditions can trigger different sequences to load (chamber temperature, run count etc. etc.).
In the example the measurement array is stored in a functional global to try and reduce copies instead of passing wires through the application. It is built one measurement at a time to indicate how 15 different modules can all add items to the measurement array depending on what happened during the test. In reality they each can add an array of measurements. How many measurements many of them add can depend on what the test sequence defined, how the device responded and database lookups on the returned data. Even the seemingly easy to predict DAQ measurements can be passed through multiple software filters producing new signals. All of this makes it impossible to predict accurately ahead of time.
I can't use for-loops instead because not all the data acquired can or will be analysed especially when it isn't anticipated (5 different serial/diagnostic protocols).

I didn't want to include all that complexity in the example I posted because it would have taken me too long to code and it would have been harder for anyone to see quickly what was going on and offer suggestions.


From all your feedback it looks like I may not benefit significantly from moving to OOP. I'm not sure the added risk (me being unfamiliar with LVOOP) offsets the benefits gained at this stage.

For long tests where I can anticipate large amounts of data I think I will store data to a local TDMS file and only recall what I need when I need it.
For short tests I will keep everything the same to keep the analysis as fast as possible.
This application has been in use for many years with dozens of projects, users and test benches. Backward compatibility is very important and these changes need to be mostly transparent to users.

Now, how do I mark everyone's reply as the solution?
Actually, now that I think about it, you have all helped me with the solution to my problem, not necessarily the answer to my question. It takes a lot of experience in problem solving to do that! Even when the inquirer doesn't ask an accurate question!
_____________________________

May I ask a follow-up question, this time more clearly DVR related?
Let's say I have measurements stored in a FGV. Then in the FGV, inside an In-Place Element Structure (IPES), I index out an element and create a DVR pointing to it.
I would have guessed that the DVR would point to the element INSIDE the array because I created it inside the IPES. However, if you look at the buffer allocations [LV2011SP1], it looks like it is allocating memory for a copy of the element. Would the DVR point to the copy or the array element? If the copy, what happens when I try to create multiple concurrent references?
It appears to behave differently for objects. I would like to think that I can create multiple concurrent DVRs to different objects contained in the same array.

post-11537-0-87061600-1379292516.png

Edited by Troy K
Posted

I believe in the upper of your two pictures there's a buffer allocation on the wire junction.  LabVIEW sees that you're doing something with the second wire and creates a copy of the array element.  If my assumptions are correct, the resulting DVR is NOT pointing to the actual element in your array but to a separate copy of that array element in memory.  To do what I think you're trying to do, you would need to create an array of DVRs and make a copy of the existing DVR from the array, so that the DVR points to the array element.

 

Does that line up with your experiences of the code behaviour?

 

I believe the only way to create concurrent DVRs to a single memory location is to branch the DVR wire.  You can't call the "Data to DVR" several times because this (AFAIK) creates several DVRs, each pointing to a different memory location.

Posted
This is done in a separate low priority thread so the next test can be started without having to wait. 

You can’t have your cake and eat it too.  If you want a low priority process to wait till later then you have to retain the data it needs.  If you want no copies, then everybody else has to wait for that low priority process to finish with the data before overwriting it.  

 

 

In the example the measurement array is stored in a functional global to try and reduce copies instead of passing wires through the application.

“Passing wires” doesn’t make copies.  Even branching wires doesn’t always make copies.

 

It is built one measurement at a time to indicate how 15 different modules can all add items to the measurement array depending on what happened during the test.

Your app says “queues” to me, not “FGV” or “DVR".  Queue all the measurements to a central analyzer.  Queues are asynchronous, so can’t stall your acquisition loops.   Access to a FGV or DVR is synchronous, so to avoid blocking acquisition you are forced to make a copy for lengthy analysis. 
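
A minimal producer-consumer sketch in Python, using the standard `queue` and `threading` modules as a stand-in for LabVIEW queues (the module names, sample values and the `DONE` sentinel are assumptions made for illustration). The acquisition threads enqueue and carry on, while a single analyzer drains the queue:

```python
import queue
import threading

measurements = queue.Queue()  # unbounded, so put() does not block the producers
results = []
DONE = object()  # sentinel telling the analyzer to stop


def acquire(module_name, count):
    """Hypothetical acquisition module: enqueue each measurement and move on."""
    for i in range(count):
        measurements.put((module_name, i, [0.1 * i] * 4))


def analyze():
    """Central analyzer: the only place that touches the measurement data."""
    while True:
        item = measurements.get()
        if item is DONE:
            break
        name, index, samples = item
        results.append((name, index, max(samples)))


analyzer = threading.Thread(target=analyze)
analyzer.start()

producers = [
    threading.Thread(target=acquire, args=(f"module{n}", 3)) for n in range(2)
]
for p in producers:
    p.start()
for p in producers:
    p.join()

measurements.put(DONE)  # all producers finished, let the analyzer exit
analyzer.join()
```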

 

From all your feedback it looks like I may not benefit significantly from moving to OOP. I'm not sure the added risk (me being unfamiliar with LVOOP) offsets the benefits gained at this stage.

Your description of the wide variety of measurements that can be produced definitely suggests LVOOP.  A low-risk path would be to just start using classes in place of whatever type-def clusters you are using for measurements.  That will force you to use the encapsulation that will allow you to extend using inheritance later on.  
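
In text form, that first step might look like the sketch below. It is written in Python rather than LVOOP, and the class and field names are invented for illustration: a base class replaces the cluster, and each measurement type keeps its native representation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Measurement:
    """Base class taking the place of a type-def cluster."""
    name: str

    def summary(self) -> str:
        raise NotImplementedError


@dataclass
class Waveform(Measurement):
    dt: float
    samples: List[float]

    def summary(self) -> str:
        return f"{self.name}: {len(self.samples)} samples, dt={self.dt}"


@dataclass
class XYData(Measurement):
    points: List[Tuple[float, float]]  # native resolution, no resampling

    def summary(self) -> str:
        return f"{self.name}: {len(self.points)} XY points"


@dataclass
class DigitalData(Measurement):
    states: List[bool]  # stays digital, no conversion to dbl

    def summary(self) -> str:
        return f"{self.name}: {len(self.states)} digital states"


measurements = [
    Waveform("battery", 0.001, [12.0, 12.1, 11.9]),
    XYData("torque_vs_angle", [(0.0, 0.1), (5.0, 0.4)]),
    DigitalData("ignition", [False, True, True]),
]
# callers use the common interface and never inspect the concrete type
report = [m.summary() for m in measurements]
```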

 

Would the DVR point to the copy or the array element? 

To a copy.  DVRs aren’t pointers; they have locking features that I don’t think could work for locks inside locks.  Can you just use an array of DVRs to implement whatever it is you're after?
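
The behaviour described here can be modelled in a few lines of Python. This is only a toy model of the semantics discussed in this thread, not of LabVIEW internals, and `DataValueRef` is a hypothetical class: creating a reference from by-value data copies it, while passing the reference itself around shares it.

```python
import copy
import threading


class DataValueRef:
    """Toy model of a DVR: owns its own copy of the data, guarded by a lock."""

    def __init__(self, value):
        self._value = copy.deepcopy(value)  # by-value input, so the reference gets a copy
        self._lock = threading.Lock()

    def modify(self, func):
        # rough analog of an In Place Element Structure acting on the reference
        with self._lock:
            self._value = func(self._value)

    def read(self):
        with self._lock:
            return copy.deepcopy(self._value)


array_in_fgv = [[1.0, 2.0], [3.0, 4.0]]

# a reference created from an indexed element points at a copy of that element
ref = DataValueRef(array_in_fgv[0])
ref.modify(lambda v: v + [99.0])

# an array of references: handing one out shares it instead of copying the data
store = [DataValueRef(m) for m in array_in_fgv]
branch = store[1]  # "branching the wire" gives the same reference
branch.modify(lambda v: [x * 2 for x in v])
```

After this runs, `array_in_fgv[0]` is unchanged, while `store[1]` and `branch` see the same doubled values.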

Posted (edited)
If you want a low priority process to wait till later then you have to retain the data it needs.

Yes you are correct.

 

 

If you want no copies, then everybody else has to wait for that low priority process to finish with the data before overwriting it.

I agree. I realise the low priority thread requires a copy. However you'll note that in my original post I stated my second objective as "reduce data copies", not eliminate all copies.

 

 

“Passing wires” doesn’t make copies.  Even branching wires doesn’t always make copies.

Again, I agree with you. I was hoping readers would recognise that by saying "passing wires through the application" I was referring to a traditional LabVIEW programming methodology that had an increased likelihood of creating data copies when branching wires. The LabVIEW compiler is a very complicated beast and very few people would attempt to predict exactly when a data copy will or won't be made.

 

Your app says “queues” to me, not “FGV” or “DVR".  Queue all the measurements to a central analyzer.

Some signals require data from others to properly analyse them. If the "ON" state has been defined as being relative to the battery voltage, then the analysis module reads the corresponding section of the battery voltage signal to compare it to the "ON" state. Some signals are "linked" to others such that they will appear on the same graph in the report. All of this means that I need access to more than one signal at a time. Again, I didn't want to delve too deeply into the complexity of my application.

 

 

Can you just use an array of DVRs to implement whatever it is you're after?

Yes I'm starting to think that would work. Something stated in the white paper I referred to in my original post made me assume that I had to keep the object itself somewhere, such as an FGV, otherwise LabVIEW would remove it from memory. "A piece of data exists as long as it is needed and is gone when it is not needed any more. ... Copies on the wires exist until the next execution of that wire."

However it appears (from what I understand) that I should be storing ONLY an array of DVRs in my FGV because LabVIEW retains the actual objects somewhere else when I create the DVR.

 

 

I believe in the upper of your two pictures there's a buffer allocation on the wire junction. 

Oh yes, you're right. Tricky little dot escaped my notice. Maybe I should have used different wire colors. :oops:

In that test VI I was trying to see if the compiler would recognise that I wanted a DVR to the array element and not make a copy. But I didn't understand what was happening with the DVR.

My concern was that as soon as I created a DVR to another object in the array (using the same method VI) that my first object DVR would become invalid because LabVIEW released that copy of the object from memory.

After reading the description for the "New Data Value Reference Function" in the help again, it seems that a DVR may keep the object in memory and it only "appears" again when you call the "Delete Data Value Reference Function".  If this is the case then there is no need to keep the objects themselves in a FGV, only the DVRs as has been suggested.

Edited by Troy K
Posted
I was hoping readers would recognise that by saying "passing wires through the application" I was referring to a traditional LabVIEW programming methodology that had an increased likelihood of creating data copies when branching wires. The LabVIEW compiler is a very complicated beast and very few people would attempt to predict exactly when a data copy will or won't be made.

Yes, by-reference data allows one to be sure you're not making copies.  It can take a lot of experience before one starts to trust by-value data handling.  Using in-place elements and always wiring straight through functions can give one confidence, even if those techniques are often unnecessary.  But a good by-value design can be at least as “copy-efficient” as a by-ref design, if not more so, while also having other advantages.

 

Some signals require data from others to properly analyse them. If the "ON" state has been defined as being relative to the battery voltage, then the analysis module reads the corresponding section of the battery voltage signal to compare it to the "ON" state. Some signals are "linked" to others such that they will appear on the same graph in the report. All of this means that I need access to more than one signal at a time. Again, I didn't want to delve too deeply into the complexity of my application.

Still says “queue”.  Queue the data to the central analyzer and have it retain whatever info it needs to interpret further data (like the last battery voltage).  
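
A small sketch of an analyzer that retains state between messages, written in Python. The 80 % "ON" rule, the signal names and `CentralAnalyzer` itself are assumptions made for illustration, and a `deque` stands in for the queue:

```python
from collections import deque


class CentralAnalyzer:
    """Consumes queued measurements and remembers what later ones depend on."""

    def __init__(self, on_fraction=0.8):
        self.on_fraction = on_fraction  # assumed rule: ON means >= 80 % of battery
        self.last_battery = None        # state retained between messages
        self.verdicts = []

    def handle(self, signal, value):
        if signal == "battery":
            self.last_battery = value
        elif self.last_battery is not None:
            is_on = value >= self.on_fraction * self.last_battery
            self.verdicts.append((signal, is_on))


inbox = deque([
    ("battery", 12.0),
    ("lamp", 11.0),     # judged against 12.0 V
    ("battery", 14.0),
    ("lamp", 11.0),     # same reading, now judged against 14.0 V
])
analyzer = CentralAnalyzer()
while inbox:
    analyzer.handle(*inbox.popleft())
```

The same lamp reading is classified differently once the retained battery voltage changes, without the analyzer ever holding both full signals at once.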

 

— James

Posted
Something stated in the white paper I referred to in my original post made me assume that I had to keep the object itself somewhere, such as an FGV, otherwise LabVIEW would remove it from memory. "A piece of data exists as long as it is needed and is gone when it is not needed any more. ... Copies on the wires exist until the next execution of that wire."

However it appears (from what I understand) that I should be storing ONLY an array of DVRs in my FGV because LabVIEW retains the actual objects somewhere else when I create the DVR.

Oh dear, I think that paper must be badly written, at least when interpreted from your background (and the “copies on the wires” part isn’t strictly true).  The whole point of by-value dataflow is NOT worrying about object lifetime.  

Posted
As soon as the VI that created the original object goes out of memory your DVR will be stale. I do not think it matters if you are storing the DVR somewhere else (i.e. in an array with other DVRs).

 

Are you sure? I tried to confirm your statement but it seems to work how I expected it to. Here is my test VI (I'm assuming an object in a DVR will behave the same).

post-11537-0-71012100-1379471108_thumb.p

I profiled the memory usage of the VI and it showed just over 8MB, equivalent to 1 of the waveforms. This is what I anticipated. The profile performance tool can't capture the data stored in the DVR, only the reference itself. So even though I generate a total of 200MB of data, it only ever peaks at 8MB at a time because only the data from 1 element appears on a wire at any given time.

Even though I didn't think it would make any difference, I wrapped different sections into subVIs [see attached] to try it just in case. It still works.

I know technically the VIs aren't removed from memory, but running the same VI more than once (in the for loop) should overwrite any data stored inside.

 

I suppose the next step would be to dynamically call the create VI, store only the DVRs in an FGV, close the dynamic VI, then see if I can still access the data from any of the DVRs in the FGV.

 

Does it work differently for objects?

 

DVR store data.zip

 

Abe Simpson - "I used to be with it, but then they changed what it was. Now what I'm with isn't it, and what's it seems weird and scary to me, and it'll happen to you, too."

Posted (edited)

OK, now I've tried 4 different ways:

1 Test DVR data retention All in1.vi (As picture above) - works OK.

2 Test DVR data retention subvis.vi (As above but using subvis) - works OK.

3 Test DVR data retention fgv not dynamic.vi (DVRs stored in FGV) - works OK.

4 Test DVR data retention dynamic.vi (Data generated and added to FGV in dynamically called subVI) - FAILS (as neil said it would).

My tests were in LabVIEW 2011SP1 & 2012

Test DVR data retention.zip

I'm guessing the LabVIEW compiler decides that the DVRs created inside the Dynamic VI can be released.

So as long as I don't try to store the DVR in an FGV called inside a dynamic VI, I should be alright in the versions of LabVIEW that I've tested. How stable would this be in future versions of LabVIEW though? Is there some specification for DVRs that guarantees this will continue to work?

Edited by Troy K
Posted
It's all an issue of the lifetime of the original object. Forget there is even a DVR involved, this should not add anything to the mix. I would not expect this to change in future versions of LabVIEW as it would break too much code that relies on it (as this is not really a trick, it is an expected behaviour).

 

If you can guarantee the original object is still in memory then the DVR will still be valid. This is why I proposed trying to move the creation of this object (and thus the corresponding DVR) as high in the hierarchy as possible, somewhere that is always going to be in memory (so the top-level or one of its sub-VIs.). As soon as you have dynamically launched VIs this really does complicate things as the lifetime is now not well defined.

 

I could be wrong, but I do not think that there is a two way relationship between having an open DVR and the object itself. Although the DVR depends on the original object still being in memory, the converse is not true: keeping the DVR still alive (i.e. in memory) will not guarantee the original object is in memory, so it does not matter how you are storing your DVRs (on a wire, in a FGV etc).

Hi Neil, I don’t think that’s how it works.  If you put a by-value object in a DVR, the data can’t disappear unless the DVR disappears.  The DVR “owns” the object.  Only if the DVR contains just a reference to something (another DVR, a queue, etc.) do you have to worry about its independent lifetime. 

 

The DVR, like other references in LabVIEW, is itself “owned” by the top-level VI of the subVI that created it (I think this might be called a “VI hierarchy”), and will become invalid if that VI hierarchy goes idle.  This only matters if you are sharing the reference between multiple VI hierarchies (such as by asynchronously-called VIs).

Posted
I'm guessing the LabVIEW compiler decides that the DVRs created inside the Dynamic VI can be released.

So as long as I don't try to store the DVR in an FGV called inside a dynamic VI, I should be alright in the versions of LabVIEW that I've tested. How stable would this be in future versions of LabVIEW though? Is there some specification for DVRs that guarantees this will continue to work?

It’s not a matter of “can be released”; all references owned by a top-level VI are required to be released as part of the process of going idle.  This is a little-known but standard way LabVIEW treats all references.  Changing this would break or introduce memory leaks into all sorts of old code.  I use asynchronously launched VIs extensively and I actually find this feature very useful, though also quite complicating.
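
The ownership rule described above can be pictured with a toy Python model. `Hierarchy` and `OwnedRef` are hypothetical names, and this models only the behaviour reported in this thread, not LabVIEW's implementation: every reference is registered with the hierarchy that created it and is released when that hierarchy goes idle, wherever the reference happens to be stored.

```python
class OwnedRef:
    """Toy reference that becomes invalid once its owner releases it."""

    def __init__(self, value):
        self._value = value
        self.valid = True

    def read(self):
        if not self.valid:
            raise RuntimeError("reference is no longer valid")
        return self._value

    def release(self):
        self.valid = False
        self._value = None


class Hierarchy:
    """Toy top-level VI hierarchy that owns every reference it creates."""

    def __init__(self, name):
        self.name = name
        self._refs = []

    def new_ref(self, value):
        ref = OwnedRef(value)
        self._refs.append(ref)
        return ref

    def go_idle(self):
        # going idle releases all owned references
        for ref in self._refs:
            ref.release()
        self._refs.clear()


shared_store = []  # plays the role of the FGV holding the references
main = Hierarchy("main")
dynamic = Hierarchy("dynamically launched VI")

shared_store.append(main.new_ref([1.0, 2.0]))
shared_store.append(dynamic.new_ref([3.0, 4.0]))

dynamic.go_idle()  # the dynamic VI finishes; its reference dies with it
still_valid = [ref.valid for ref in shared_store]
```

In this model the reference created by the dynamic hierarchy is dead even though it still sits in the shared store, which matches the failing fourth test above.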

  • 5 weeks later...
Posted

Hi,

 

Just a simple suggestion: Try G# Framework. It gives you tools that help your life in object oriented programming, and also lets you create data-based or reference-based classes. It is a well thought out toolkit already offering what you need.

You may just try it out. Personally, I have used its reference-based classes and they have always worked well.

Install it, create a reference based class(G#Template), create some property access VIs, try the different templates for creating methods... and dive into the VIs. You will see and understand how it works. (It uses DVRs to create reference based classes.)
