Steen Schmidt, posted November 19, 2011

Hi.

In an application of mine I ran into an issue where a queue ran full, so I want to start a little discussion about which data structures scale best when the goal is to use RAM to store temporary data (a "buffer"). This is for Desktop and Real-Time (PXI, cRIO etc.). My particular application is a producer-consumer type design, where the consumer can be delayed for a long time but will always catch up given enough time. The resource needed to support this is of course enough RAM, which in this case is a valid requirement for the system. The issue will present itself in many forms of course, so this might be of interest to a lot of you guys here.

My application unexpectedly gave me a memory allocation error (code 2 on error out from the Enqueue prim, followed by a pop-up dialog stating the same) after filling about 150 MB of data into the queue (that took a couple of seconds). This system was a 32-bit Windows XP running LabVIEW 2010 SP1. Other systems could be 64-bit, LV 2009-2011, or it could be Real-Time, both VxWorks and ETS-Pharlap.

Is 150 MB really the limit of a single queue, or is it due to some other system limit, free contiguous memory for instance? The same applies to arrays, where contiguous memory plays an important role in running out of memory before your 2 or 3 GB is used up.

And, at least as important: how do I know beforehand when my queue is about to throw this error? An error from error out on the Enqueue prim I can handle gracefully, but the OS pop-up throws a wrench in the gears: it's modal, it can stop most other threads, and it has to be acknowledged by the user. On Real-Time it's even worse.

Cheers, Steen

Edited November 19, 2011 by Steen Schmidt
asbo, posted November 19, 2011

I'd be surprised if queues needed contiguous memory (I'm picturing a linked list implementation), but there's certainly no per-queue upper memory limit that I know of. What's more important, of course, is how much memory the rest of your app was using. How much memory was free on the PC at the time you got the error? The scenario you're describing was on the Win32 system, right?
Steen Schmidt (author), posted November 20, 2011

Arrays need contiguous memory even when they don't have to... Anyway, this snippet of code runs out of memory before RAM is full:

(code snippet image not preserved)

My first encounter was on a WinXP 32-bit machine with ~1 GB RAM free; LabVIEW reported memory full after ~150 MB had been allocated for the queue. Then I ran the above snippet on a Win7 64-bit machine running LV 2010 SP1 32-bit with about 13 GB RAM free (I know LV 32-bit should be able to use up to about 4 GB RAM). LV now reported mem full after it had allocated about 3.8 GB RAM (I don't know how much went to the queue). The memory allocation remained at that level, so following runs of the same code snippet reported memory full much sooner, almost immediately in fact. LV only deallocated the 3.8 GB RAM when it was closed and opened again, in which case I was able to fill the queue for a long time again before mem full. Adding a Request Deallocation call in the VI didn't help me get back the memory allocated when the queue ran full.

- So at least on the 64-bit machine LV could use about all the memory, as expected, but it didn't deallocate that memory again when the VI stopped.
- Any idea how to avoid the pop-up dialog stating the mem full? I'd really like a graceful handling of this failure scenario.

Cheers, Steen

Edited November 20, 2011 by Steen Schmidt
asbo, posted November 20, 2011

Oh, you didn't specify that your queue data type was an array; that would implicitly require chunks of contiguous memory, yes.

There are Windows API calls you can use to evaluate the memory of the system, but you'd have to do some trial and error to find which counters are going to help you predict this scenario. I don't know of any way to predict out-of-memory conditions with reliable accuracy.

There are also calls to resize the working set of LabVIEW, which I've found work more reliably to reduce the memory footprint than the request-deallocation method (but there may be caveats to doing this, since LV has its own memory management). There's a thread about it on the NI forums.
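As an illustration of the kind of Windows API call mentioned above, here is a minimal Python sketch that reads the counters returned by `GlobalMemoryStatusEx` through `ctypes`. The helper name `query_memory_status` is made up for this example, and the script reports on the process that runs it (the Python interpreter), not on LabVIEW; from LabVIEW the same call would have to be made from inside the LabVIEW process. Note also that `ullAvailVirtual` is the total free address space, not the largest contiguous free block, so it cannot by itself predict a fragmentation-related failure.

```python
import ctypes
import sys


class MEMORYSTATUSEX(ctypes.Structure):
    """Mirror of the Win32 MEMORYSTATUSEX structure (64 bytes)."""
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),          # percent of physical RAM in use
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),      # commit limit
        ("ullAvailPageFile", ctypes.c_uint64),
        ("ullTotalVirtual", ctypes.c_uint64),       # user address space of this process
        ("ullAvailVirtual", ctypes.c_uint64),       # unreserved part of that space
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]


def query_memory_status():
    """Return a filled MEMORYSTATUSEX, or None when not running on Windows.

    Hypothetical helper for illustration only.
    """
    if sys.platform != "win32":
        return None
    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        raise ctypes.WinError()
    return status


if __name__ == "__main__":
    status = query_memory_status()
    if status is None:
        print("GlobalMemoryStatusEx is only available on Windows")
    else:
        mib = 1024 * 1024
        print("address space total:", status.ullTotalVirtual // mib, "MiB")
        print("address space free: ", status.ullAvailVirtual // mib, "MiB")
        print("physical RAM free:  ", status.ullAvailPhys // mib, "MiB")
```

Which of these counters, if any, gives a useful early warning is something that would have to be found by experiment, as the post says.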
Steen Schmidt (author), posted November 20, 2011

This happens with all data types, not just arrays. I just used arrays in my example code snippet. My original problem, where I could only allocate 150 MB before the mem error, happened with a 48-byte constant cluster. I could reproduce it with a queue data type of Boolean, although it took much longer to enqueue 150 MB of Booleans :-)

/Steen
GregR, posted November 21, 2011

Pardon the book, but let me try to clarify some concepts here.

The question of how much memory was free on the machine running the test is irrelevant. All desktop operating systems use virtual memory, so each process can allocate up to its address space limit regardless of the amount of physical RAM in the machine. The amount of physical RAM only affects the speed at which the processes can allocate that memory. If RAM is available, then allocation happens fast. If RAM is not available, then some part of the RAM content must be written to disk so that the RAM can be used for the new allocation. Since the disk is much slower than RAM, that makes the allocation take longer. The key is that this only affects speed, not how much allocation is required to hit the out of memory error.

Just because the task manager still says LabVIEW is using a bunch of memory doesn't mean that LabVIEW didn't free your data when your VI stopped running. LabVIEW uses a suballocator for a lot of its memory. This means we allocate large blocks from the operating system, then hand those out in our code as smaller blocks. The tracking of those smaller blocks is not visible to the operating system. Even if we know that all those small blocks are free and available for reuse, the operating system still reports a number based on the large allocations. This is why, even though the task manager memory usage is high after the first run of the VI, the second run can still run about the same number of iterations without the task manager memory usage changing much.

Since the amount of memory LabVIEW can allocate is based on its address space (not physical memory), why can't it always allocate up to the 4GB address space of a 32-bit pointer? This is because Windows puts further limitations on the address space. Normally Windows keeps the top half of the address space for itself. This is partially to increase compatibility, because a lot of applications treat pointers as signed integers and the integer being negative causes problems. In addition to that, the EXE and any DLLs loaded use space in the address space. For LabVIEW this typically means that about 1.7GB is all the address space we can hope to use. If you have a special option turned on in Windows and the application has a flag set to say it can handle it, Windows allows the process access to 3GB of address space instead of only 2GB, so you can go a little higher. Running one of these applications on 64-bit Windows allows closer to the entire 4GB address space because Windows puts itself above that address. And then of course running 64-bit LabVIEW on a 64-bit OS gives way more address space. This is the scenario where physical RAM becomes a factor again, because the address space is so much larger than physical RAM that performance becomes the limiting factor rather than actually running out of memory.

The last concept I'll mention is fragmentation. This relates to the issue of contiguous memory. You may have a lot of free address space, but if it is in a bunch of small pieces, then you are not going to be able to make any large allocations. The sample you showed is pretty much a worst case for fragmentation. As the queue gets more and more elements, we keep allocating larger and larger buffers. But between each of these allocations you are allocating a bunch of small arrays. This means that the address space used for the smaller queue buffers is mixed with the array allocations, and there aren't contiguous regions large enough to allocate the larger buffers. Also keep in mind that each time this happens we have to allocate the larger buffer while still holding the last buffer, so the data can be copied to the new allocation. This means that we run out of gaps in the address space large enough to hold the queue buffer well before we have actually allocated all the address space for LabVIEW.

For your application what this really means is that if you really expect to be able to let the queue get this big and recover, you need to change something.

- If you think you should be able to have a 200 million element backlog and still recover, then you could allocate the queue to 200 million elements from the start. This avoids the dynamically growing allocations, greatly reducing fragmentation, and will almost certainly mean you can handle a bigger backlog. The downside is that this sets a hard limit on your backlog and could have adverse effects on the amount of address space available to other parts of your program.
- You could switch to 64-bit LabVIEW on 64-bit Windows. This will pretty much eliminate the address space limits. However, this means that when you get really backed up you may start hitting virtual memory slowdowns, so it is even harder to catch up.
- You can focus on reducing the reasons that cause you to create these large backlogs in the first place. Is it being caused by some synchronous operation that could be made asynchronous?
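The point about having to hold the old buffer while allocating the larger one can be illustrated with a toy model. The following Python sketch is not LabVIEW's memory manager; it assumes a first-fit allocator over a small address space, a queue buffer that doubles when full, and one small allocation per enqueued element, with all sizes in arbitrary units. The function name `simulate_growing_queue` is made up for this example.

```python
def simulate_growing_queue(total_units):
    """Toy first-fit model of a doubling queue buffer in a fixed address space.

    Returns (elements_enqueued, buffer_capacity, units_in_use) at the
    moment an allocation fails.
    """
    space = [False] * total_units  # True means the unit is allocated

    def alloc(size):
        # First fit: find the lowest run of `size` contiguous free units.
        run = 0
        for i, used in enumerate(space):
            run = 0 if used else run + 1
            if run == size:
                start = i - size + 1
                for j in range(start, i + 1):
                    space[j] = True
                return start
        return None  # no contiguous gap is large enough

    def free(start, size):
        for j in range(start, start + size):
            space[j] = False

    capacity = 1
    buffer_start = alloc(capacity)
    count = 0
    while True:
        if count == capacity:
            # The new buffer must be allocated while the old one is still
            # held, because the contents have to be copied across.
            new_start = alloc(2 * capacity)
            if new_start is None:
                break
            free(buffer_start, capacity)
            buffer_start, capacity = new_start, 2 * capacity
        # Each enqueued element also makes one small allocation of its own,
        # which lands in whatever gap the freed buffer left behind.
        if alloc(1) is None:
            break
        count += 1
    return count, capacity, sum(space)


if __name__ == "__main__":
    count, capacity, in_use = simulate_growing_queue(3000)
    print(f"failed after {count} elements with {in_use} of 3000 units in use")
```

In this model the small allocations fill the gaps left by the freed buffers, so the next, larger buffer can only go at the end of the space. With 3000 units the run stops with only 2048 units in use, because no contiguous 2048-unit gap remains for the next buffer.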
GregSands, posted November 21, 2011

GregR -- I really appreciate your posts on what's beneath the surface of LabVIEW, and especially your comments above on 64-bit LabVIEW.

Just to further the discussion on array sizes, I work a lot with large 3D arrays, and I posted recently on ni.com here about a crash I have with this code (code image not preserved), which crashes immediately at Array Subset, without any error reporting. The allocation seems to work OK: in the code I extracted this from, I fill in the array from disk, and can take other subsets until I get to these indices here. This is with LV 2011 64-bit on Windows 7 64-bit, 4GB memory, another 4GB virtual memory, and the array here is just under 2.5GB in size. Using this tool, there are apparently 2 free blocks of at least 3GB, so allocation appears not to be the problem.

Using the 64-bit Vision toolkit to look at memory allocation, I can allocate two 2GB images, and another of almost that size (screenshot not preserved). I presume the 2GB limit is a limitation of the Vision toolkit rather than the machine. However, even though I can allocate those, if I try to use them, I may sometimes crash LabVIEW (or just now, even crash my browser while typing this post!).

I guess I was hoping that moving to a 64-bit world would be a fairly painless panacea for memory issues, and eliminate (or at least significantly reduce) the need to partition my data in order to work with it. While I can't move completely anyway (until some other toolkits become fully 64-bit compatible), I'd wanted to try it out for a few of the larger problems. Perhaps more physical memory would help, which would suggest it might be a Windows issue with managing virtual memory. However, looking at the first problem on another machine with 8GB, the 2.5GB array won't even allocate even though a much larger 1000x1000x5000 array will.

Trying to think what would be helpful in terms of allocating and using memory:

- a way to check what memory can be allocated for a given array (the tool mentioned above gives some numbers, but they don't seem to relate to what can actually be used)
- if memory is allocated, then it should be able to be used without other problems later on
- I'm sure this is impossible, but I'd love the ability for Vision and LabVIEW to "share" memory (see here)
- behind the scenes management of fragmented arrays -- FALib gets closer, but is only easy for 1D arrays

One last thought, on a more philosophical level: where does 64-bit LabVIEW fit in NI's thinking? At the moment, it's very much the poor cousin, barely supported or promoted, with only the Vision toolkit available at release (ASP takes another few months, still not there for LV 2011). Given LabVIEW's predominant use in the scientific and engineering community, and the rapidly increasing availability of 64-bit OSs, how long until 64-bit becomes the main LabVIEW release?
Steen Schmidt (author), posted November 22, 2011

> The last concept I'll mention is fragmentation. This relates to the issue of contiguous memory. You may have a lot of free address space but if it is in a bunch of small pieces, then you are not going to be able to make any large allocations. The sample you showed is pretty much a worst case for fragmentation. As the queue gets more and more elements, we keep allocating larger and larger buffers. But between each of these allocations you are allocating a bunch of small arrays. This means that the address space used for the smaller queue buffers is mixed with the array allocations and there aren't contiguous regions large enough to allocate the larger buffers. Also keep in mind that each time this happens we have to allocate the larger buffer while still holding the last buffer so the data can be copied to the new allocation. This means that we run out of gaps in the address space large enough to hold the queue buffer well before we have actually allocated all the address space for LabVIEW.

Thanks, Greg, for your thorough explanation. I suspected contiguous memory was the issue here, but while I know that LV arrays and clusters need contiguous memory, I didn't believe a queue needed contiguous memory. That's a serious drawback I think, putting an even bigger hit on the dynamic allocation nature of queue buffers. As touched on earlier in this thread, I thought a queue was basically an array of pointers to the elements, with only this (maybe 1-10 MB) array having to fit in contiguous memory, not the entire, possibly gigabyte-sized, buffer of elements. At least for complex data types the linked list approach would lessen the demands for contiguous memory. If the queue data type is simpler, that overhead would obviously be a silly penalty to pay, in which case a contiguous element buffer would be smarter. But that's not how it is, I gather. A queue is always a contiguous chunk of memory.

It would be nice if a queue prim existed that could be used for resizing a finite queue buffer then...

> For your application what this really means is that if you really expect to be able to let the queue get this big and recover, you need to change something.
>
> - If you think you should be able to have a 200 million element backlog and still recover, then you could allocate the queue to 200 million elements from the start. This avoids the dynamically growing allocations greatly reducing fragmentation and will almost certainly mean you can handle a bigger backlog. The downside is this sets a hard limit on your backlog and could have adverse affects on the amount of address space available to other parts or your program.
> - You could switch to 64-bit LabVIEW on 64-bit Windows. This will pretty much eliminate the address space limits. However, this means that when you get really backed up you may start hitting virtual memory slowdowns so it is even harder to catch up.
> - You can focus on reducing the reasons that cause you to create these large backlogs in the first place. Is it being caused by some synchronous operation that could be made asynchronous?

Thanks. I know of these workarounds, and use them when necessary. For instance, we usually prime queues on LV Real-Time to avoid the dynamic mem alloc at runtime. The case from my original post is solved, so no problem there. I was just surprised that I saw the queue cause a mem full so soon, but the need for contiguous memory for the entire queue is a surprising albeit fitting explanation.

Do you have any idea how I could catch the mem full dialog from the OS? The app should probably end when such a dialog is presented, as a failed mem alloc could have caused all sorts of problems. I'd rather end it gracefully if possible though, instead of a "foreign" dialog popping up.

Cheers, Steen
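The idea of sizing the queue up front, so that a full queue becomes an ordinary condition the producer can handle instead of a failed allocation, can be sketched as a fixed-capacity ring buffer. This is a Python illustration only, not how LabVIEW queues are implemented; the class `BoundedQueue` and its method names are made up for this example.

```python
class BoundedQueue:
    """Fixed-capacity FIFO whose slot storage is allocated once, up front."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity  # the only allocation for slot storage
        self._head = 0                   # index of the oldest element
        self._count = 0                  # number of stored elements

    def __len__(self):
        return self._count

    def try_enqueue(self, item):
        """Store item and return True, or return False if the queue is full."""
        if self._count == len(self._slots):
            return False  # backlog limit reached; the caller decides what to do
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = item
        self._count += 1
        return True

    def try_dequeue(self):
        """Return (True, item), or (False, None) if the queue is empty."""
        if self._count == 0:
            return False, None
        item = self._slots[self._head]
        self._slots[self._head] = None  # drop the reference so it can be freed
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return True, item


if __name__ == "__main__":
    backlog = BoundedQueue(capacity=2)
    for sample in ("a", "b", "c"):
        if not backlog.try_enqueue(sample):
            print("queue full, handling", sample, "without an allocation failure")
```

The trade-off is the one already named in the thread: the capacity is a hard limit on the backlog, and the slot storage is reserved whether or not it is ever used.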
GregR, posted November 22, 2011

> I suspected contiguous memory was the issue here, but while I know that LV arrays and clusters need contiguous memory I didn't believe a queue needed contiguous memory? That's a serious drawback I think, putting an even bigger hit on the dynamic allocation nature of queue buffers. As touched on earlier in this thread I thought a queue was basically an array of pointers to the elements, with only this (maybe 1-10 Mb) array having to fit in contiguous memory, not the entire possibly gigabyte sized buffer of elements. At least for complex data types the linked list approach would lessen the demands for contiguous memory. If the queue data type is simpler that overhead would obviously be a silly penalty to pay, in which case a contiguous element buffer would be smarter. But that's not how it is I gather. A queue is always a contiguous chunk of memory.

When I say the queue buffer contains all the elements, that just means the top level of the data. For arrays that is just the handle. In your example I see it get close to 1.5 million elements. This means the queue buffer is only around 6MB.

You actually seem to be off on the total memory calculation though. Each of your 128-element uInt64 arrays is about 1K. That means that 1.5 million of them is 1.5GB. That puts you very close to the 1.7GB of usable address space and much higher than your estimated 150MB. I hadn't actually built the VI when I replied the first time, so my focus on fragmentation was based on the low 150MB number. This appears to be more about actual usage than fragmentation.

If you want to see what happens when the data really is flat inside the queue buffer, try putting an Array To Cluster after your Initialize Array and set the cluster size to 128. This produces the same amount of data as the array, but it will be directly in the queue buffer. You will get a much smaller number of elements before stopping.

> Do you have any idea how I could catch the mem full dialog from the OS? The app should probably end when such a dialog is presented, as a failed mem alloc could have caused all sorts of problems. I'd rather end it gracefully if possible though, instead of a "foreign" dialog popping up.

The out of memory dialog is displayed by LabVIEW, not by the OS. The problem is that this dialog is triggered at a low level inside our memory manager, and at that point we don't know if the caller is going to correctly report the error or not. So we favor giving redundant notifications over possibly giving no notification. This does get in the way of programmatically handling out of memory errors, but that is often quite difficult anyway, because anything you do in code might cause further allocation and we already know memory is limited.

> LV 2011 64-bit on Windows 7 64-bit, 4GB memory, another 4GB virtual memory

I guess I forget that a lot of people limit their virtual memory size. This does affect my earlier comments about the amount of usable address space available to each process. It puts a limit on total allocations across all processes, so the amount available to any one process is hard to predict.

> One last thought, on a more philosophical level - where does 64-bit LabVIEW fit in NI's thinking? At the moment, it's very much the poor cousin, barely supported or promoted, with only the Vision toolkit available at release (ASP takes another few months, still not there for LV 2011). Given LabVIEW's predominant use in the scientific and engineering community, and the rapidly increasing availability of 64-bit OS, how long until 64-bit becomes the main LabVIEW release?

Vision was the first to be supported on 64-bit because it was seen as the most memory constrained. Images are just big, and it is easy to need more memory than 32-bit LV allows. Beyond that it is just a matter of getting it prioritized. Personally, I'd like to see parity or even 64-bit taking the lead. As sales and marketing continue to hear the request and we see users using 64-bit OSs, we should get there.
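The memory figures in this reply can be checked with a little arithmetic. The Python sketch below assumes a 4-byte handle per queue element (a 32-bit pointer, which matches the "around 6MB" figure) and ignores per-array header overhead; the constant names are made up for this example.

```python
ELEMENTS = 1_500_000    # queue elements reached before the error
ARRAY_LENGTH = 128      # uInt64 values per enqueued array
BYTES_PER_U64 = 8
HANDLE_BYTES = 4        # assumption: one 32-bit handle per element

payload_per_element = ARRAY_LENGTH * BYTES_PER_U64   # 1024 bytes, "about 1K"
total_payload = ELEMENTS * payload_per_element       # 1_536_000_000 bytes
queue_buffer = ELEMENTS * HANDLE_BYTES               # 6_000_000 bytes

# Same 128 values flattened into each element (the Array To Cluster case):
# the queue buffer itself then has to hold every byte of payload contiguously.
flat_queue_buffer = ELEMENTS * payload_per_element

print(f"payload per element: {payload_per_element} bytes")
print(f"total payload:       {total_payload / 1e9:.3f} GB")
print(f"queue buffer:        {queue_buffer / 1e6:.1f} MB")
print(f"flat queue buffer:   {flat_queue_buffer / 1e9:.3f} GB in one block")
```

With handles, only about 6 MB has to be contiguous and the 1.5 GB of array data can be scattered across the address space. With flat elements, the same 1.5 GB would have to fit in a single block, which is why far fewer elements fit before the allocation fails.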
Steen Schmidt (author), posted November 22, 2011

> In your example I see it get close to 1.5 million elements.

How do you see it get close to 1.5 million elements? On my example machine it allocated ~150 MB memory before the mem full dialog (~150,000 elements). I've just tried running it again, and now I got to 1,658,935 elements (I think) before the mem full dialog, so I probably had much more fragmented memory earlier.

> The out of memory dialog is displayed by LabVIEW not by the OS. The problem is this dialog is triggered at a low level inside our memory manager and at that point we don't know if the caller is going to correctly report the error or not. So we favor given redundant notifications over possibly giving no notification. This does get in the way of programmatically handling out of memory errors, but this is often quite difficult because anything you do in code might cause further allocation and we already know memory is limited.

Thanks for this info Greg, it makes perfect sense. A failed mem alloc is most probably a fish (or an app) dead in the water.

Cheers, Steen

Edited November 22, 2011 by Steen Schmidt