Jump to content


Photo
- - - - -

Large Data Sets: how to efficiently retrieve an element from "somewhere" in a queue


  • Please log in to reply
41 replies to this topic

#1 neil

neil

    Extremely Active

  • Members
  • PipPipPipPip
  • 435 posts
  • Location:Surrey, UK
  • Version:LabVIEW 2012
  • Since:2004

Posted 21 June 2012 - 10:20 AM

Hi All,

I am writing a class the implements a large-ish data store of 2D image data. I currently have code that uses a 3D array, and it works fine but for obvious reasons the OS struggles to allocate memory when the number of images gets large.

For this reason (and other reasons) I am re-writing this data store as a class with a DVR to a queue. Forgetting about the DVR side of it for now, can anybody recommend an efficient way to retrieve a random (i.e. user specified at runtime) element from the queue without actually affecting it.

I have seen various implementations of this, including popping the elements out until the desired one is reached, wiring it out and then pushing the elements back onto the queue. This does not seem like an efficient way of doing things. The only other way I can think of is to use the queue status to return a copy of the elements and then indexing out the desired element. As my queue my be quite large this also does not seem like a nice way of doing things as a double memory allocation is needed? Perhaps for this reason the queue pop-read-push method is better?

For information, I am working with 2D images of 256x256 pixels, at up to 32 bpp (so thats approx 200 kB per image). There could potentially be thousands of images buffered, and I have to do this for several different image sources.

EDIT: the data on my queue is not actually purely a 2D array, it is a class representing an image so it has the 2D data as well as some header information which is a string.

EDIT 2: I am going to benchmark both versions

Edited by neil, 21 June 2012 - 10:22 AM.

CLA, CPI and CTM (Certified Tea Maker)

#2 neil

neil

    Extremely Active

  • Members
  • PipPipPipPip
  • 435 posts
  • Location:Surrey, UK
  • Version:LabVIEW 2012
  • Since:2004

Posted 21 June 2012 - 10:39 AM

Update: the dequeue-read-queue method seems to be MUCH faster when dealing with "large-ish" data sets.

If my data set contains 1000 images, then it seems to take the queue method approx 10 ms to "read" the 999th element (worst case scenario) versus approx 60 ms for the Status method. These timings were obtained by doing the read 100 times and averaging the result.
CLA, CPI and CTM (Certified Tea Maker)

#3 neil

neil

    Extremely Active

  • Members
  • PipPipPipPip
  • 435 posts
  • Location:Surrey, UK
  • Version:LabVIEW 2012
  • Since:2004

Posted 21 June 2012 - 11:33 AM

Update 2: my queue method does not work properly, it only reads and then requeues the first image.

Posted Image
CLA, CPI and CTM (Certified Tea Maker)

#4 Tim_S

Tim_S

    The 500 club

  • Members
  • PipPipPipPipPip
  • 522 posts
  • Location:Michigan
  • Version:LabVIEW 2011
  • Since:1994

Posted 21 June 2012 - 12:05 PM

A queue would not have been the first thing I would have thought of for implementing random data selection (especially with a dvr). My first thoughts would have been an array or a linked list. Why use a queue?
Tim

"If this was easy our kids would be doing it." - Coworker

#5 neil

neil

    Extremely Active

  • Members
  • PipPipPipPip
  • 435 posts
  • Location:Surrey, UK
  • Version:LabVIEW 2012
  • Since:2004

Posted 21 June 2012 - 12:23 PM

A queue would not have been the first thing I would have thought of for implementing random data selection (especially with a dvr). My first thoughts would have been an array or a linked list. Why use a queue?


I originally used an array, but as the data set gets large the OS struggles to allocate a contiguous memory space for the (3D) array.

As LabVIEW does not have a native linked list implementation I used a queue which internally is similar perhaps, just missing the ability to iterate.
CLA, CPI and CTM (Certified Tea Maker)

#6 Mads

Mads

    Very Active

  • Members
  • PipPipPip
  • 171 posts
  • Location:Bergen, Norway
  • Version:LabVIEW 2012
  • Since:1997

Posted 21 June 2012 - 12:45 PM

- Working with a preallocated buffer (and doing additional allocations in chunks if needed) should speed up the 3D array solution. Keep a separate list for additional info about the elements in the 3D buffer.

- Instead of a 3D array you could also have separate 2D arrays; for example by instatiating a VI for each picture that is named based on the picture. You can then read back the picture just by calling/enquiring the associated VI.

- OR,as I did for fun just now; store the pictures as attributes to a variant...(although not actually putting the pictures in as variants but using it as a lookup for the index is more sensible):

With 1000 pictures at size 256x256 stored this way one thousand random readouts only take 57 milliseconds...The averagefor a single read is in other words just 57 microseconds.

Edited by Mads, 21 June 2012 - 12:51 PM.

MTO

#7 neil

neil

    Extremely Active

  • Members
  • PipPipPipPip
  • 435 posts
  • Location:Surrey, UK
  • Version:LabVIEW 2012
  • Since:2004

Posted 21 June 2012 - 01:02 PM

- Working with a preallocated buffer (and doing additional allocations in chunks if needed) should speed up the 3D array solution. Keep a separate list for additional info about the elements in the 3D buffer.

- Instead of a 3D array you could also have separate 2D arrays; for example by instatiating a VI for each picture that is named based on the picture. You can then read back the picture just by calling/enquiring the associated VI.

- OR,as I did for fun just now; store the pictures as attributes to a variant...(although not actually putting the pictures in as variants but using it as a lookup for the index is more sensible):

With 1000 pictures at size 256x256 stored this way one thousand random readouts only take 57 milliseconds...The averagefor a single read is in other words just 57 microseconds.


Mads, these are some good suggestions, but have a slightly "dirty" feel about them.

I have not used the attribute properties of a variant before, going to take a look at this now.
CLA, CPI and CTM (Certified Tea Maker)

#8 Mads

Mads

    Very Active

  • Members
  • PipPipPip
  • 171 posts
  • Location:Bergen, Norway
  • Version:LabVIEW 2012
  • Since:1997

Posted 21 June 2012 - 01:42 PM

The queue is just as filthy! ;) But seriously, working with a preallocated array is as clean as you can get I think..

The other two do seem a bit dirty yes, but I think they have their merits as long as you do not push them too far. The third one also hints at what might be both clean AND effective :yes:
MTO

#9 mje

mje

    The 500 club

  • Premium Member
  • 823 posts
  • Location:Milford MA USA
  • Version:LabVIEW 2011
  • Since:1997

Posted 21 June 2012 - 02:28 PM

I'll ask you this, are you sure you'll only ever have to deal with 1000 elements? A 200 MB array is not trivial, but can be managed in most cases. Much beyond that though, and you ought to start thinking about that little ticking sound you hear whenever you run your code...

As already implied, queues should not be your first choice if you need random access. I have a similar requirement where I have to store spectrometric data for potentially large set sizes and require random access. Each data element is at best 100 kB, but there can be a lot of them. Arrays do not work so well because they demand continuous memory spaces.

Initially I used variant attributes. This worked very well, they have fast look-up and access times, and I only ever needed a handful of data elements at a time, so the need to copy them out of the variant for each access was not prohibitive. If you also require sequential access it's not hard to keep track of a secondary array that stores a sorted list of names corresponding to each data element since the names should be far smaller than the data elements themselves. I really like this solution and continue to use it in many applications.

However we've had to ditch variants for the application I've mentioned previously. Despite not requiring continuous memory, the total memory space of our application still became too much to handle for customers pushing the limits of our technology-- when reaching into the 1-2 GB regime things start to get twitchy unless you're working with 64-bit memory space.

The reality is we don't need access to all that info at once, even though we need all the data at some point. So now we're working with a database back end where most of the data is just cached to disk. See the various SQLite discussions. It's a very solid library, and there are multiple options available to deal with it in LabVIEW if you don't want to cope with the native binaries.

Other options are to create some sort of managing library that handles collections of smaller sized arrays, such that each array for example doesn't exceed a threshold size. But that seems like a lot of work given that there are other options available.

#10 Tim_S

Tim_S

    The 500 club

  • Members
  • PipPipPipPipPip
  • 522 posts
  • Location:Michigan
  • Version:LabVIEW 2011
  • Since:1994

Posted 21 June 2012 - 02:40 PM


I originally used an array, but as the data set gets large the OS struggles to allocate a contiguous memory space for the (3D) array.

As LabVIEW does not have a native linked list implementation I used a queue which internally is similar perhaps, just missing the ability to iterate.

There is a linked list implementation at: https://decibel.ni.c.../docs/DOC-12668. It's some work to implement and, thinking about it, probably not better than the queue.

You could create a hash table. This would significantly speed up access when dealing with large sets of data. Variant attributes work pretty well for the lookup for the hash.
Tim

"If this was easy our kids would be doing it." - Coworker

#11 Aristos Queue

Aristos Queue

    LV R&D: I write C++/# so you don't have to.

  • Premium Member
  • 2,641 posts
  • Location:Austin, TX
  • Version:LabVIEW 2011
  • Since:2000

Posted 21 June 2012 - 02:55 PM

I originally used an array, but as the data set gets large the OS struggles to allocate a contiguous memory space for the (3D) array.

Um... but the queue has to maintain that same contiguous memory space. If the queue solution is working then the array solution will work. I want to see your array solution so we can kibitz on that!

#12 neil

neil

    Extremely Active

  • Members
  • PipPipPipPip
  • 435 posts
  • Location:Surrey, UK
  • Version:LabVIEW 2012
  • Since:2004

Posted 21 June 2012 - 03:15 PM

Um... but the queue has to maintain that same contiguous memory space. If the queue solution is working then the array solution will work. I want to see your array solution so we can kibitz on that!


Queues are contiguous? Wow that is not something I expected...

Some further info about my app. The things generating the images are throwing back data quite rapidly. I am acquiring data at approx 100 Hz from an image sensor chip, so that is approx 26 MB/s sustained transfer. I need to buffer up this data for the engineers to look at while the tests are running, I need to write it to disk and I need to spool it to a remote client over TCP/IP all basically in real-time. Some local buffering does occur on the FlexRIO board as we can go up to 1000 images/second for short bursts until the FlexRIO RAM fills up.

Now the next spec calls for four of these chips in parallel, so thats sustained data rate of approx 100 MB/s which is quite a lot to manage.

I potentially need to buffer 1000 images from each of these detectors, so I need approx a gigabyte of memory space. The target system is a PXI chassis with (I think) a Celeron with 2 GB or RAM, so there is not much free!
CLA, CPI and CTM (Certified Tea Maker)

#13 Aristos Queue

Aristos Queue

    LV R&D: I write C++/# so you don't have to.

  • Premium Member
  • 2,641 posts
  • Location:Austin, TX
  • Version:LabVIEW 2011
  • Since:2000

Posted 21 June 2012 - 03:25 PM

Going back to your original post, I think you want this:

Create a DVR that contains an array of cluster of 2D arrays.
When you want the Nth array, use one Inplace Element Structure to unlock the DVR, and then use a second inner Inplace Element Structure to read out a value of the array, then use a third IPE to unbundle the 2D array, and then
--- ... if you just need to modify the value, modify it right there and then stick it back into the right side of the IPE.
--- ... if you need to copy the value out of the array, then just wire the array out.
--- ... if you need to take the array out and don't mind leaving the previous entry as an empty array, you can avoid the hit of copying the array using the Swap Values primitive. Wire the 2D Array to the top input of Swap Values. Wire an empty 2D array constant to the bottom input. Wire the top output back into the IPE. Wire the bottom output out to whereever you need the 2D array. You can use the same trick for copying into the data structure.

Queues are contiguous? Wow that is not something I expected...

How do you think they achieve speed? Can't be doing heap allocations all the time. :-) It's a circular buffer under the hood that reallocates larger blocks as needed.

#14 neil

neil

    Extremely Active

  • Members
  • PipPipPipPip
  • 435 posts
  • Location:Surrey, UK
  • Version:LabVIEW 2012
  • Since:2004

Posted 21 June 2012 - 03:39 PM

AQ, all I am after is the most efficient implementation of a large buffer with random read access, write access is always sequential. Does your suggestion of using the array of cluster of 2D array get around the contiguous problem?
CLA, CPI and CTM (Certified Tea Maker)

#15 ShaunR

ShaunR

    LabVIEW Archetype

  • Members
  • PipPipPipPipPipPip
  • 2,274 posts
  • Version:LabVIEW 2009
  • Since:1994

Posted 21 June 2012 - 03:50 PM

*
POPULAR

One way to get around LabVIEWs contiguous memory allocation is to use an array of DVRs such that LabVIEW rather than allocating, say, 200MB it instead allocates 1000x200k (i.e single 200K allocations)
If you add each image as a DVR to a single 2D element to an array, then Labview is able to find contiguous memory for the small chunk, rather than for the much larger "total array". You then use the DVR array to lookup and extract the data.

However. AS Mje intimates. A DB is really the only scalable solution.

Edited by ShaunR, 21 June 2012 - 03:54 PM.

A positive attitude may not solve all your problems, but it will annoy enough people to make it worth the effort. (Herm Albright 1876-1944).

Founder and general mischief maker on www.labview-tools.com.
SQlite aficionado and websocket zealot.
If it 'aint in LabVIEW, then you 'aint got a clue!

#16 neil

neil

    Extremely Active

  • Members
  • PipPipPipPip
  • 435 posts
  • Location:Surrey, UK
  • Version:LabVIEW 2012
  • Since:2004

Posted 21 June 2012 - 04:02 PM

One way to get around LabVIEWs contiguous memory allocation is to use an array of DVRs such that LabVIEW rather than allocating, say, 200MB it instead allocates 1000x200k (i.e single 200K allocations)
If you add each image as a DVR to a single 2D element to an array, then Labview is able to find contiguous memory for the small chunk, rather than for the much larger "total array". You then use the DVR array to lookup and extract the data.

However. AS Mje intimates. A DB is really the only scalable solution.



The DVR is there as this is a ByRef implementation of my class. If I am going to have multiple DVRs I may as well implement a linked list :D (which I would love to do from an academic point of view, but time and cost does not permit).
CLA, CPI and CTM (Certified Tea Maker)

#17 ShaunR

ShaunR

    LabVIEW Archetype

  • Members
  • PipPipPipPipPipPip
  • 2,274 posts
  • Version:LabVIEW 2009
  • Since:1994

Posted 21 June 2012 - 04:13 PM



The DVR is there as this is a ByRef implementation of my class. If I am going to have multiple DVRs I may as well implement a linked list :D (which I would love to do from an academic point of view, but time and cost does not permit).

No. Each element is a DVR. There's no point in implementing a linked list. A simple array lookup will suffice.
A positive attitude may not solve all your problems, but it will annoy enough people to make it worth the effort. (Herm Albright 1876-1944).

Founder and general mischief maker on www.labview-tools.com.
SQlite aficionado and websocket zealot.
If it 'aint in LabVIEW, then you 'aint got a clue!

#18 Tim_S

Tim_S

    The 500 club

  • Members
  • PipPipPipPipPip
  • 522 posts
  • Location:Michigan
  • Version:LabVIEW 2011
  • Since:1994

Posted 21 June 2012 - 04:27 PM

No. Each element is a DVR. There's no point in implementing a linked list. A simple array lookup will suffice.

Interesting idea; I'll have to remember that one. :thumbup1:
Tim

"If this was easy our kids would be doing it." - Coworker

#19 neil

neil

    Extremely Active

  • Members
  • PipPipPipPip
  • 435 posts
  • Location:Surrey, UK
  • Version:LabVIEW 2012
  • Since:2004

Posted 21 June 2012 - 04:36 PM

No. Each element is a DVR. There's no point in implementing a linked list. A simple array lookup will suffice.


Interesting...
But won't the DVR become invalid as soon as the VI leaves memory? So if I have "Add Image.vi" that internally consists of a class constant wired into a DVR (which is what I think you are referring to?) will the DVR still be valid outside this sub-VI?

I am not too familiar with lifetime type issues of DVRs.
CLA, CPI and CTM (Certified Tea Maker)

#20 ShaunR

ShaunR

    LabVIEW Archetype

  • Members
  • PipPipPipPipPipPip
  • 2,274 posts
  • Version:LabVIEW 2009
  • Since:1994

Posted 21 June 2012 - 04:55 PM


Interesting...
But won't the DVR become invalid as soon as the VI leaves memory? So if I have "Add Image.vi" that internally consists of a class constant wired into a DVR (which is what I think you are referring to?) will the DVR still be valid outside this sub-VI?

I am not too familiar with lifetime type issues of DVRs.

Nope. Your class constant will be an array (although this too could be a DVR, but lets keep it simple). When you "Add" you simply wire the data to the Create DVR and the resulting DVR ref gets added to the array class constant.

Here's a quick and dirty (non-class) example, In your case your class constant would be the equivalent of the shift register.

[attachment=6817:DVR Array.vi]
A positive attitude may not solve all your problems, but it will annoy enough people to make it worth the effort. (Herm Albright 1876-1944).

Founder and general mischief maker on www.labview-tools.com.
SQlite aficionado and websocket zealot.
If it 'aint in LabVIEW, then you 'aint got a clue!