Variant Attribute Enumeration


mje

Posted

I use variant attributes quite a bit when I have a list of data which is both large (be it element size or number of elements) and dynamic (regularly adding/removing elements). This is ideal if access is primarily random, where I access some arbitrary elements at any given time.

From time to time though, I must use sequential access where I need to enumerate every element. There is no native way to index variant attributes, so as far as I can tell we're left with one of two ways to do the enumeration.

One:

post-11742-0-21093900-1301339333_thumb.p

This is a bad way of doing it, as it involves returning an array of all objects as variants, converting it to the proper type, then operating on each element. Memory consumption becomes dominated by the size of data contained in the attributes. When running the VI with 100 MB of data (1000 attributes each 100 kB in size), the VI uses just over 100 MB of memory.

Two:

post-11742-0-63740100-1301339524_thumb.p

This is a much better way of doing it, as it accesses each element individually. Memory consumption drops because we're no longer operating on the large values array; the footprint is dominated by the size of the largest element. Running the same case as above shows memory consumption of just over 100 kB. Nice. It turns out it also runs faster than the previous code (about 2/3 the time).
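Since the two snippets are images, here is a rough text analogue of both methods in Python. It is only a sketch: a plain dict stands in for the variant's attributes, the function names are made up for illustration, and the explicit copies are an assumption meant to mimic LabVIEW's by-value behaviour (Python would otherwise just share references).

```python
import copy

def enumerate_all_at_once(attributes, visit):
    """Method One analogue: materialize every value first, then iterate."""
    # The whole list of copied values exists at once, so the footprint
    # scales with the total size of all values.
    values = [copy.deepcopy(v) for v in attributes.values()]
    for value in values:
        visit(value)

def enumerate_by_name(attributes, visit):
    """Method Two analogue: fetch the names, then one value at a time."""
    # Only the list of names plus a single copied value is alive per step.
    names = list(attributes.keys())
    for name in names:
        visit(copy.deepcopy(attributes[name]))
```

Both functions visit the same values in the same order; they differ only in how much is held in memory while doing so.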

But there's still a problem here. We're allocating an entire array of names. This can be verified if we redo the tests using large name strings as well. Using the same setup as above, 1000 attributes each with 100 kB values, but this time with each name also 100 kB in size, the memory footprints change to approximately 200 MB and 100 MB respectively. That is, the second method still requires a lot of memory by virtue of the array of names it must use to operate.
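The figures above can be roughly reproduced with a back-of-the-envelope model. This is a sketch, not a measurement: it assumes method One ends up holding both the names and the values arrays (an assumption chosen because it matches the reported 200 MB), that method Two holds the names array plus one value at a time, and the 10-byte short name is a made-up placeholder.

```python
KB = 1024  # bytes; 100 kB names are the "819200-bit" ones

def peak_bytes_method_one(count, name_size, value_size):
    # Assumed: the names array and the values array are both allocated.
    return count * (name_size + value_size)

def peak_bytes_method_two(count, name_size, value_size):
    # Assumed: the names array plus a single value at any moment.
    return count * name_size + value_size

if __name__ == "__main__":
    for label, name_size in (("short names", 10), ("long names", 100 * KB)):
        one = peak_bytes_method_one(1000, name_size, 100 * KB)
        two = peak_bytes_method_two(1000, name_size, 100 * KB)
        print(f"{label}: method One ~{one // KB} kB, method Two ~{two // KB} kB")
```

With short names the model gives roughly 100,000 kB versus a little over 100 kB; with 100 kB names it gives roughly 200,000 kB versus 100,100 kB, in line with the numbers measured above.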

While the above situation is largely academic (screw 128-bit GUIDs, I'm going with 819200-bit ones!), it can come into play if your lists are large by virtue of the number of elements (as opposed to the size of elements). In this case method "One" might prove the better way of doing it.

I guess what I'm really going for is it would be good if there were a way to index variant attributes. YES, I know if you regularly need to enumerate your attributes you're doing it wrong, but there are some uses which you can't get away from. Serialization comes to mind. Any thoughts on this? I'll probably take this over to the idea exchange in a bit, just wanted to seed some discussion first.

Attached is some code I played with to get the numbers above. Includes the two snippet VIs above, and a third one which generates variants for them to operate on. WARNING: running the test with the "Long Names" method continuously while collecting metrics can make the IDE unstable due to memory spikes if the default values are used. Close out your "real" work first!

Regards,

-michael

AttributeEnumeration.zip

Posted

I can see your point, but how likely is it that you don't know beforehand whether you have large attributes (and thus need the enumeration) or large attribute names (and thus need indexing)?

If you develop your code you'll probably know which one of the two is most likely, or you could add an attribute called 'Style' that tells the code to either index or enumerate.

However, you can index attributes by placing each attribute inside a variant that is itself stored in an attribute with a short name. The footprint will be reduced, but it will then be dominated by the number of variants.

One other question, have you also played with small attributes with long names (the inverse of the first case)?

Isn't it possible to use the 'Get attributes' without a name, but with a type (thus eliminating the 'variant to data' for the first code)? I think it is possible, or should be in the LabVIEW Idea Exchange.

Ton

Posted

I guess what I'm really going for is it would be good if there were a way to index variant attributes. YES, I know if you regularly need to enumerate your attributes you're doing it wrong, but there are some uses which you can't get away from. Serialization comes to mind. Any thoughts on this?

How about, instead of bending a variant into doing something like that, we get real data structures and iterators?

(Yeah, yeah, it's on my list of things to do for LapDog...)

Posted

But there's still a problem here. We're allocating an entire array of names. This can be verified if we redo the tests using large name strings as well. Using the same as above, 1000 attributes each with 100 kB values, but this time each name is also 100 kB in size, the memory footprints change to approximately 200 and 100 MB respectively. That is the second method still requires a lot of memory by virtue of the array of names it must use to operate.

While the above situation is largely academic (screw 128-bit GUIDs, I'm going with 819200-bit ones!), it can come into play if your lists are large by virtue of the number of elements (opposed to the size of elements). In this case method "One" might prove the better way of doing it.

I guess what I'm really going for is it would be good if there were a way to index variant attributes. YES, I know if you regularly need to enumerate your attributes you're doing it wrong, but there are some uses which you can't get away from. Serialization comes to mind. Any thoughts on this? I'll probably take this over to the idea exchange in a bit, just wanted to seed some discussion first.

You could create and maintain a parallel variant hash that indexes your original hash. The attribute names could literally be "1", "2", etc. (or just typecast your enumerators to strings) and the corresponding values would be the attribute names of your original hash. It's a bit of extra bookkeeping, but this is a fairly specialized case anyways. If you want, you could add this extra hash when certain criteria are met (e.g., # of elements, size of attribute names). Either way, you can continue to use method two without exposing yourself to large memory footprints.
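A minimal sketch of that bookkeeping, written in Python since G can't be pasted as text. Two dicts stand in for the two variants, the class and method names are invented for illustration, and deletion is deliberately left out (it would need the index hash renumbered or compacted).

```python
class IndexedAttributes:
    """Original hash plus a parallel hash mapping "0", "1", ... to names."""

    def __init__(self):
        self._data = {}   # attribute name -> value (the original hash)
        self._index = {}  # "0", "1", ... -> attribute name (the parallel hash)

    def set(self, name, value):
        # Only a brand-new name gets the next index entry.
        if name not in self._data:
            self._index[str(len(self._index))] = name
        self._data[name] = value

    def __len__(self):
        return len(self._index)

    def item_at(self, i):
        # Two lookups, but only one name is held at a time.
        name = self._index[str(i)]
        return name, self._data[name]
```

Enumerating is then a counted loop, `for i in range(len(attrs)): name, value = attrs.item_at(i)`, which never builds the full array of names.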

Posted

Thanks folks. Sorry for the late reply; I've been having a heck of a time connecting to lava from work recently.

I can see your point, but how likely is it that you don't know beforehand...

The programmer would almost certainly know ahead of time whether their data size would be dominated by the names or values. Knowing which method to use really isn't the issue for most I'd expect.

Isn't it possible to use the 'Get attributes' without a name, but with a type (thus eliminating the 'variant to data' for the first code)? I think it is possible, or should be in the LabVIEW Idea Exchange.

Most definitely not. If a type is supplied, a name must be as well.

There are definitely games that we can play to reduce the allocation size when creating the arrays by having nested data structures etc. But it still demands an allocation each time I want to perform an enumeration of my variant attributes. It seems like an unnecessary waste.

I'd expect the variant attributes are implemented in some form of a binary tree for lookup. I don't know a lot about how to enumerate these data structures, but my guess is whatever they've chosen to use under the hood makes it non-trivial and is probably why we can't do it as it stands. Variants did get a big "kick in the pants" recently (to quote an NI employee). I know lookup performance of attributes improved greatly, so I wonder if it might be possible to easily allow enumerations now?

Posted

I'd expect the variant attributes are implemented in some form of a binary tree for lookup.

I believe I read somewhere they use a red-black tree.
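For what it's worth, walking any binary search tree in key order doesn't need an up-front array of keys. The sketch below uses a plain, unbalanced tree in Python purely as an illustration; it is not a claim about how LabVIEW stores variant attributes, and a red-black tree would differ only in how insertion rebalances.

```python
class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None

def insert(root, key, value):
    """Insert into a plain (unbalanced) binary search tree."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        root.left = insert(root.left, key, value)
    elif key > root.key:
        root.right = insert(root.right, key, value)
    else:
        root.value = value  # existing key: overwrite
    return root

def in_order(root):
    """Yield (key, value) pairs in key order, one at a time."""
    stack, node = [], root
    while stack or node is not None:
        while node is not None:   # descend to the leftmost unvisited node
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.key, node.value
        node = node.right
```

The only extra memory is the stack, which grows with the height of the tree rather than with the number of entries.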

Posted

Maybe a simple Variant Attribute Map is the wrong structure for this type of application?

I assume you will be packing this inside some kind of library anyway, so you can use simple accessor VIs with an internal structure that is a bit more complicated.

What about this idea:

All items go into a 1D-array (either of Variant or of a certain data type that fits your application).

There is also a Variant with attributes. The attribute names are used as the identifiers, the attribute values are simple i32 array indices that point to the correct locations in the 1D-array.

This would work fine as long as you don't have to delete any elements from the list.

If deleting is necessary, you would have to think about some creative way to handle this.

I could think about:

  • Only the Variant attribute is removed. The array item is replaced by an empty Variant (or similar). Easy to implement but it's a memory leak that could bring you into trouble if the application runs a very long time.
  • Once an element is deleted, all indices are corrected. This really means a lot of work and CPU load. I wouldn't want to do this.
  • Elements are removed as described in the first idea, but you also keep a list of the "empty" indices. If a new element is added, those indices are reused before new ones are created. I think this is the most elegant way to solve this.

With this kind of structure you could easily access your items using their names or indices, whatever fits better.
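The third deletion idea could look roughly like this. It is a Python sketch rather than G: a list stands in for the 1D array, a dict for the variant attributes holding the indices, and all names (`SlotMap`, `_EMPTY`, and so on) are hypothetical.

```python
_EMPTY = object()  # marker for a deleted array slot

class SlotMap:
    def __init__(self):
        self._items = []  # the 1D array of items
        self._index = {}  # name -> array index (the variant attributes)
        self._free = []   # indices of deleted slots, available for reuse

    def set(self, name, value):
        slot = self._index.get(name)
        if slot is None:
            if self._free:
                slot = self._free.pop()      # reuse an empty slot first
                self._items[slot] = value
            else:
                slot = len(self._items)      # otherwise grow the array
                self._items.append(value)
            self._index[name] = slot
        else:
            self._items[slot] = value

    def get(self, name):
        return self._items[self._index[name]]

    def delete(self, name):
        slot = self._index.pop(name)
        self._items[slot] = _EMPTY           # blank the slot, keep indices stable
        self._free.append(slot)

    def values(self):
        # Sequential access: walk the array and skip empty slots.
        return (item for item in self._items if item is not _EMPTY)
```

Access by name goes through `get`, while enumeration is a straight pass over the array with no names involved.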
