
Does anyone use netCDF file format?



Posted (edited)

I'm developing, for a client, a wrapper for netCDF4, a format that is used for large atmospheric-science data sets and is implemented on top of HDF5.  Seems quite nice so far, and I was wondering if anyone had any experience with it?  Or HDF5 in general (which I have never used)?  It's kind of TDMS-like, but with n-dimensional arrays.

Edited by drjdpowell
Posted

HDF5 is used quite widely in big facilities-based research (synchrotrons, neutron sources and suchlike). It's a format that supports a virtual directory structure containing metadata attributes and multi-dimensional array data sets. Although it's possible to browse and discover the names and locations of all of the data in the file, it's generally easier if you have some idea of where in the virtual file system the data that you are interested in is kept. There are a couple of LabVIEW packages out there that provide an interface to HDF files and read and write native LabVIEW data types - they work well enough in my experience, although I haven't personally used HDF5 files in anything other than proof-of-concept code with LabVIEW.

Posted

I've used HDF5 in two situations (so far) for quickly capturing large 3D imaging datasets (XYZ or XYt) - I've used the open source h5labview package which has worked very well for me.  The other LabVIEW package (not open source) that wraps HDF5 is Live HDF5, but unfortunately the two do not play nicely together, so it's not possible to install both at the same time to evaluate.  The HDF5 libraries are mostly wrapped directly, with a small interface dll just for handling some of the memory allocation issues.

HDF5 is similar to TDMS, but much more flexible, and not just in array dimension - H stands for Hierarchical, which basically means anything can go anywhere - data, attributes (metadata), groups, ... - even seamlessly across multiple files.  What I think is most flexible about it is that you can have different arrangements of your data in memory and on disk, and HDF5 handles the mapping between.  Needless to say that can also get very complicated, so I tend to stay with fairly simple mappings!  Used simply, it's straight-forward, but there's a lot of power underneath.  I've said more than once that I wish NI had built their data format on top of HDF5 rather than developing their own proprietary format - in the way that Matlab now uses HDF5 as the internal format for .mat files.

Posted

So, no one has used netCDF then?  I'm trying to see if there would be any interest in a netCDF LabVIEW API.   The API I've made is incomplete, as I can only justify implementing the minimum needed for the client who's paying for it.   Seems like it's better to just use HDF5 directly, unless one already uses netCDF (which I think is mainly just the Atmospheric Science community).  

Posted
On 19/12/2016 at 8:11 PM, GregSands said:

 The HDF5 libraries are mostly wrapped directly, with a small interface dll just for handling some of the memory allocation issues.

Just a side comment, but I think it is better to avoid a middle-layer dll if at all possible.  I have yet to find a memory allocation issue that could not be solved with LabVIEW.exe:MoveBlock.

Posted

The wrapper also ensures that if you hit abort, any open files are closed automatically through the callback system. How would you accomplish that without some middle layer?

Posted (edited)

No, I've not used netCDF.  I presume that if netCDF uses HDF5, then it is just a particular file layout (combination of dataspaces and attributes) that ensures that all the required information is stored in a well-defined place.  I agree that using/supporting HDF5 would be of more use to more people.

The h5labview code is here (<500 lines) if you wanted to see what few things it is doing.  It's all open-source, so I'm sure Martijn would be appreciative of any extra development.  From the FAQ :

Quote

Ideally, work done in the DLL should be limited to that which cannot be done in LabVIEW (global DLL objects, function callbacks), but in practice any code that becomes messy to implement in LabVIEW but easy in C should be put in the helper DLL. So far this is limited to error handling and raw data/type manipulation.

 

Edited by GregSands
Posted (edited)
On 12/22/2016 at 0:24 PM, drjdpowell said:

Just a side comment, but I think it is better to avoid a middle-layer dll if at all possible.  I have yet to find a memory allocation issue that could not be solved with LabVIEW.exe:MoveBlock.

While the middle-layer is indeed an extra hassle, since you have to compile a shared library for every platform you want to support, it is for many cases still a lot easier than trying to play C compiler yourself on the LabVIEW diagram. Especially since not all LabVIEW platforms are equal in that respect (with 32 bit and 64 bit being one but by far not the only possible obstacle). Yes you can use conditional compile structures in LabVIEW to overcome this problem too, but at this point I really feel like using duct tape to hold the Eiffel tower together. Maintenance of such a VI library is a nightmare in the long run.

Not to forget about performance. If you use a middle-layer shared library you can often pass the LabVIEW datatype buffers directly to the lower-layer library functions, whereas with MoveBlock you often end up copying all the data back and forth multiple times.

And smithd points out another advantage of a middle layer. You can make sure that all the created objects are properly deallocated on a LabVIEW abort. Without that, the whole lot stays lingering in memory until you close LabVIEW completely, possibly also keeping things like file locks, named OS pipes, OS events and semaphores alive, which prevents you from rerunning the software.
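The cleanup-on-abort idea can be sketched outside LabVIEW. The following is a minimal Python illustration, not the h5labview implementation: `Resource`, `HandleRegistry` and `on_abort` are hypothetical names, and `on_abort` stands in for whatever abort callback the host environment invokes on the middle layer.

```python
class Resource:
    """Hypothetical stand-in for a file handle, pipe or lock owned by a library."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


class HandleRegistry:
    """Middle layer that remembers every open resource so it can clean up later."""

    def __init__(self):
        self._open = {}
        self._next_id = 1

    def open(self, name):
        handle = self._next_id
        self._next_id += 1
        self._open[handle] = Resource(name)
        return handle

    def close(self, handle):
        # Normal path: the caller closes what it opened.
        self._open.pop(handle).close()

    def on_abort(self):
        """Assumed to be called by the host on abort; closes whatever is left."""
        leaked = list(self._open.values())
        for res in leaked:
            res.close()
        self._open.clear()
        return [res.name for res in leaked]


registry = HandleRegistry()
a = registry.open("data.h5")
b = registry.open("log.h5")
registry.close(a)                       # closed normally before the abort
closed_on_abort = registry.on_abort()   # only the forgotten handle is cleaned up
```

The point of the design is that the registry, not the caller, owns the list of live handles, so an abort that skips the caller's own close code still releases everything.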

Edited by rolfk
Posted (edited)
4 hours ago, rolfk said:

Maintenance of such a VI library is a nightmare in the long run.

I have found this not to be the case, especially if the alternative is a statically linked "middle layer" where you have to rely on the developer to release a new one whenever the other libraries are updated. I've found conditional statements a superior solution when the main libraries are already supplied by the developers or operating system.

4 hours ago, rolfk said:

You can make sure that all the created objects are properly deallocated on a LabVIEW abort.

We need a definitive guide to using the "Instance Data Pointer" which can alleviate, if not remove, this.

Edited by ShaunR
Posted
18 hours ago, rolfk said:

Abort() obviously will be called by LabVIEW when the user aborts the VI hierarchy.

Nice write up.

I was going to write some examples but for the life of me I couldn't think of one real world problem that it solves :wacko:. I keep looking at those functions and coming back to this every couple of years in case I've missed something, but every time I get stumped by the per-node-instance nature and being unable to pass a parameter into it. Most modern APIs use opaque objects/structures and it is these we need to clear up rather than the function call instance. I guess it is meant for managing thread safety but we are concerned with a purely IDE event so we can unload a resource as the final operation. It is a design-time problem alone.

The classic requirement is to prevent error 5 when aborting a SQLite query and requiring a restart of LabVIEW to close the handle. I can do this by installing a "monitor" into the IDE but it's an awful solution. I can't think of any way to utilise these features for that use case without an intermediary - you can't even [object] reference count :(.

Posted

Interesting discussion.

My bias against a middle C layer is partly related to my lack of C coding skills.  I can use a third-party library with the confidence that I can fix bugs or add features to the LabVIEW part of it, and with some confidence that the underlying dll (such as HDF5) is widely used and thus probably well tested and near complete.  But issues with the middle layer leave me stuck.  

It also matters how complex the library being wrapped is.   My understanding is that netCDF is a simpler API than HDF5 (with the tradeoff that it cannot do everything that HDF5 can do), and I haven’t found the LabVIEW code needed to call it directly is that complicated.

Regarding the Unreserve() and Abort() callbacks, I would much rather be able to register a VI callback when a VI hierarchy goes idle.  It could then do any cleanup actions like closing things.  Perhaps I should suggest this to NI.  I can, and sometimes do, use such cleanup functions using the “watchdog” asynchronous actions in Messenger Library, but I can’t add Messenger Library as a dependency for a reusable wrapper library (also, asynchronous watchdogs are problematic compared to synchronous callbacks due to the possibility of race conditions).  

Posted

...but am definitely interested in other developer feedback.

Hi Martijn,

Perhaps we should start a new topic on your library; I’ve had a look at it and could make some comments/questions.  For example, why are you making your dll calls in the UI thread?  Is HDF5 not thread safe?  

— James

Posted

That means netCDF isn’t thread safe either.  In a previous wrapping of a non-thread-safe library, MDSplus, I used a semaphore to serialize access, rather than the UI thread.  

Posted

He did the same thing with a DVR in the zeromq code, I wonder if there isn't a different reason here (or maybe he came up with the DVR idea later).

Posted (edited)
7 hours ago, drjdpowell said:

That means netCDF isn’t thread safe either.  In a previous wrapping of a non-thread-safe library, MDSplus, I used a semaphore to serialize access, rather than the UI thread.  

That won't work, especially for recursive mutexes.

34 minutes ago, smithd said:

He did the same thing with a DVR in the zeromq code, I wonder if there isn't a different reason here (or maybe he came up with the DVR idea later).

Neither will that.

You cannot mitigate thread safety in LabVIEW. It has to be done in the library itself or an intermediary that can install the lock callbacks and handle them. Serialization != thread safety and you cannot guarantee that a CLFN will always use the same thread on each call unless it is the UI thread (because there is only one).

Edited by ShaunR
Posted
8 hours ago, ShaunR said:

You cannot mitigate thread safety in LabVIEW.

Depends what the issue is.  The MDSplus library I mentioned uses a global variable.  It doesn’t matter if function calls are made in different threads; it just matters that critical code sections that use that variable be protected from parallel access, and thus I used a semaphore for serialization.  So I may have been using “thread safety” imprecisely to include race condition issues.  What’s the proper term for a library that requires serialization but not use from a single OS thread?
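As a minimal sketch of the serialization described above, written in Python rather than LabVIEW: the `legacy_open` and `legacy_read` functions and the module-level `_current_tree` are hypothetical stand-ins for a library that keeps state in a global variable. They are not MDSplus calls.

```python
import threading
import time

# Hypothetical stand-in for a library that keeps state in a global variable.
_current_tree = None


def legacy_open(name):
    global _current_tree
    _current_tree = name


def legacy_read():
    time.sleep(0.001)  # widen the window in which another caller could interfere
    return _current_tree


# One binary semaphore serializes every critical section that touches the global.
_gate = threading.Semaphore(1)


def read_from(name):
    """Open and read as one critical section, from whichever thread calls."""
    with _gate:
        legacy_open(name)
        return legacy_read()


def run_parallel(names):
    results = {}

    def worker(n):
        results[n] = read_from(n)

    threads = [threading.Thread(target=worker, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_parallel([f"tree{i}" for i in range(8)])
```

Each caller gets back the tree it opened, whichever OS thread ran it. This only covers races on shared global state; it would not help a library that ties state to a particular OS thread.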

Posted

especially for recursive mutexes

I used a mutex count for that, releasing the semaphore when the count reached zero.   That allowed me to call locking functions from inside other locking functions.
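A rough sketch of the count-plus-semaphore idea, in Python rather than LabVIEW. The `CountingGate` class and the caller `token` are assumptions made for illustration; the original LabVIEW implementation is not shown in this thread. The token identifies the logical caller rather than an OS thread, and it is assumed that one token is never used by two parallel callers at once.

```python
import threading


class CountingGate:
    """Semaphore plus a nesting count, so locking functions can call each other."""

    def __init__(self):
        self._sem = threading.Semaphore(1)
        self._owner = None  # token of the caller currently holding the semaphore
        self._count = 0     # nesting depth for that caller

    def acquire(self, token):
        if self._owner is token:   # nested call by the current holder
            self._count += 1
            return
        self._sem.acquire()        # first entry: wait for the semaphore
        self._owner = token
        self._count = 1

    def release(self, token):
        if self._owner is not token:
            raise RuntimeError("release by a caller that does not hold the gate")
        self._count -= 1
        if self._count == 0:       # outermost release frees the semaphore
            self._owner = None
            self._sem.release()


gate = CountingGate()
log = []


def inner(token):
    gate.acquire(token)
    try:
        log.append("inner")
    finally:
        gate.release(token)


def outer(token):
    gate.acquire(token)
    try:
        log.append("outer")
        inner(token)  # would deadlock with a plain binary semaphore
    finally:
        gate.release(token)


caller = object()
outer(caller)

# After the outermost release the semaphore should be available again.
free_again = gate._sem.acquire(blocking=False)
if free_again:
    gate._sem.release()
```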

Posted
32 minutes ago, drjdpowell said:

Depends what the issue is.

No it doesn't.

32 minutes ago, drjdpowell said:

The MDSplus library I mentioned uses a global variable.  It doesn’t matter if function calls are made in different threads; it just matters that critical code sections that use that variable be protected from parallel access, and thus I used a semaphore for serialization.

I don't know this library but from your description it sounds like it is "thread safe" or single threaded. The developers will have (or should have) made a statement about it somewhere.

43 minutes ago, drjdpowell said:

What’s the proper term for a library that requires serialization but not use from a single OS thread?

Well. Serialization means writing binary as strings to me (like the flatten functions,XML etc). So I'm already floundering with your question. Synchronisation maybe? But thread safety isn't about one problem. It is an umbrella term for multiple undefined behaviours when using multiple threads, including race conditions, pointer overflows, memory dereferencing, counter overruns,  and lots of nasty stuff that will crash LabVIEW-probably randomly and when already deployed to the customer site.

LabVIEW uses thread pools and there is no guarantee that a particular node will use the same thread from the pool on every call. Unfortunately (or fortunately, depending on your perspective), most of the time LabVIEW tries to use the same thread if it is available, for performance reasons, so for the most part everything seems to work great - right up until it doesn't. So I hope you can see that it really doesn't matter what you do in the LabVIEW code with DVRs, semaphores and other native synchronisation methods. The only guaranteed solutions are to use a thread-safe library, use "Run In UI Thread" or, if the developers have supplied a callback mechanism, use an intermediary library to mediate. 

Posted

Well. Serialization means writing binary as strings to me (like the flatten functions,XML etc). So I'm already floundering with your question. 

Serialization is the act of serializing, which is to make something into a serial form, which is a set of things one following another.  Serial is often contrasted with parallel.  Used in multiple contexts in computer equipment, not just in converting memory data structures into serial bytes.

Posted (edited)
2 hours ago, ShaunR said:

But thread safety isn't about one problem. It is an umbrella term for multiple undefined behaviours when using multiple threads, including race conditions, pointer overflows, memory dereferencing, counter overruns,  and lots of nasty stuff that will crash LabVIEW-probably randomly and when already deployed to the customer site.

All those issues are about parallel actions; they are not specific to OS “threads”.  Only thread-specific memory is something that would require LabVIEW to use the UI thread.  Otherwise one can serialize (prevent parallel) calls by any number of means.
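The thread-specific-memory case can be illustrated with a small Python sketch. The `lib_init` and `lib_use` functions are hypothetical stand-ins for a library that stores a handle in thread-local storage; no real library is being modelled. The calls are fully serialized by a lock, yet state set on one thread is not visible on another.

```python
import threading

_tls = threading.local()  # hypothetical library state kept per OS thread


def lib_init(value):
    _tls.handle = value


def lib_use():
    # Returns None when the calling thread never ran lib_init.
    return getattr(_tls, "handle", None)


lock = threading.Lock()
seen = {}


def call_in_new_thread(name, fn, *args):
    """Run one serialized call on a fresh thread and record its result."""

    def run():
        with lock:
            seen[name] = fn(*args)

    t = threading.Thread(target=run)
    t.start()
    t.join()


call_in_new_thread("init", lib_init, 42)  # sets the handle on thread A
call_in_new_thread("use", lib_use)        # thread B cannot see thread A's handle

lib_init(7)               # same thread for both calls
same_thread = lib_use()   # the handle is visible here
```

Here serialization alone is not enough, because the two calls never overlap and the second one still fails to find the handle. Only keeping every call on one thread fixes that.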

Edited by drjdpowell
Posted
10 hours ago, drjdpowell said:

All those issues are about parallel actions; they are not specific to OS “threads”.  Only thread-specific memory is something that would require LabVIEW to use the UI thread.  Otherwise one can serialize (prevent parallel) calls by any number of means.

Well. The libraries state categorically that you need callbacks when using multithreading and give the code that you are required to implement (I've just outlined why that is the case and why you can't do it in LabVIEW). I can only tell you what I know from successfully implementing quite a few different ones; I'm not here to convince you.
