shoneill

Saving data in TDMS


I'm currently investigating TDMS as a data storage format for a new measurement method.  In our routine, we sweep up to three outputs (with X, Y and Z points respectively) and record up to 24 channels, so we have X × Y × Z × 24 data points.

 

We create the following:

Up to X data points for 24 channels of data interleaved in the first dimension (multichannel 1D)

Up to Y times this first dimension (making the data multichannel 2D)

Up to Z times this second dimension (making the data multichannel 3D)

 

So in a sense, we create 4D data.
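For concreteness, the resulting structure maps onto a 4D array like this (a minimal numpy sketch; the sizes here are made up):

```python
import numpy as np

X, Y, Z, N_CH = 512, 64, 16, 24      # made-up sweep sizes
data = np.zeros((Z, Y, X, N_CH))     # the full "4D" dataset
slice_xy = data[3, :, :, 7]          # one channel's X-Y plane at Z step 3
```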

 

Trying to use our old method of storing the data in memory fails quickly when the number of steps in each dimension increases.  So we want to store the data in TDMS files.  But looking at the files and trying to imagine what read speed will be like, I'm unsure how best to store this data.  Do I need multiple TDMS files?  A single file?  How do I map the channels and dimensions to the internal TDMS structure?

 

As a further step, I'll be investigating a routine for retrieving any subset of this data (1D or 2D slices from any combination of dimensions, but almost always one channel at a time).

 

Can anyone with more experience with TDMS files give some input and help a TDMS noob out?


One other point to consider is whether SQLite wouldn't be a better idea, given the high level of flexibility and efficiency we would be trying to achieve when visualising the data.


Unless you need to run this on a target that does not have SQLite support, I would use that (the performance of Shaun's API in particular is impressive).

TDMS is fine if you can write the data as large contiguous blocks. If you need to save different groups of data frequently, in smaller write operations, I would use separate files for each group if using TDMS; otherwise the file becomes too fragmented and read performance gets really bad. We use proprietary binary formats ourselves due to this, as we a) need to support different RT targets, b) frequently write small fragments of group data into one and the same file, and c) need to search for and extract time periods fast (our format is 1500x (!) faster than the equivalent TDMS solution).


I'm less worried about file fragmentation; I should be able to write the data in more or less sensible chunks.

 

I'm more worried about how to get back the data I want.

 

I want to be able to request data for display by specifying which channel(s) I want and whether I want X vs Y, Y vs Z, Z vs X and so on.  Coupled with the display scale (max-min X), I want to do memory-efficient processing of the raw data before passing it back to be displayed.  This should significantly reduce the memory footprint when dealing with large datasets (and large means up to 1 GB).  We never need to display that much data at once, so the actual decimation in this approach will be significant (and I'd prefer a max-min decimation).  My worry is how to manage reading from file to get the data into my decimation algorithm as efficiently as possible (both speed-wise and memory-footprint-wise).
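To illustrate the kind of decimation I mean, here is a minimal max-min sketch using the npTDMS Python library; the file, group and channel names are invented, and the read uses an offset/length-style slice so only the displayed range is pulled from disk:

```python
import numpy as np
from nptdms import TdmsFile

def minmax_decimate(samples, n_bins):
    """Reduce samples to n_bins (min, max) pairs so peaks survive."""
    usable = len(samples) - (len(samples) % n_bins)
    binned = samples[:usable].reshape(n_bins, -1)
    out = np.empty(2 * n_bins)
    out[0::2] = binned.min(axis=1)   # per-bin minimum
    out[1::2] = binned.max(axis=1)   # per-bin maximum
    return out

with TdmsFile.open("sweep.tdms") as f:       # streaming open, small footprint
    channel = f["z_000"]["ch_07"]            # invented names
    raw = channel[10_000:90_000]             # read only the visible range
    display = minmax_decimate(raw, 1000)     # ~2000 points to plot
```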


I'll have to benchmark them, I suppose.  I looked at SQLite before, and because I have very limited SQL experience, it's the queries and proper data structure I'm unsure about there.  Especially when dealing with custom data-reading schemes, I have the feeling an SQL-like approach offers significant benefits.


Huge fan of TDMS over here, so personally I'd probably go with that, but I've heard good things about SQLite, so that's probably an option too.

 

With TDMS the write is very fast in just about all cases.  The read is where it can be less efficient.  As mentioned before, file fragmentation is the biggest cause of long read and open times.  In my logging routines I would have a dedicated actor for logging which, among other things, would periodically close, defrag, and re-open the file to help with this issue.  But if you write in decent-sized chunks you might not have an issue.

 

There are probably lots of ways to write a 4D array to a TDMS file.  Natively it is only a 2D type of structure, something like an Excel worksheet.  But just like Excel, you have another layer, which is groups.  So that gives a way of logging a 3D array: groups, channels, and samples.  How you implement that 4th dimension is up to you.  You could have many groups, or many channels in a group.  Your read routine would then encapsulate that, so that, as you said, you request X vs Y and it takes care of where in the file it needs to read.  Another neat benefit of TDMS is the offset and length options on read: you can read chunks of the file if it is too large, or just as a way to be efficient if the software can only show you part of it at a time anyway.
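As an illustration of one possible mapping (one group per Z step, one TDMS channel per measurement channel, each 2D slice flattened into that channel's samples), here is a hedged sketch with the npTDMS Python library; every name and size is invented:

```python
import numpy as np
from nptdms import TdmsWriter, ChannelObject

X, Y, Z, N_CH = 512, 64, 16, 24
data = np.random.rand(Z, Y, X, N_CH)         # stand-in for acquired data

with TdmsWriter("sweep.tdms") as writer:
    for z in range(Z):
        writer.write_segment([
            ChannelObject(f"z_{z:03d}",              # group = Z index
                          f"ch_{c:02d}",             # channel = detector
                          data[z, :, :, c].ravel())  # Y*X samples, X fastest
            for c in range(N_CH)
        ])

# Reading one channel of one Z slice back is then a group/channel lookup
# plus an offset/length (slice) read of that channel.
```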

 

Conceptualizing a 3D array of data can be difficult, let alone a 4D one.  Regardless of file type and method, you are probably going to have a hard time knowing if it even works right.  I wanted to write a test, but since I was using made-up data I couldn't tell whether it worked correctly.


I'll have to benchmark them, I suppose.  I looked at SQLite before, and because I have very limited SQL experience, it's the queries and proper data structure I'm unsure about there.  Especially when dealing with custom data-reading schemes, I have the feeling an SQL-like approach offers significant benefits.

 

There is a benchmark in the SQLite API for LabVIEW with which you can simulate your specific row and column counts, and an example of fast datalogging with on-screen display and decimation. The examples should give you a good feel for whether SQLite is an appropriate choice.

 

Generally: if it is high-speed streaming to disk (like video), I would say TDMS. Nothing beats TDMS for raw speed. For anything else: SQLite* :D

 

What is your expected throughput requirement?

Edited by ShaunR


My application writes thousands of samples for approximately 1000 channels in a single group. The read/write operations can be a bit slow on a regular hard drive, but we use SSDs or a RAM disk, and then it works perfectly and at very high speed.

 

I'm a big fan of TDMS now. It was tough to get to grips with the Advanced API palette at the beginning; it took a bit of understanding...


We are big fans of TDMS here as well. Read speeds can definitely be an issue, but as Manu pointed out, an SSD helps a lot. Also, we have not tested it yet, but the 2015 version of the API includes a "TDMS In Memory" palette, which should offer very fast access if you need it in your application, without having to install external tools such as a RAM disk.

 

As an aside, another tool we really like for viewing TDMS files is DIAdem. We use it mostly as an engineering tool, as we've had issues with its reporting feature in the past. It is a LOT faster and easier to use than Excel when it comes to crunching a lot of data and looking at many graphs quickly. Unfortunately, at the moment it doesn't support display of 4D graphs, but I posted a question on the NI Forum about a possible way to implement such a feature through scripts. We don't have the skills or time to do it internally at the moment, but I would really like to know if anyone has created such a function and wants to share it.

 

There is also a KB article you can look at here, but I do not think it will meet your requirement for 4D display.


Just as an afterthought: SQLite supports R*Tree spatial access methods too ;) Maybe relevant to your particular use case.

Edited by ShaunR


I note that the SQL code to do arbitrary planar cuts through a 3D cube seems relatively straightforward, with a single large table and a simple WHERE clause ("SELECT … WHERE ABS(1.2*X + 0.3*Y + 0.9*Z) < tolerance", for example).   So you should prototype that with a large dataset and see if the performance is sufficient.  Also, don't neglect the 3D picture control for visualization.
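A quick way to prototype that, sketched here with Python's built-in sqlite3 module (table and column names invented, data made up):

```python
import random
import sqlite3

con = sqlite3.connect("sweep.db")
con.execute("CREATE TABLE IF NOT EXISTS points "
            "(x REAL, y REAL, z REAL, ch INTEGER, value REAL)")

# Fill the cube with made-up data: 24 channels over a small grid
rows = [(x, y, z, c, random.random())
        for x in range(32) for y in range(32) for z in range(8)
        for c in range(24)]
con.executemany("INSERT INTO points VALUES (?,?,?,?,?)", rows)
con.commit()

# Arbitrary planar cut through the cube for a single channel
cut = con.execute(
    "SELECT x, y, z, value FROM points "
    "WHERE ch = ? AND ABS(1.2*x + 0.3*y + 0.9*z - ?) < ?",
    (7, 20.0, 0.5)).fetchall()
```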


Hmm, my initial testing didn't seem to bode too well for TDMS.  I was getting miserable write speeds...  I was iterating through the data I wanted to write, appending new channels as required, creating new groups as required, and writing point by point.  This yielded terrible results.

 

I have since found the all-important "TDMS Set Channel Information" function, which allows me to tell the TDMS API in advance what I'm going to be writing, so it can write in the most efficient way.  This seems to be the very important missing piece of my puzzle.

 

It's a much more involved thing than I was expecting, and I find resources that really explain how to get the best out of any given situation (how your data is received versus how you want it saved) rather lacking on the internet.  I suppose I'll have to just get my hands dirty and experiment.  I think I have a much better grasp of how to optimise things now.

 

Shane


I suppose I'll have to just get my hands dirty and experiment.  I think I have a much better grasp of how to optimise things now.

Yeah, I'm a big fan of TDMS and I still learn things every once in a while by experimenting.  One thing that helps, as you already noticed, is writing chunks of data: basically calling the Write function as few times as possible.  If you are getting samples one at a time, put them into a buffer, then write when you have X samples.  Got Y channels of the same data type which get new data at the same rate?  Try writing X samples for Y channels as a 2D array.  I think writing all the data in one group at a time helps too, though that might have been a flawed test of mine; alternating between writing in multiple groups seemed to make for fragmented data.
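A minimal sketch of that buffer-then-write pattern, assuming the npTDMS Python library (chunk size and all names are arbitrary):

```python
import numpy as np
from nptdms import TdmsWriter, ChannelObject

CHUNK = 10_000       # samples to accumulate per channel before writing
N_CH = 24

class BufferedLogger:
    def __init__(self, writer):
        self.writer, self.buf = writer, []

    def add(self, sample):                    # sample: one value per channel
        self.buf.append(sample)
        if len(self.buf) >= CHUNK:
            block = np.asarray(self.buf)      # shape (CHUNK, N_CH)
            # One Write call: all channels of one group, CHUNK samples each
            self.writer.write_segment([
                ChannelObject("sweep", f"ch_{c:02d}", block[:, c])
                for c in range(N_CH)])
            self.buf = []

with TdmsWriter("log.tdms") as w:
    log = BufferedLogger(w)
    for _ in range(100_000):
        log.add(np.random.rand(N_CH))         # stand-in for one acquisition
```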

 

Because of all of these best practices, I usually end up writing an actor that takes care of the TDMS calls and can do things like buffering, periodic defrag, and optimized writing techniques.  A bit of a pain when you are used to just appending data to a text file, but the benefits are worth it in my situations.


One thing that helps, as you already noticed, is writing chunks of data: basically calling the Write function as few times as possible.  If you are getting samples one at a time, put them into a buffer, then write when you have X samples.  Got Y channels of the same data type which get new data at the same rate?  Try writing X samples for Y channels as a 2D array.  I think writing all the data in one group at a time helps too, though that might have been a flawed test of mine; alternating between writing in multiple groups seemed to make for fragmented data.

 

You're right, writing in chunks reduces fragmentation and improves read/write performance.

 

However, you can let the TDMS driver handle this for you instead of writing your own buffer code:
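Presumably this refers to NI's built-in buffering: in LabVIEW, setting the NI_MinimumBufferSize property on a channel (via TDMS Set Properties) before the first write makes the driver accumulate that many values per channel in memory before flushing to disk. A sketch of recording that property, shown with npTDMS purely for illustration (third-party writers store the property but don't implement the buffering):

```python
import numpy as np
from nptdms import TdmsWriter, ChannelObject

# NI_MinimumBufferSize is interpreted by NI's TDMS driver, not by npTDMS;
# the dummy sample is only there so the channel exists.
ch = ChannelObject("sweep", "ch_00", np.zeros(1),
                   properties={"NI_MinimumBufferSize": 10_000})
with TdmsWriter("buffered.tdms") as writer:
    writer.write_segment([ch])
```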


 

However, you can let the TDMS driver handle this for you instead of writing your own buffer code.

I never had good luck with this.  Maybe it was older versions of TDMS, but it never seemed to work right.  I can try it again and see if it works now.


I suppose I'll have to just get my hands dirty and experiment. 

 

I think that's the key. I spent a lot of time tinkering around but thanks to that I now have a good understanding of the TDMS API and how to optimize the R/W operations. If, like me, you do your own decimation after the read operation, there is a sweet spot where it starts being more efficient to read each sample you're interested in one by one instead of reading a big block and decimating it.
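For reference, the two strategies being compared look roughly like this with npTDMS (a sketch only; where the crossover sits depends on the disk, file layout and decimation factor, so benchmark on your own data):

```python
from nptdms import TdmsFile

def decimate_block(channel, start, stop, step):
    # One large contiguous read, then discard most of it in memory
    return channel[start:stop][::step]

def decimate_sparse(channel, start, stop, step):
    # Many tiny reads; can win once `step` gets large enough
    return [channel[i:i + 1][0] for i in range(start, stop, step)]

with TdmsFile.open("sweep.tdms") as f:    # streaming open, low memory
    ch = f["z_000"]["ch_07"]              # invented names
    a = decimate_block(ch, 0, 1_000_000, 1_000)
    b = decimate_sparse(ch, 0, 1_000_000, 1_000)
```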


Gah, problem time.

 

Our data requires the ability to pass back a running average at any time.  This is proving to be a bit difficult.

 

I'm able to save all of our "static" data into a TDMS file at really good speeds with no fragmentation, so far so good.  I want to maintain a running average somewhere in the file, and I thought I could pre-allocate a group for this, fill it with dummy data, and then update (overwrite) it by setting the file write pointer as required and overwriting the already-written data with newly calculated values (read, modify, write).  The problem is, setting the file pointer requires the file to have been opened via the advanced Open primitive, and if I do this, the "normal" functions don't seem to work.  We need a running average because some of our measurements last several hours, and giving no feedback during this time is not cool.  Generating the average when the full dataset is present is no problem; it's the running average I have trouble with.  The data required for this running average could run into several hundred megabytes; we're dealing with potentially very large datasets here.

 

I know this mixed-mode behaviour isn't what TDMS is supposed to do, but does anyone have any smart ideas how to do this without having to use a temporary external file (and copy the results over when finished)?  That approach would require my data-retrieval routine to be aware of this extra file and pull in the current averaged data when required.  More work than I was hoping for...


Is there a reason the running average needs to be in the file?  Just curious why you don't have a circular buffer in your program and calculate the average with that.  Even if you do really want it to be in a file for some reason, is there a reason you can't have two files?  It sounds like the running average is filled with dummy data anyway and could be saved in a temp location.
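Along the two-file line: a running mean can be updated incrementally and rewritten to a tiny side file on each update, so nothing in the main file is ever overwritten. A sketch, assuming npTDMS and invented names:

```python
import numpy as np
from nptdms import TdmsWriter, ChannelObject

N_CH = 24
count = 0
mean = np.zeros(N_CH)                    # one running mean per channel

def update_running_average(sample):
    """Incremental mean: never re-reads the raw data."""
    global count, mean
    count += 1
    mean += (sample - mean) / count
    # The side file is tiny, so rewriting it whole each time is cheap;
    # readers just open the latest copy instead of poking the main file.
    with TdmsWriter("running_average.tdms") as w:
        w.write_segment([ChannelObject("status", "mean", mean,
                                       properties={"count": count})])
```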


Also, I would abandon the regular API and rely only on the Advanced API. The regular API lacks a lot of flexibility; it's only intended for very basic usage.



