
Optimizing Read time of TDMS files




I am investigating the possibility of using TDMS files as a kind of giant circular buffer for data that would be too big to fit in a live 2D array of some sort. The other reason, of course, is to have the data persist for when the application restarts.


A single location in the application will be responsible for writing to the file. This consists of a loop that writes either one or a few samples for all the channels at each iteration. I achieved this with good performance by setting the data layout input of the Write function to Interleaved.


On the read side, a few locations might need to access the file, but only on specific events, so this won't be a frequent operation. It should still be fast enough that the user doesn't have to wait several seconds before being able to visualize the data. My tests have revealed that this operation is slow when the data are interleaved. Here are the details:


# Channels: 500 (all in one group and the file contains only this group)

# Samples for each channel contained in the file: 100,000

Data type: SGL floats (I'm not using Waveforms)


Read operation:

# Channel to read: 1

# Samples to read: all (count=-1 and offset=0)


The time to retrieve the data is 1700 ms (3500 ms with DBL; it scales quite linearly).


If I generate the file with just one Write (feeding a 2D array) in Interleaved mode, I also get 1700 ms, so this doesn't depend on how the file was written in the first place.

If I generate the file with just one Write (feeding a 2D array) in Decimated mode, this time I get 7 ms!


It makes sense that the operation is faster since all the data to retrieve occupy a contiguous area on the hard drive.
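This contiguity argument can be made concrete with a little arithmetic. The sketch below is plain Python, not LabVIEW; it ignores TDMS headers and metadata and assumes one uninterrupted data block per layout, computing where one channel's samples would sit in the raw data region under each layout:

```python
# Assumptions: 500 SGL channels, 100,000 samples, 4 bytes per sample;
# real TDMS files also contain segment headers/metadata, ignored here.
CHANNELS = 500
SAMPLES = 100_000
BYTES = 4  # size of a SGL

def interleaved_offsets(channel):
    """Byte offset of each sample of `channel` when samples are interleaved."""
    return [(s * CHANNELS + channel) * BYTES for s in range(SAMPLES)]

def decimated_offsets(channel):
    """Byte offset of each sample of `channel` when each channel is contiguous."""
    base = channel * SAMPLES * BYTES
    return [base + s * BYTES for s in range(SAMPLES)]

ilv = interleaved_offsets(0)
dec = decimated_offsets(0)

# Interleaved: consecutive samples of one channel are 500 * 4 = 2000 bytes
# apart, so reading a channel means 100,000 scattered accesses.
print(ilv[1] - ilv[0])  # 2000

# Decimated: consecutive samples are adjacent, so reading a channel is one
# contiguous ~400 KB read.
print(dec[1] - dec[0])  # 4
```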


My 2 questions are:

- Is there a way to keep the Interleaved layout while significantly improving read performance?

- If not, i.e. if I need to switch to Decimated, how can I write one or a few samples for all channels at each operation? (I haven't managed to achieve this so far.)


I should mention that I did manage to optimize things a little by using the advanced API, setting the channel information, and reserving the file size, but this only reduced the read time by 12%.


Thank you for your help!



So what you saw (I assume) is fragmentation.  As you said, if the data is all in one big block, getting it is pretty quick, but if sections have to be grabbed and concatenated, it takes more time.  Because of this, when I deal with large amounts of TDMS data, I will periodically defragment the file.  This makes the majority of the file one large block of data, with fragmentation only after that block.  Defragmenting does take time, and during it the file can't really be accessed.  Technically it can, but I think the resource is locked, or the data you read could be corrupt.


Writing the data in a way that performs the flush as seldom as possible will be the key.  What this might mean for you is keeping a circular buffer in memory until it reaches a specific size, then flushing it to the TDMS file.
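The buffer-then-flush idea can be sketched as follows. This is an illustrative Python sketch, not LabVIEW code; `write_fn` is a hypothetical stand-in for the actual TDMS Write call:

```python
# Minimal sketch of "accumulate in memory, flush rarely".
class BatchedWriter:
    def __init__(self, flush_threshold, write_fn):
        self.buffer = []              # one entry per acquisition (all channels)
        self.threshold = flush_threshold
        self.write_fn = write_fn      # stand-in for the real TDMS write

    def add(self, samples_all_channels):
        self.buffer.append(samples_all_channels)
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self):
        if self.buffer:
            # One big write instead of many small ones, which is what
            # reduces TDMS fragmentation.
            self.write_fn(self.buffer)
            self.buffer = []

writes = []
w = BatchedWriter(flush_threshold=1000, write_fn=writes.append)
for i in range(2500):
    w.add([float(i)] * 500)  # 500 channels per loop iteration
w.flush()                    # flush the remaining tail
print(len(writes))           # 3 writes instead of 2500
```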


I like this idea and I've never tried using a TDMS file for this but given the fact that the offset and length can be specified on a read, I think it would work pretty well.  All of my circular buffers could be held in memory at once so I never had a need for this type of thing.  Post some code if you have an example of what you are thinking.


Thank you Hooovahh as always!


Here is a nicely presented VI I created to compare the different scenarios. An enum with 6 items defines how the file is written. My program will always write samples for all the channels at once, no matter how many samples it writes. On the read side it's the opposite: only one or a few channels (one in this example VI) are retrieved, but a large number of samples is requested. This VI was made in LV2011.


So what you saw (I assume) is fragmentation.


Do you mean the file itself (like any file in Windows) is fragmented on the hard drive, or do you mean the layout of the values of the different samples for the different channels is fragmented within the content of the file?

Test TDMS.vi

Edited by Manudelavega

Do you mean the file itself (like any file in Windows) is fragmented on the hard drive, or do you mean the layout of the values of the different samples for the different channels is fragmented within the content of the file?

I mean the file on disk.  Here is a document that describes it, with some fun examples to run.




The way the file is written can influence fragmentation.  One way to avoid it is to use the TDMS write function as seldom as possible, grouping channels of the same data type into one write where you can.  Say you have 10 channels that are doubles, then 5 that are timestamps, and then 10 more that are doubles.  If you can force your code to write the 20 channels of doubles using one write function, instead of a write for doubles, then a write for timestamps, then another write for doubles, you will be better off.


But even that is better than 10 writes, one for each double channel, followed by 5 writes, one for each timestamp channel, followed by 10 writes, one for each double channel.


But even that is better than writing each sample for each channel one at a time.


Thank you for the link. It seems I already use the most efficient approach according to this document, since I never split channels across several write operations. I tried defragmentation, but I found that it takes a very long time (about 3 minutes) and doesn't improve the read operation's performance enough to make it worth it.


I also modified my test VI to measure the time more accurately and added extra cases, so please use version 2.


By analyzing the attached table, which I populated from my tests, I am starting to be able to pinpoint where the optimizations I need might be.



Comparing 3&6 (or 3&9), we see that reserving the file size makes a huge difference for the write operation when the file is going to be written often. That makes sense, since LabVIEW doesn't need to keep requesting new space to be allocated on the hard drive. It also optimizes the read operation (less fragmentation, since the file size is reserved).


However, if we compare 6&9 (or 4&7, or 5&8), it appears that reserving the full size is better for the read (again, less fragmentation I suppose) but significantly worse for the write, which I don't understand. Reserving only N samples instead of N*M gives better results for the writes.


Writing in blocks:

Comparing 5&6, we see that, not surprisingly, writing less often but with more data is more efficient for the write time. However, since the file was fully reserved, there is no difference in the read time!


Comparing 8&9, this time both the write and the read are optimized when writing less often; since the file was not fully reserved here, more writes led to more fragmentation.


Data layout:

Comparing 4&5 (or 7&8), we see that the data layout doesn't influence the write operation, but the decimated layout significantly improves the read operation, since all samples for only one channel are requested. I would expect the interleaved layout to be more efficient if I were requesting only one or a few samples for all channels, but I didn't test that since it is not the scenario my application will run.


Additional note:

Tests 1&2 show the results one gets when writing all the data with a single write operation. Case 1 leads to a super-optimized read time of 12 ms, but the write time is surprisingly bad compared to case 2; I don't understand why so far. These 2 scenarios are irrelevant anyway, since my application will definitely have to write to the file periodically.


I would conclude that for my particular scenario, reserving the file size, grouping the write operations, and using the decimated layout is the way to go. I still need to define:


- The size of the write blocks (N/B)

- The size of the reservation, since reserving the whole file leads to bad write performance.


Test TDMS v2.vi

Edited by Manudelavega

Wow, those are some interesting results.  I'm always interested in TDMS performance.  Your timing function is a little flawed, but the data is probably still close.


UI elements in LabVIEW are updated asynchronously, so to get an accurate measure of how long a function takes, you shouldn't be writing to or reading from any UI elements, or using any property nodes, in parallel with a timing test.  It is also generally a good idea to turn off Automatic Error Handling and debugging, because these can affect time measurements.  That being said, I don't think this changes your results much, if at all.


I'd appreciate some advice again about a new issue I've been having: I want to see if I can add and remove channels on the fly. One of the channels is always there: the timestamp (just a simple index in this example). This is important since I need to be able to align the data when I retrieve them (to be displayed in an XY graph).


To describe the issue I'm having, look at the attached image, which explains everything...


I'm attaching the VI I used (very simple, made in LV2011). Thank you!


Test TDMS change channels simple.vi


I know this issue, and I sorta struggle with it too.  There are a few solutions, but all of them involve extra writes.


When you write Channel 1, you can write blank data to Channel 2; if this data is a double, I recommend writing NaN.  Likewise, when you write Channel 2, you'll need to write NaN to Channel 1.


Or you can have an index or time column for each channel.  So Index 1 and Channel 1 get written together, and Index 2 and Channel 2 get written together.


Or you can take that a step further and have a group for each set of data that comes in at a different rate.  Let's say you have 5 channels at the same rate and another 2 at a different rate.  The first 5 channels can share the same timestamp, and the other 2 can share a different one.  In this case it seems like a good idea to have one index per group in the TDMS data.
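The first suggestion, padding the absent channels with NaN so every channel stays the same length and aligned with the shared index, can be sketched like this. The channel names and the `append_row` helper are illustrative, not part of any TDMS API:

```python
import math

# In-memory stand-in for the channels of one TDMS group.
channels = {"Index": [], "Ch1": [], "Ch2": []}

def append_row(index, values):
    """Append one acquisition: `values` maps channel name -> new sample.
    Channels with no new sample are padded with NaN to stay aligned."""
    channels["Index"].append(index)
    for name in ("Ch1", "Ch2"):
        channels[name].append(values.get(name, math.nan))

append_row(0, {"Ch1": 1.5})              # only Ch1 produced data
append_row(1, {"Ch2": 2.5})              # only Ch2 produced data
append_row(2, {"Ch1": 3.0, "Ch2": 4.0})  # both produced data

# All channels keep the same length, so sample i of every channel
# lines up with Index[i] when plotting an XY graph later.
print([len(v) for v in channels.values()])  # [3, 3, 3]
```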


EDIT:  The problem you are having is one I have too: we are trying to treat TDMS files as a report format, when really they are meant to just store the data.  Reports can be generated from this data, but as soon as you try to use the TDMS file itself as your final report, you will hit formatting limitations like this, which can only be overcome with extra work.


All my channels stream at the same rate. It's just that the user might add a new channel or delete a channel at any time, and this shouldn't have any impact on the other existing channels. I will deal with that later, probably by starting a new TDMS file, since I already plan to spread the huge amount of data over several files anyway.


New headache:

In my application, the read and write operations use the same reference obtained from a single Open File. I let the Advanced Synchronous Write VI buffer the data (I don't control when the data is actually flushed to the file). Reading a property (like NI_ChannelLength) gives the value corresponding to the current situation: it includes the data from the latest write even though that data hasn't yet been flushed to the file (I know this by opening the file in Excel). That's good. However, the Advanced Synchronous Read operation only sees data that has been flushed to the file; it doesn't see the data from the latest write.


That's a big issue, since I use NI_ChannelLength to compute the count and offset I feed to the Read, and the Read gives me bad data for the non-flushed samples. I do not get an End of File error though, which shows that I'm not asking for non-existing samples...


Is anybody aware of this issue? I tried to force the flush with the corresponding VI, but that just wrote junk into the file...


Edit: If I ask for a small number of samples (less than or equal to the number of samples I write in the Write VI), then the Read is actually successful. It's only when I ask for an amount of data that spans both the flushed data and the buffered data that I get a bad reading. That would mean I need to perform 2 distinct read operations: one to retrieve flushed samples, one to retrieve buffered samples. The problem is: how does my program know what has been flushed and what hasn't?


Edit 2: Actually, it also happens if I ask for "old" data (flushed a while ago), so it has nothing to do with "buffered data" versus "flushed data".

Bottom line: this happens when I retrieve data that was written by different write operations. Look at the picture below to see what I mean.


If I use the standard API, it works! With the Advanced API I get the exact same behavior, whether I use Synchronous or Asynchronous mode.


I attached the VI I used for this test. I am starting to strongly suspect that I am misunderstanding the use of the Set Next Read Position VI... It doesn't actually guarantee that I will only get samples from the channel whose name I wire into its input, does it?


Test TDMS Sync Async.vi

Edited by Manudelavega

I did. The Application Engineer I had on the phone said she was also surprised by this behavior and is supposed to get back to me after she gets a hold of a coworker of hers who knows more about TDMS.


But long story short, I think I figured it out: the Advanced API is a low-level API that is more efficient than the standard one, but has fewer smarts built into it. So when using it to read a channel, we need to know the exact layout of the data in the file, perform several elemental reads, and concatenate their outputs.


When setting the layout to Non-Interleaved (= Decimated), writing 4 samples for channel 1 and 4 samples for channel 2 at each write results in the layout described in the attached picture.

My first understanding was that Set Next Read Position would configure the Advanced Read (whether Synchronous or Asynchronous) to return samples for the specified channel only. I was wrong. Set Next Read Position only places a pointer at the location of the first sample to read; the Advanced Read then reads samples sequentially until it has read the number of samples given by its Count input, without caring which channel those samples belong to!


So, knowing the layout of the file (which depends on how many samples we write each time we call the Advanced Write), we can keep re-positioning this pointer and adjusting the Count input of the Advanced Read. Let's say I need to retrieve 8 samples from channel 1, from sample no. 2 to sample no. 9 inclusive. I need to perform 3 read operations:


a) Offset of Set Next Read Position = 2, Count of Advanced Read = 2 (gives me samples 2 to 3)

b) Offset of Set Next Read Position = 4, Count of Advanced Read = 4 (gives me samples 4 to 7)

c) Offset of Set Next Read Position = 8, Count of Advanced Read = 2 (gives me samples 8 to 9)


The attached VI demonstrates how that works.
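The chunk arithmetic described above can be sketched as a small helper that turns one logical request into the per-chunk (offset, count) pairs fed to Set Next Read Position and the Advanced Read. This is an illustrative Python sketch of the calculation, not LabVIEW code:

```python
# Assumes a decimated file where every Advanced Write stored `chunk`
# samples per channel, so one channel's samples are contiguous only
# within each chunk. Offsets are per-channel sample indices.
def read_plan(start, count, chunk):
    """Split a request for samples [start, start+count) of one channel
    into one (offset, count) read per chunk."""
    plan = []
    pos = start
    end = start + count
    while pos < end:
        # First sample of the next chunk, i.e. where this chunk ends.
        chunk_end = (pos // chunk + 1) * chunk
        n = min(chunk_end, end) - pos
        plan.append((pos, n))
        pos += n
    return plan

# Samples 2..9 (8 samples) with 4-sample chunks -> the three reads a), b), c):
print(read_plan(2, 8, 4))  # [(2, 2), (4, 4), (8, 2)]
```

For the example above (8 samples starting at sample no. 2, with 4-sample writes) it reproduces exactly the three reads listed.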


tdms advanced synchronous read.vi

Test TDMS Read.vi


Edited by Manudelavega


