Optimizing Read time of TDMS files

eberaud · September 5, 2014

Hi,

I am investigating the possibility of using TDMS files as a kind of giant circular buffer that would be too big to fit in a live 2D array of some sort. Of course the other reason is to have those data saved for when the application restarts.

A single location in the application will be responsible for writing in the file. This would consist of a loop that writes either one or a few samples for all the channels at each iteration. I successfully achieved this with good performance by setting the data layout input of the Write function to Interleaved.

On the read side, a few locations might need to access the files, but only on an event, so this won't be a frequent operation. However it should still be fast enough since I don't want the user to wait several seconds before being able to visualize the data. My tests have revealed that this operation is slow when data are interleaved. Here are the details:

# Channels: 500 (all in one group and the file contains only this group)

# Samples for each channel contained in the file: 100 000

Data type: SGL floats (I'm not using Waveforms)

Read operation:

# Channel to read: 1

# Samples to read: all (count=-1 and offset=0)

The time to retrieve the data is 1700 ms. (3500 if using DBL, it's quite linear...)

If I generate the file with just one Write (feeding a 2D array) in Interleaved mode, I also get 1700ms, so this doesn't depend on how the file was written in the first place.

If I generate the file with just one Write (feeding a 2D array) in Decimated mode, this time I get 7ms!!

It makes sense that the operation is faster since all the data to retrieve occupy a contiguous area on the hard drive.
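To illustrate why the layout matters for a single-channel read, here is a minimal Python sketch. It is a simplified model that assumes raw SGL samples only and ignores TDMS headers and metadata; the function name `byte_ranges` is hypothetical and is not part of any TDMS API. It lists the contiguous byte ranges that hold one channel in each layout.

```python
def byte_ranges(layout, n_channels, n_samples, channel, sample_size=4):
    """Return the contiguous (offset, length) byte ranges holding one channel.

    Simplified model: raw samples only, no TDMS headers or metadata.
    """
    if layout == "decimated":
        # Each channel is stored as one block: ch0 s0..sN, then ch1 s0..sN, ...
        return [(channel * n_samples * sample_size, n_samples * sample_size)]
    if layout == "interleaved":
        # Samples alternate across channels: s0 of every channel, then s1, ...
        stride = n_channels * sample_size
        return [(i * stride + channel * sample_size, sample_size)
                for i in range(n_samples)]
    raise ValueError(f"unknown layout: {layout}")


# Figures from the post: 500 channels, 100 000 SGL (4-byte) samples each.
decimated = byte_ranges("decimated", 500, 100_000, channel=0)
interleaved = byte_ranges("interleaved", 500, 100_000, channel=0)
print(len(decimated), len(interleaved))  # 1 100000
```

In this model both layouts return the same 400 000 bytes for the channel, but the interleaved one needs 100 000 separate 4-byte pieces spread over the whole file, while the decimated one needs a single contiguous range.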

My 2 questions are:

- Is there a way to keep Interleaved layout while optimizing - significantly - the Read performance?

- If not, i.e. if I need to switch to Decimated, how can I write one or a few samples for all channels at each operation? (I haven't managed to achieve this so far.)

I should mention that I did manage to optimize things a little bit by using the advanced API, setting the channels information, and reserving the file size, but this only reduced the read time by 12%.

Thank you for your help!

hooovahh · September 8, 2014

So what you saw (I assume) is fragmentation. As you said, if the data is all in one big block, getting the data is pretty quick, but if sections have to be grabbed and concatenated it takes more time. Because of this, when I deal with large amounts of TDMS data I periodically defragment the file. This makes the majority of the file one large block of data, with fragmentation only in whatever is written after that block. Defragmenting does take time, and the file can't really be accessed while it runs. I mean it can, but I think the resource is locked, or the data you read could be corrupt.

Writing the data in a way that performs the flush as seldom as possible will be the key. What this might mean for you is you have a circular buffer in memory, until it gets to a specific size, then flush it to the TDMS file.
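The buffering idea above can be sketched in Python as follows. This is only an illustration under assumptions: `BatchedWriter` and the `write_block` callback are hypothetical names standing in for whatever actually performs the TDMS write, and the batch size is arbitrary.

```python
class BatchedWriter:
    """Collect rows in memory and hand them to `write_block` in large batches."""

    def __init__(self, write_block, rows_per_flush):
        self._write_block = write_block        # hypothetical callback doing the file write
        self._rows_per_flush = rows_per_flush  # assumed batch size, to be tuned
        self._pending = []

    def append(self, row):
        """Queue one row (one sample for every channel)."""
        self._pending.append(row)
        if len(self._pending) >= self._rows_per_flush:
            self.flush()

    def flush(self):
        """Write whatever is pending as a single block."""
        if self._pending:
            self._write_block(self._pending)
            self._pending = []


blocks = []                                   # stands in for the file
writer = BatchedWriter(blocks.append, rows_per_flush=4)
for i in range(10):
    writer.append([float(i)] * 3)             # 3 channels, one sample each
writer.flush()                                # push the final partial block
print([len(b) for b in blocks])               # [4, 4, 2]
```

Ten single-sample rows end up as three write calls instead of ten. Choosing `rows_per_flush` is a trade-off between memory held and the number of writes.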

I like this idea and I've never tried using a TDMS file for this but given the fact that the offset and length can be specified on a read, I think it would work pretty well. All of my circular buffers could be held in memory at once so I never had a need for this type of thing. Post some code if you have an example of what you are thinking.

eberaud · September 8, 2014

Thank you Hooovahh as always!

Here is a nicely presented VI I created to compare the different scenarios. An enum with 6 items defines how the file is being written. My program will always write samples for all the channels at once, no matter how many samples it writes. On the read side it's the opposite, only one or a few channels (one in this example vi) are retrieved, but a large number of samples is requested. This VI is made with LV2011.

hooovahh wrote: "So what you saw (I assume) is fragmentation."

Do you mean the file itself (like any file in Windows) is fragmented on the hard drive, or do you mean the layout of the values of the different samples for the different channels is fragmented within the content of the file?

Test TDMS.vi

Edited September 8, 2014 by Manudelavega

hooovahh · September 8, 2014

eberaud wrote: "Do you mean the file itself (like any file in Windows) is fragmented on the hard drive, or do you mean the layout of the values of the different samples for the different channels is fragmented within the content of the file?"

I mean the file on disk. Here is a document with some fun examples to run that describes it.

https://decibel.ni.com/content/docs/DOC-20522

The way the file is written can influence fragmentation. One way to avoid it is to use the TDMS write function as seldom as possible, where maybe you write all of the channels of one data type at once. Say you have 10 channels that are doubles, then 5 that are timestamps, and then 10 that are doubles. If you can force your code to write the 20 channels of doubles using one write function, instead of a write for doubles, then a write for timestamps, then a write for doubles, you will be better off.

But even that is better than 10 writes, one for each double channel, followed by 5 writes, one for each timestamp channel, followed by 10 writes, one for each double channel.

But even that is better than writing each sample for each channel one at a time.

eberaud · September 9, 2014

Thank you for the link. It seems I already have the most efficient way according to this document, since I never separate channels in several write operations. I tried the defragmentation, but I found that it takes a very long time (like 3 minutes) and doesn't improve the read operation's performance enough to make it worthwhile.

I also modified my test VI to measure the time more accurately and added extra cases so please take version 2.

I am starting to pinpoint where the optimizations I need might be by analyzing the attached table I populated from my tests.

Reservation:

Comparing 3&6 (or 3&9), we see that reserving the file size makes a huge difference for the Write operation when the file is going to be written often. It makes sense since LabVIEW doesn't need to keep requesting a new space to be allocated on the hard drive. It also optimizes the read operation (less fragmentation since the file size is reserved).

However if we compare 6&9 (or 4&7, or 5&8), it appears that reserving the full size is better for the read (again, less fragmentation I suppose) but significantly worse for the write, which I don't understand. Reserving only N samples instead of N*M gives better results for the writes.

Writing in blocks:

Comparing 5&6, we see that - not surprisingly - writing less often but with more data is more efficient for the writing time. However since the file was fully reserved, there is no difference on the read time!

Comparing 8&9, this time both the write and the read are optimized when writing less often, since this time the file was not fully reserved, so more writes led to more fragmentation.

Data layout:

Comparing 4&5 (or 7&8), we see that the data layout doesn't have an influence on the write operation, but the decimated layout significantly improves the read operation since all samples for only one channel are requested. I would expect the interleaved layout to be more efficient if I was requesting only one or a few samples but for all channels. I didn't test that since it is not the scenario that my application will run.

Additional note:

Tests 1&2 show the results one gets when writing all data with a single write operation. Case 1 leads to a super optimized reading time of 12ms, but the write time is surprisingly bad compared to case 2, and I don't understand why so far. Those 2 scenarios are irrelevant anyway since my application will definitely have to write periodically in the file.

I would conclude that for my particular scenario, reserving the file size, grouping the write operations, and using the decimated layout is the way to go. I still need to define:

- The size of the write blocks (N/B)

- The size of the reservation, since reserving the whole file leads to bad write performance.

Test TDMS v2.vi

Edited September 9, 2014 by Manudelavega

hooovahh · September 11, 2014

Wow, those are some interesting results. I'm always interested in TDMS performance. Your timing function is a little flawed but the data is probably still close.

UI elements in LabVIEW are updated asynchronously. So to get an accurate measure of how long a function takes to operate, you shouldn't be writing or reading to any UI elements, or using any property nodes in parallel with a timing test. Also it is generally a good idea to turn off Automatic Error handling, and Debugging because these can affect time measurements. That being said I don't think this changes your results much if any.
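The same principle (time only the operation, and do any display or reporting afterwards) can be sketched outside LabVIEW. The Python helper below is an illustration, not the timing code from the attached VI; the name `timed` and the choice to repeat the call and keep the best time are assumptions.

```python
import time


def timed(func, *args, repeats=5):
    """Run func(*args) `repeats` times; return (last result, best time in seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)              # only the operation sits between the two clock reads
        best = min(best, time.perf_counter() - start)
    return result, best


result, seconds = timed(sum, range(1_000_000))
print(f"{seconds * 1000:.2f} ms")         # reporting happens after the measurement
```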

eberaud · September 11, 2014

Thanks for those tips. I hope those tests and remarks can be valuable to others as well. I had never used TDMS before and it would have been valuable to me to find this kind of thread!

I'll post here if I have more findings to share.

eberaud · September 16, 2014

I'd appreciate some advice again about this new issue I've been having: I want to see if I can add and remove channels on the fly. One of the channels is always there: the timestamp (just a simple index in this example). This is important since I need to be able to align the data when I retrieve them (to be displayed in an XY graph).

To describe the issue I'm having, look at the attached image, that explains everything...

I'm attaching the VI I used (very simple, made in LV2011). Thank you!

Test TDMS change channels simple.vi

hooovahh · September 16, 2014

I know this issue and I sorta struggle with it. There are a few solutions, but all of them involve extra writes.

When you write Channel 1, you can write blank data to Channel 2. If this data is a double I recommend writing NaN. Same with when you write Channel 2 you'll need to write NaN to Channel 1.

Or you can have an index or time column for each channel. So Index 1 and Channel 1 get written together, and Index 2 and Channel 2 get written together.

Or you can take that a step further and have a group for each set of data that comes in at a different rate. Let's say you have 5 channels at the same rate and another 2 at a different rate. The first 5 channels can share the same timestamp, and the next 2 can share a different one. In this case it seems like a good idea to have one index per group in the TDMS data.
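The first option (padding with NaN so every channel stays aligned with the index) can be sketched in Python like this. It is a minimal illustration: `pad_rows` and the channel names are hypothetical, and the actual TDMS write is left out.

```python
import math


def pad_rows(rows, channels):
    """Turn a list of {channel: value} rows into equal-length columns.

    Any channel missing from a row gets NaN, so every column keeps
    the same length as the index column.
    """
    columns = {name: [] for name in channels}
    for row in rows:
        for name in channels:
            columns[name].append(row.get(name, math.nan))
    return columns


rows = [
    {"Index": 0, "Channel 1": 1.0},                    # Channel 2 not added yet
    {"Index": 1, "Channel 1": 1.5, "Channel 2": 7.0},
    {"Index": 2, "Channel 2": 7.5},                    # Channel 1 removed
]
columns = pad_rows(rows, ["Index", "Channel 1", "Channel 2"])
print(columns["Channel 2"])                            # [nan, 7.0, 7.5]
```

The cost is the extra NaN writes mentioned above; the benefit is that sample i of every channel belongs to index i.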

EDIT: The problem you are having is one I have too, which is we are trying to treat TDMS files as a report format. When really it is meant to just store the data. Reports can be generated using this data, but as soon as you try to just use the TDMS as your final report, you will hit formatting limitations like this that can be overcome with extra work.

eberaud · September 17, 2014

All my channels stream at the same rate. It's just that the user might add a new channel or delete a channel at any time, and this shouldn't have any impact on the other existing channels. I will deal with that later, probably by starting a new TDMS file, since I already plan to spread the huge amount of data over several files anyway.

New headache:

In my application, the read and write operations use the same reference obtained from a single Open file. I let the Advanced Synchronous Write VI buffer the data (I don't control when the data is actually flushed to the file). Reading a property (like NI_ChannelLength) gives the value corresponding to the current situation (includes the data from the latest write even though the data hasn't yet been flushed in the file - I know it by opening the file in Excel). That's good. However the Advanced Synchronous Read operation only sees data that has been flushed in the file, it doesn't see the data from the latest write.

That's a big issue since I use NI_ChannelLength to compute the count and offset I want to feed to the Read, and the Read gives me bad data for the non-flushed samples. I do not get an End Of File error though, which shows that I'm not asking for non-existing samples...

Is anybody aware of that issue? I tried to force the flush with the corresponding VI, but this just wrote junk in the file...

Edit: If I ask for a small number of samples (less than or equal to the number of samples I write in the Write VI), then the Read is actually successful. It's only when I ask for an amount of data that would be spread over the flushed data and the buffered data that I get a bad reading. That means I need to perform 2 distinct read operations: one to retrieve flushed samples, one to retrieve buffered samples. The problem is: how does my program know what has been flushed and what hasn't?

Edit2: So actually it does it also if I ask for "old" data (that has been flushed a while ago) so it had nothing to do with "buffered data" versus "flushed data".

Bottom line: this happens when I retrieve data that have been written by different write operations. Look at the picture below to know what I mean.

If I use the standard API, it works! With the Advanced API I get the exact same behavior, no matter if I use Synchronous or Asynchronous mode.

I attached the VI I used for this test. I am starting to strongly suspect that I am misunderstanding the use of the Set Read Position VI... It actually doesn't guarantee that I will get only samples from the channel I wire into the channel name input, does it?

Test TDMS Sync Async.vi

Edited September 18, 2014 by Manudelavega

hooovahh · September 19, 2014

I'm not sure if you've contacted NI on their forums yet, but I think they may be of more help.

eberaud · September 19, 2014

I did. The Application Engineer I had on the phone said she was also surprised by this behavior and is supposed to get back to me after she gets a hold of a coworker of hers who knows more about TDMS.

But long story short, I think I figured it out: The Advanced API is a low-level API that is more efficient than the standard one, but has less smarts built in it. So when using it to read a channel, we need to know the exact layout of the data in the file and perform several elemental reads and concatenate their outputs.

When setting the layout to Non-Interleaved (=Decimated), writing 4 samples for channel 1 and 4 samples for channel 2 at each write would result in the layout described in the attached picture.

My first understanding was that the Set Next Read Position would configure the Advanced Read (no matter if Synchronous or Asynchronous) to return samples for the specified channel only. I was wrong. Actually Set Next Read Position only places a pointer that specifies the location of the first sample to read. Then the Advanced Read reads samples sequentially until it has read the number of samples corresponding to its Count input, without caring about the channel those samples belong to!

So by knowing the layout of the file (that depends on how many samples we write each time we call the Advanced Write), we can keep re-positioning this pointer and adjust the Count input of the Advanced Read. So let's say I need to retrieve 8 samples from channel 1, from sample no. 2 to sample no. 9 included. I need to perform 3 read operations:

a) Offset of Set Next Read Position = 2, Count of Advanced Read = 2 (gives me samples 2 to 3)

b) Offset of Set Next Read Position = 4, Count of Advanced Read = 4 (gives me samples 4 to 7)

c) Offset of Set Next Read Position = 8, Count of Advanced Read = 2 (gives me samples 8 to 9)

The attached VI demonstrates how that works.
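The splitting logic in steps a) to c) can also be written out as a short Python sketch. This is a model of the behavior described in this post, not the LabVIEW API: `plan_reads`, `build_file` and `raw_read` are hypothetical helpers, channels are numbered from 0, and the "file" is just a list of (channel, sample) tags laid out in decimated blocks.

```python
def plan_reads(first, count, block):
    """Split a request for `count` samples starting at sample `first` into
    (offset, count) pairs that never cross a write-block boundary.

    `block` is the number of samples per channel written by each write.
    """
    reads = []
    end = first + count
    offset = first
    while offset < end:
        block_end = (offset // block + 1) * block   # first sample of the next block
        n = min(end, block_end) - offset
        reads.append((offset, n))
        offset += n
    return reads


def build_file(n_channels, block, n_writes):
    """Model a decimated file as a flat list of (channel, sample) tags."""
    data = []
    for w in range(n_writes):
        for ch in range(n_channels):
            data.extend((ch, w * block + i) for i in range(block))
    return data


def raw_read(data, n_channels, block, channel, offset, count):
    """Seek to sample `offset` of `channel`, then return the next `count`
    values sequentially, whatever channel they belong to."""
    w, i = divmod(offset, block)
    pos = (w * n_channels + channel) * block + i
    return data[pos:pos + count]


data = build_file(n_channels=2, block=4, n_writes=3)

plan = plan_reads(first=2, count=8, block=4)
print(plan)                                         # [(2, 2), (4, 4), (8, 2)]

naive = raw_read(data, 2, 4, channel=0, offset=2, count=8)
good = [v for off, n in plan for v in raw_read(data, 2, 4, 0, off, n)]
print(sorted({ch for ch, _ in naive}))              # [0, 1]: the single read mixes channels
print([s for _, s in good])                         # [2, 3, 4, 5, 6, 7, 8, 9]
```

In this model a single read of 8 samples runs into the other channel's block, while the three planned reads return samples 2 to 9 of the requested channel only.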

tdms advanced synchronous read.vi

Test TDMS Read.vi

Edited September 19, 2014 by Manudelavega
