
Optimizing Read time of TDMS files



Hi,

 

I am investigating the possibility of using TDMS files as a kind of giant circular buffer for data that would be too big to fit in a live 2D array. The other reason, of course, is to have the data persist across application restarts.

 

A single location in the application will be responsible for writing in the file. This would consist of a loop that writes either one or a few samples for all the channels at each iteration. I successfully achieved this with good performance by setting the data layout input of the Write function to Interleaved.

 

On the read side, a few locations might need to access the files, but only in response to events, so this won't be a frequent operation. It should still be fast enough, though, since I don't want the user to wait several seconds before being able to visualize the data. My tests have revealed that this operation is slow when the data are interleaved. Here are the details:

 

# Channels: 500 (all in one group and the file contains only this group)

# Samples for each channel contained in the file: 100,000

Data type: SGL floats (I'm not using Waveforms)

 

Read operation:

# Channel to read: 1

# Samples to read: all (count=-1 and offset=0)

 

The time to retrieve the data is 1700 ms (3500 ms if using DBL, so it scales quite linearly with the sample width...).

 

If I generate the file with just one Write (feeding a 2D array) in Interleaved mode, I also get 1700 ms, so this doesn't depend on how the file was written in the first place.

If I generate the file with just one Write (feeding a 2D array) in Decimated mode, this time I get 7 ms!

 

It makes sense that the operation is faster since all the data to retrieve occupy a contiguous area on the hard drive.
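To put rough numbers on it, here is a quick back-of-the-envelope calculation (plain Python arithmetic, no TDMS API involved; it assumes the reader has to walk the whole interleaved region to collect one channel):

```python
channels = 500
samples = 100_000
bytes_per_sample = 4                     # SGL

# Interleaved: consecutive samples of one channel sit 'channels' values apart
stride = channels * bytes_per_sample     # 2,000 bytes between wanted values
traversed = samples * stride             # ~191 MiB touched to collect them all
useful = samples * bytes_per_sample      # ~391 KiB actually requested

print(stride, traversed / 2**20, useful / 2**10)
```

With the decimated layout, the same ~391 KiB sit in one contiguous run, which lines up with the 1700 ms vs 7 ms difference.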

 

My 2 questions are:

- Is there a way to keep the Interleaved layout while significantly improving the read performance?

- If not, i.e. if I need to switch to Decimated, how can I write one or a few samples for all channels at each operation? (I haven't managed to achieve this so far.)

 

I should mention that I did manage to optimize things a little by using the Advanced API, setting the channel information, and reserving the file size, but this only reduced the read time by 12%.

 

Thank you for your help!

 


So what you saw (I assume) is fragmentation.  And as you said, if the data is all in one big block, getting at it is pretty quick.  But if sections have to be grabbed and concatenated, it takes more time.  Because of this, when I deal with large amounts of TDMS data, I periodically defragment the file.  This helps make the majority of the file one large block of data, with fragmentation only after that block.  Defragmenting does take time, and during it the file can't really be accessed.  I mean it can, but I think the resource is locked, or the data you read could be corrupt.

 

Writing the data in a way that performs the flush as seldom as possible will be the key.  What this might mean for you is keeping a circular buffer in memory until it reaches a specific size, then flushing it to the TDMS file.
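A minimal sketch of that idea in Python (the `flush_fn` callback and `BLOCK` size are hypothetical stand-ins; whatever actually writes the block to the TDMS file would go behind `flush_fn`):

```python
import numpy as np

BLOCK = 1024  # samples per channel per flush; tunable, an assumption for this sketch

class BlockedWriter:
    """Accumulate per-channel samples in memory, then hand them off as one
    contiguous block per channel. 'flush_fn' is a hypothetical stand-in
    for the actual TDMS write call."""

    def __init__(self, n_channels, flush_fn):
        self.buf = [[] for _ in range(n_channels)]
        self.flush_fn = flush_fn

    def append(self, row):
        # 'row' holds one new sample per channel
        for ch_buf, value in zip(self.buf, row):
            ch_buf.append(value)
        if len(self.buf[0]) >= BLOCK:
            self.flush()

    def flush(self):
        if self.buf[0]:
            self.flush_fn([np.asarray(ch, dtype=np.float32) for ch in self.buf])
            self.buf = [[] for _ in self.buf]
```

Each flush then lands on disk as one contiguous run per channel instead of many tiny interleaved pieces.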

 

I like this idea.  I've never tried using a TDMS file for this, but given that the offset and length can be specified on a read, I think it would work pretty well.  All of my circular buffers could be held in memory at once, so I never had a need for this type of thing.  Post some code if you have an example of what you are thinking.


Thank you Hooovahh as always!

 

Here is a nicely presented VI I created to compare the different scenarios. An enum with 6 items defines how the file is written. My program will always write samples for all the channels at once, no matter how many samples it writes. On the read side it's the opposite: only one or a few channels (one in this example VI) are retrieved, but a large number of samples is requested. This VI was made in LV 2011.

 

So what you saw (I assume) is fragmentation.

 

Do you mean the file itself (like any file in Windows) is fragmented on the hard drive, or do you mean the layout of the values of the different samples for the different channels is fragmented within the content of the file?

Test TDMS.vi


Do you mean the file itself (like any file in Windows) is fragmented on the hard drive, or do you mean the layout of the values of the different samples for the different channels is fragmented within the content of the file?

I mean the file on disk.  Here is a document that describes it, with some fun examples to run.

 

https://decibel.ni.com/content/docs/DOC-20522

 

The way the file is written can influence fragmentation.  One way to avoid it is to use the TDMS Write function as seldom as possible, where maybe you write all channels of a given data type at once.  Say you have 10 channels that are doubles, then 5 that are timestamps, and then 10 more that are doubles.  If you can force your code to write the 20 channels of doubles using one Write call, instead of a write for doubles, then a write for timestamps, then a write for doubles, you will be better off.

 

But even that is better than 10 writes, one for each double channel, followed by 5 writes, one for each timestamp channel, followed by 10 writes, one for each double channel.

 

But even that is better than writing each sample for each channel one at a time.
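Roughly speaking (and glossing over TDMS's incremental headers), the strategies above differ in how many separately written chunks a later read may have to stitch together. A toy count in Python, assuming a made-up 1,000 write iterations:

```python
iterations = 1_000  # write loops over the life of the file (made-up figure)

# One Write call per channel per iteration (10 dbl + 5 ts + 10 dbl):
per_channel = 25 * iterations    # 25,000 separately written chunks
# One Write call per contiguous run of one type (dbl run, ts run, dbl run):
per_type_run = 3 * iterations    #  3,000 chunks
# All 20 doubles merged into a single Write, plus one for the timestamps:
merged = 2 * iterations          #  2,000 chunks

print(per_channel, per_type_run, merged)
```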


Thank you for the link. It seems I already follow the most efficient approach according to this document, since I never split channels across several write operations. I tried defragmentation, but I found that it takes a very long time (around 3 minutes) and doesn't improve the read operation's performance enough to make it worth it.

 

I also modified my test VI to measure the time more accurately and added extra cases, so please use version 2.

 

By analyzing the attached table, which I populated from my tests, I'm starting to be able to pinpoint where the optimizations I need might be.

 

Reservation:

Comparing 3&6 (or 3&9), we see that reserving the file size makes a huge difference for the write operation when the file is going to be written often. It makes sense, since LabVIEW doesn't need to keep requesting new space to be allocated on the hard drive. It also optimizes the read operation (less fragmentation, since the file size is reserved).

 

However, if we compare 6&9 (or 4&7, or 5&8), it appears that reserving the full size is better for the read (again, less fragmentation I suppose) but significantly worse for the write, which I don't understand. Reserving only N samples instead of N*M gives better results for the writes.
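As a rough analogy for why reserving up front helps the write path (plain Python file I/O, not the TDMS Advanced API):

```python
# Illustration only: growing a plain file to its final size in one call is
# the plain-file analogue of what TDMS file-size reservation does, i.e.
# one allocation up front instead of many small ones mid-stream.
size = 500 * 100_000 * 4          # 500 channels x 100,000 SGL samples
with open("demo.bin", "wb") as f:
    f.truncate(size)
```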

 

Writing in blocks:

Comparing 5&6, we see that, not surprisingly, writing less often but with more data per write is more efficient in terms of writing time. However, since the file was fully reserved, there is no difference in the read time!

 

Comparing 8&9, this time both the write and the read are optimized when writing less often: since the file was not fully reserved, more writes led to more fragmentation.

 

Data layout:

Comparing 4&5 (or 7&8), we see that the data layout doesn't influence the write operation, but the decimated layout significantly improves the read operation, since all samples of only one channel are requested. I would expect the interleaved layout to be more efficient if I were requesting only one or a few samples but for all channels. I didn't test that, since it is not the scenario my application will run.

 

Additional note:

Tests 1&2 show the results one gets when writing all data with a single write operation. Case 1 leads to a super optimized reading time of 12 ms, but the write time is surprisingly bad compared to case 2; I don't understand why so far. Those 2 scenarios are irrelevant anyway, since my application will definitely have to write periodically to the file.

 

I would conclude that for my particular scenario, reserving the file size, grouping the write operations, and using the decimated layout is the way to go. I still need to define:

 

- The size of the write blocks (N/B)

- The size of the reservation, since reserving the whole file leads to bad write performance.

post-14511-0-66861700-1410224918.png

Test TDMS v2.vi


Wow, those are some interesting results.  I'm always interested in TDMS performance.  Your timing method is a little flawed, but the data is probably still close.

 

UI elements in LabVIEW are updated asynchronously.  So to get an accurate measure of how long a function takes, you shouldn't be writing to or reading from any UI elements, or using any property nodes, in parallel with a timing test.  It is also generally a good idea to turn off Automatic Error Handling and debugging, because these can affect time measurements.  That being said, I don't think this changes your results much, if at all.
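The same principle for anyone benchmarking outside LabVIEW, as a minimal Python sketch: keep the measured region free of everything except the call under test.

```python
import time

def time_ms(fn, *args):
    """Time a call with nothing else inside the measured region: no
    printing, no UI updates, just the function under test."""
    t0 = time.perf_counter()
    result = fn(*args)
    elapsed = (time.perf_counter() - t0) * 1000.0
    return result, elapsed
```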


I'd appreciate some advice again about this new issue I've been having: I want to see if I can add and remove channels on the fly. One of the channels is always there: the timestamp (just a simple index in this example). This is important, since I need to be able to align the data when I retrieve them (to be displayed in an XY graph).

 

To describe the issue I'm having, look at the attached image, which explains everything...

 

I'm attaching the VI I used (very simple, made in LV2011). Thank you!

post-14511-0-04198200-1410827723.png

Test TDMS change channels simple.vi


I know this issue and I sorta struggle with it.  There are a few solutions, but all of them involve extra writes.

 

When you write Channel 1, you can write blank data to Channel 2.  If this data is a double, I recommend writing NaN.  Likewise, when you write Channel 2, you'll need to write NaN to Channel 1.
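A tiny sketch of that padding idea (hypothetical channel names, float channels assumed):

```python
import math

CHANNELS = ["index", "ch1", "ch2"]   # hypothetical channel set

def aligned_row(values):
    """Pad one sample's sparse {name: value} dict to a full row so every
    channel keeps the same length; absent float channels get NaN."""
    return [values.get(name, math.nan) for name in CHANNELS]

print(aligned_row({"index": 7, "ch1": 1.5}))   # -> [7, 1.5, nan]
```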

 

Or you can have an index or time column for each channel.  So Index 1 and Channel 1 get written together, and Index 2 and Channel 2 get written together.

 

Or you can take that a step further and have a group for each set of data that comes in at a different rate.  Let's say you have 5 channels at the same rate and another 2 at a different rate.  The first 5 channels can share the same timestamp, and the other 2 can share a different one.  In this case it seems like a good idea to have one index per group in the TDMS data.

 

EDIT:  The problem you are having is one I have too, which is that we are trying to treat TDMS files as a report format, when really they are meant to just store the data.  Reports can be generated using this data, but as soon as you try to use the TDMS file as your final report, you will hit formatting limitations like this that can only be overcome with extra work.


All my channels stream at the same rate. It's just that the user might add a new channel or delete a channel at any time, and this shouldn't have any impact on the other existing channels. I will deal with that later, probably by starting a new TDMS file, since I already plan to spread the huge amount of data over several files anyway.

 

New headache:

In my application, the read and write operations use the same reference obtained from a single Open File. I let the Advanced Synchronous Write VI buffer the data (I don't control when the data is actually flushed to the file). Reading a property (like NI_ChannelLength) gives the value corresponding to the current situation; it includes the data from the latest write even though that data hasn't yet been flushed to the file (I know this by opening the file in Excel). That's good. However, the Advanced Synchronous Read operation only sees data that has been flushed to the file; it doesn't see the data from the latest write.

 

That's a big issue, since I use NI_ChannelLength to compute the count and offset I want to feed to the Read, and the Read gives me bad data for the non-flushed samples. I do not get an End of File error though, which shows that I'm not asking for non-existent samples...

 

Is anybody aware of this issue? I tried to force the flush with the corresponding VI, but that just wrote junk into the file...

 

Edit: If I ask for a small number of samples (less than or equal to the number of samples I write per call to the Write VI), then the Read is actually successful. It's only when I ask for an amount of data that would be spread over both the flushed data and the buffered data that I get a bad reading. That would mean I need to perform 2 distinct read operations: one to retrieve flushed samples, one to retrieve buffered samples. The problem is: how does my program know what has been flushed and what hasn't?

 

Edit 2: Actually, this also happens if I ask for "old" data (flushed a while ago), so it has nothing to do with "buffered data" versus "flushed data".

Bottom line: this happens when I retrieve data that was written by different write operations. Look at the picture below to see what I mean.

 

If I use the standard API, it works! With the Advanced API I get the exact same behavior whether I use Synchronous or Asynchronous mode.

 

I attached the VI I used for this test. I'm starting to strongly suspect that I'm misunderstanding the use of the Set Read Position VI... It doesn't actually guarantee that I will get only samples from the channel whose name I wire into its input, does it?

post-14511-0-70130700-1411063073.png

Test TDMS Sync Async.vi


I did. The Application Engineer I had on the phone said she was also surprised by this behavior and is supposed to get back to me after she gets a hold of a coworker of hers who knows more about TDMS.

 

But long story short, I think I figured it out: the Advanced API is a low-level API that is more efficient than the standard one, but has less intelligence built into it. So when using it to read a channel, we need to know the exact layout of the data in the file, perform several elementary reads, and concatenate their outputs.

 

When setting the layout to Non-Interleaved (= Decimated), writing 4 samples each for channel 1 and channel 2 at every write results in the layout described in the attached picture.

My first understanding was that Set Next Read Position would configure the Advanced Read (whether Synchronous or Asynchronous) to return samples for the specified channel only. I was wrong. Actually, Set Next Read Position only places a pointer that specifies the location of the first sample to read. The Advanced Read then reads samples sequentially until it has read the number of samples corresponding to its Count input, without caring about which channel those samples belong to!

 

So by knowing the layout of the file (which depends on how many samples we write each time we call the Advanced Write), we can keep re-positioning this pointer and adjusting the Count input of the Advanced Read. Let's say I need to retrieve 8 samples from channel 1, from sample no. 2 to sample no. 9 inclusive. I need to perform 3 read operations:

 

a) Offset of Set Next Read Position = 2, Count of Advanced Read = 2 (gives me samples 2 to 3)

b) Offset of Set Next Read Position = 4, Count of Advanced Read = 4 (gives me samples 4 to 7)

c) Offset of Set Next Read Position = 8, Count of Advanced Read = 2 (gives me samples 8 to 9)

 

The attached VI demonstrates how that works.
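For anyone following the arithmetic outside LabVIEW, the same offset/count chunking can be expressed as a small Python function (pure illustration; no TDMS calls involved):

```python
def read_chunks(start, count, block_size):
    """Split one channel read into (offset, count) pairs, one per write
    block, for Set Next Read Position + Advanced Read. 'block_size' is
    the number of samples written per channel per Advanced Write call."""
    chunks = []
    pos, remaining = start, count
    while remaining > 0:
        n = min(block_size - pos % block_size, remaining)  # stay inside the block
        chunks.append((pos, n))
        pos += n
        remaining -= n
    return chunks

print(read_chunks(2, 8, 4))  # -> [(2, 2), (4, 4), (8, 2)], matching a), b), c)
```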

 

tdms advanced synchronous read.vi

Test TDMS Read.vi

post-14511-0-46336800-1411170311.png

