Choice of logging format/style [TDMS, SQLite, some combination?]

Christian Butcher · October 18, 2016

Question: I'm trying to determine the 'best' way to structure my data when storing to disk. My data comes from a variety of different sensor types and with quite different rates - e.g. temperature data (currently) as a 1D array of temperatures and a timestamp [time, t1, t2, ..., tn] at maybe 1 Hz and analog waveform data from load cells at data rates ~O(kHz).

I also want to be able to read back data from previous experiments and replot on a similar graph.

Reading threads on this forum and at NI I'm uncertain if I'll be better pursuing a set of TDMS files, probably one per sensor type stored at the group/channel level, then at the end of an experiment, collating the TDMS files into one master file and defragmenting, or trying instead to write to a SQLite database. (I have nearly no experience using SQL, but no problem learning - drjdpowell's youtube video was already very informative.) An alternative possibility mentioned in a thread somewhere here was to write TDMS files, then store records on which files hold what data in what I understood to be a catalogue-style SQL database.

Could anyone with a clearer idea/head than me comment on which avenues are dark tracks down which time will be easily lost in failed attempts, and which seem likely to be worth trying?

Background: I'm currently rewriting some code I wrote last year based on the 'Continuous Measurement and Logging' template/project. The logging in that case was writing to a single, binary file. Keeping my data format in line as I changed sensor arrangement became increasingly annoying and an ever expanding series of block diagrams lead me to start on the 'Actor Framework' architecture.

I have some initial progress with setting up actors and generating some simulated data, passing it around and getting it from different formats onto a waveform or XY-graph (can be chosen by choice of child class to insert into a subpanel). I'm now looking to write the logging code such that I have basic implementations of several of the components before I try and write out all of the measurement components and so on - I already have a temperature measurement library for an LTC2983 based PCB and so can use that to test (with hardware) inputting 1D arrays, whilst I'm planning to use just the sine wave VIs to test waveform input.

I'm not so far into this set of coding that anything is set in stone yet, and so I want to at least try and start off in the right direction. Whilst it seems likely changes to requirements or plans will require modifications to whatever I write, it would be nice if those weren't of the same magnitude as the entire (e.g. logging) codebase.

Apologies for the somewhat verbose post.

smithd · October 18, 2016

TDMS files sound like a reasonable choice for that kind of data, especially at low rates, but so does sqlite.

Seems like the pros of TDMS are you can hand it off to someone and they can open it in excel, but for sqlite you need to make a reader. TDMS also is a bit more flexible about schema. While you wouldn't want to change channel counts within a file for example, you can change between files without much work. For a database you'd need to come up with a schema that makes sense for you (this doesn't mean its complicated, it probably just means you have the columns "time", "channel number" and "value"). Sqlite lets you query easily for different conditions, while for tdms you'd have to write all the logic yourself for queries. Neither of the cons is particularly bad for the situation you've described.

Your last point ( write TDMS files, then store records on which files hold what data ) is basically what diadem/datafinder do for tdms, as I understand them. So depending on your particular work environment, you may already have access to those tools which may save you some time.

Christian Butcher · October 18, 2016

Well, that sounds good. Either option sounds plausible from your post and the only thing I should avoid doing is investing time in a catalogue, since I can use Diadem to do that for me. Good to know!

Regarding the plotting of decimated data, my previous approach was to use a data queue for logging and have decimated data passed by notifier to a graph, so the data was decimated by a fixed factor at acquisition. This seems inherently inflexible, but certainly is easy to implement (both for QMH with Continuous Measurement/Logging, and also for Actor Framework - I can just send different data via message to logging and graphing actors).

The SQLite examples demonstrated for the SQLite Library, and presumably also for other implementations of the C interface, show very nicely the use of SELECT statements to plot only some subset of data. My understanding is that I can use the MAX and MIN SQL statements (probably statement is the wrong term here. Maybe function?) to create a min/max decimation step.

If I want to do similar things for TDMS logged data, am I limited to holding a complete data set within LabVIEW's memory and passing only chosen data points to the graph when I want to update, or using fixed decimation regardless of e.g. graph zoom level or focus? I assume that reading from a TDMS file that is currently open for writing is likely to end badly for me...

hooovahh · October 18, 2016

Actually TDMS is quite robust. I'm not saying you can't break it, but I had a few tests where multiple references were opened and closed in parallel, and reading and writing took place in an uncontrolled way, and there wasn't any data lost on the writes. This suggests to me that there is a central driver that all API calls go into which handles the reading and writing. This isn't to say data loss can't happen in odd cases of multiple reads/writes in parallel. These cases should be avoided. I just wanted to mention that in my experience it is quite robust.

The TDMS read has an offset and range for performing a read. So you can graph a subset of the data without having to read all of it. But for something like decimate you will likely need to read all values of the range you want, and then decimate it in LabVIEW, which is likely less memory efficient than a database call that has that built into the select statement.

Also getting something like a Min or Max in TDMS can be less efficient than a database call, if you have open ended queries. By that I mean if you know you want to have the min and max of all channels, then when you are performing the writing of those channels, you can also write properties of the channel that is the min and max. Then when you are reading the file these properties are already written in the file and can be read without having to read every data point. These are sorta like queries where the result is written in the file, knowing that you may want to look for them later. For queries you know you'll want to do this is great, but if you don't know what you are going to want to look for in a post-processing way, then you will need to read all values, and calculate the result. For me each channel, and each sample has a Pass or Fail associated with it, and I want to know if a test passed in every cycle, meaning that every sample of the channel was a pass. Instead of reading every sample, I just write a single Pass or Fail property on the entire channel. If that is Pass then I know every sample of that test passed. I can also combine this and look at if every test has this passed property, and if it does then write a Pass property on the entire file. Then I just need to read this one property to know if the entire file passed.

I knew I would want this query like information available, so I wrote it into the program that writes the TDMS files. If I now wanted to know every time a sample was above 0.5, I would have to write custom code to read every sample and get the results, because that isn't a property I wrote into the application.

smithd · October 19, 2016

22 hours ago, Christian Butcher said:

If I want to do similar things for TDMS logged data, am I limited to holding a complete data set within LabVIEW's memory and passing only chosen data points to the graph when I want to update, or using fixed decimation regardless of e.g. graph zoom level or focus? I assume that reading from a TDMS file that is currently open for writing is likely to end badly for me...

There is an example finder example of reading and writing from a single file. I think its called 'concurrent tdms' or something. 2015 or 2016 also added in-memory TDMS which can then be flattened to a file at the end, so that might work for you too.

As for memory usage, keep in mind memory isn't expensive these days, and an analog value can most of the time be stored as a 4-byte sgl without losing useful data *. (I don't have verification of this anywhere, but at one point a coworker went through every c-series module and concluded that there are no modules for which you need to use a dbl float -- ie replace "useful data" with "any data" above). If possible, it is always easiest to just keep all your data in memory. The biggest challenge is that labview somehow still isn't entirely 64-bit, but if you're not using rt/fpga I think you can get away with it these days...gigs and gigs of data to work with.

Christian Butcher · October 19, 2016

16 hours ago, hooovahh said:

Actually TDMS is quite robust. I'm not saying you can't break it, but I had a few tests where multiple references were opened and closed in parallel, and reading and writing took place in an uncontrolled way, and there wasn't any data lost on the writes.

This is both great to know and also seems to follow from the example that smithd describes (Concurrent Access to TDMS File)

16 hours ago, hooovahh said:

The TDMS read has an offset and range for performing a read. So you can graph a subset of the data without having to read all of it. But for something like decimate you will likely need to read all values of the range you want, and then decimate it in LabVIEW, which is likely less memory efficient than a database call that has that built into the select statement.

Yeah, this is my fear. But perhaps if I'm careful to avoid obscene data rates or overly precise storage, I can cut down the cost. Of course, I should also take care not to make 'avoid obscene data rates' or 'use only single precision data' requirements...

16 hours ago, hooovahh said:

These are sorta like queries where the result is written in the file, knowing that you may want to look for them later. For queries you know you'll want to do this is great, but if you don't know what you are going to want to look for in a post-processing way, then you will need to read all values, and calculate the result.

[snip]

I knew I would want this query like information available, so I wrote it into the program that writes the TDMS files. If I now wanted to know every time a sample was above 0.5, I would have to write custom code to read every sample and get the results, because that isn't a property I wrote into the application.

More good things to consider. However, I expect I'm bound to come up with something new to look at fairly frequently, and so I'll have to consider most things to be future calculations if I do this. Then again, I can store the results in the same file once I've calculated them, if I go back to them later, right?

3 hours ago, smithd said:

There is an example finder example of reading and writing from a single file. I think its called 'concurrent tdms'

Thanks for the pointer. This is very useful to know. I took a look at the example and certainly it seems worth examining in more detail.

3 hours ago, smithd said:

As for memory usage, keep in mind memory isn't expensive these days, and an analog value can most of the time be stored as a 4-byte sgl without losing useful data

Whilst looking into data requirements, I also concluded that perhaps saving only shorter representations would be fine, and that longer representations really only enhance my recording of noise... I suppose if I know that my experimental precision is limited to eg mV, then storing with a representation covering nV is not very useful.

3 hours ago, smithd said:

If possible, it is always easiest to just keep all your data in memory. The biggest challenge is that labview somehow still isn't entirely 64-bit, but if you're not using rt/fpga I think you can get away with it these days...gigs and gigs of data to work with.

I'll have to determine expected maximum runtime vs available memory to see how plausible this is, but as you say, also well worth examining.

ShaunR · October 19, 2016

6 hours ago, smithd said:

keep in mind memory isn't expensive these days

But memory [re]allocation is

Rule of thumb:

If you want to search the data, use SQLite. If you want high speed bulk data written to disk then use TDMS. If you want high speed data that is searchable then use both with just-in-time post processing.

drjdpowell · October 19, 2016

TDMS is much more specialised than the more general-purpose SQLite. If your use case is well within what it is designed for (and it sounds like it is) then it is likely the better choice.

ShaunR · October 19, 2016

38 minutes ago, drjdpowell said:

TDMS is much more specialised than the more general-purpose SQLite. If your use case is well within what it is designed for (and it sounds like it is) then it is likely the better choice.

Hmm. That's not really saying anything and a variant on "use the right tool for the job".

How about enumerating these alleged design cases for the OP and comparing the practicalities?. I'll start you off........

TDMS is designed for high speed disk writing and can write to disk at more than 400MB/s with the right disks. SQlite can only manage about 0.5-1MB/s.

hooovahh · October 19, 2016

8 hours ago, Christian Butcher said:

Then again, I can store the results in the same file once I've calculated them, if I go back to them later, right?

Yup you could do that. Properties are only scalar values, so keep that in mind too. So maybe I have a property that is total number of failed and total number of passed tests in the file. If I run more tests I can write those two properties again, overwriting the previous values those properties had. You can of course flatten any data type to a string and write that so the scalar limitation is only applicable for the native data types, and anything else won't be human readable in normal viewer.

eberaud · October 19, 2016

I've spent my fair share of time dealing with the Advanced TDMS API to achieve the same thing: decimating the data to plot them in a graph. My benchmarking showed that there is a sweet spot where performing one Read operation for each sample you want to keep (using the Offset hooovahh was talking about) starts being more efficient than performing a single Read operation returning a giant amount of data and then keeping only 1 every 100 samples for example. So I'd recommend looking into that if you go with a TDMS-based solution.

And since we are dealing with files on disk: SSD drive!! Forget about HDD, way too slow, your users will have to wait several seconds each time they need to refresh the graphs.

And yes, cut the file size by half by using SGL instead of DBL! This only poses an issue for the timestamp since LabVIEW uses DBL. I worked around this by using 3 SGL columns to represent the timestamp.

You're on the right track!

ShaunR · October 19, 2016

1 hour ago, Manudelavega said:

I've spent my fair share of time dealing with the Advanced TDMS API to achieve the same thing: decimating the data to plot them in a graph. My benchmarking showed that there is a sweet spot where performing one Read operation for each sample you want to keep (using the Offset hooovahh was talking about) starts being more efficient than performing a single Read operation returning a giant amount of data and then keeping only 1 every 100 samples for example. So I'd recommend looking into that if you go with a TDMS-based solution.

And since we are dealing with files on disk: SSD drive!! Forget about HDD, way too slow, your users will have to wait several seconds each time they need to refresh the graphs.

And yes, cut the file size by half by using SGL instead of DBL! This only poses an issue for the timestamp since LabVIEW uses DBL. I worked around this by using 3 SGL columns to represent the timestamp.

You're on the right track!

Are there bench marking graphs or data you can post?

I did a lot with SQLite - some of which you can see under the "Performance" heading. It might be nice for some graphical comparisons. SQLite should blow away TDMS for queries but TDMS should blow away SQLite for Writes. The data logging example in the API (which you can play with) demonstrates decimation and takes about 150ms to retrieve the 500 decimated datapoints when there are 1 million [DBL] data points in total (IIRC).

GregSands · October 19, 2016

I also use TDMS at times, and it has some other advantages, such as being able to write directly from DMAs (say from an FPGA) or DVRs, or configuring to write internally using DAQmx logging. Where the data is written and read only in LabVIEW or Diadem, it's a natural option, but it's not so suitable if other non-NI software is used for the data analysis - the connection to Excel and Matlab is limited at best, and for anything else you seem to be on your own, or needing to interface through another file format.

Another option which is more standardized is HDF5 - I'm a fan of the h5labview library, but there are other alternatives as well. HDF is extremely flexible around the hierarchical relationship between the data and the storage format (so much so that it can get quite convoluted if you want it to), and is at least as fast as TDMS for reading/writing. Some advantages include the ability to read (or write) with arbitrary offsets/steps, even in multi-dimensional arrays, and the ability to add attributes at any level of the storage (file, object, data group, data, etc). Downsides are just getting your head around the options especially with data spaces (a mapping between your data array format and the data file format) - in some ways it's too flexible, although basic use is quite straightforward. Here's an example image from the h5labview website:

I use HDF5 very successfully for large 4D image data, because it makes it easy to access just the data I want, but for 1D data streams I would tend towards TDMS as the first option. I must confess to only having used SQLite once for text logging - maybe I need more of a look at it too!

What I really wish is that NI had built its data storage formats around HDF5. There are few, if any, features of TDMS that could not be achieved with a file format compatible with HDF5. A similar idea has reached 55 kudos to date, which shows a decent level of support for a largely technical idea. Note that Matlab's .MAT file format is now HDF5.

Edited October 19, 2016 by GregSands

eberaud · October 19, 2016

3 hours ago, ShaunR said:

Are there bench marking graphs or data you can post?

Yes but I'll need a bit of effort to put together something that can be understood easily, since right now it mostly consists of big Excel sheets with raw data. I'll try to post something soon-ish.

Christian Butcher · October 20, 2016

Thanks again for all the advice.

7 hours ago, ShaunR said:

I did a lot [of bench marking graphs] with SQLite - some of which you can see under the "Performance" heading.

I took a look at the graphs I found on the left side of the page, under the Performance heading. It looked like you'd managed around 400k records inserted in 1 second, but for around 10 columns a time of about 0.75ms. Is this difference due to transaction grouping (no idea if that's the right term - I mean making multiple changes in one write)?

In the case of 400kHz, that's around 4 times higher that what I'd previously seen described in threads on this forum (around 100kHz) and what I calculated from your earlier description (0.5-1 MB/s -> ~65.5kHz-130kHZ assuming doubles inserted).

Obviously there are some presumably suspect assumptions here - perhaps when profiling you inserted 4B values, for example.

Quote

The data logging example in the API (which you can play with) demonstrates decimation and takes about 150ms to retrieve the 500 decimated datapoints when there are 1 million [DBL] data points in total (IIRC).

The comment on the bottom half of this image seems to exactly describe my worries. (Quoted below for forum readers unable/uninterested/unwilling to open the link and read the picture)

Quote

... But when zooming in we are unable to view the data that has not been graphed. To get around this problem we could maintain an array so that data could be recalled when zooming in and out. For large data-sets, this too becomes unweildy, sluggish and is extremely memory and CPU intensive.

8 hours ago, Manudelavega said:

I've spent my fair share of time dealing with the Advanced TDMS API to achieve the same thing: decimating the data to plot them in a graph. My benchmarking showed that there is a sweet spot where performing one Read operation for each sample you want to keep (using the Offset hooovahh was talking about) starts being more efficient than performing a single Read operation returning a giant amount of data and then keeping only 1 every 100 samples for example.

Is this (1%) an approximate order of magnitude? Or just an example?

Quote

And since we are dealing with files on disk: SSD drive!! Forget about HDD, way too slow, your users will have to wait several seconds each time they need to refresh the graphs.

Currently have an SSD drive with external drives to backup older data. I suppose if I wanted to keep to this plan, I'd have to be careful with a monolithic database like SQlite, but I don't imagine it's an impossible problem. In any case, I have enough local storage for my immediate needs. (I hope...)

Quote

And yes, cut the file size by half by using SGL instead of DBL! This only poses an issue for the timestamp since LabVIEW uses DBL. I worked around this by using 3 SGL columns to represent the timestamp.

So here you're describing storing data as a (2D?) array of SGL values in each write, as [[TimeA_1, TimeB_1, TimeC_1, Val_1, Val_2, ..., Val_N]; [TimeA_2, TimeB_2, TimeC_2, ...]; ...] where Time_i = (TimeA_i * A) + (TimeB_i * B) + TimeC_i or similar? What I'm meaning to ask is, you write an XY-array style, rather than using the waveform data type or an array of waveforms?

4 hours ago, GregSands said:

[With TDMS] the connection to Excel and Matlab is limited at best

I've seen similar comments on the Matlab file exchange's TDMS reader tools - this is potentially a problem for me but if I need to use TDMS I can try working through this as it happens, so long as I can get some preliminary tests working to make me believe it will. I already tried opening data from an SQlite database in Matlab and after some confusion over why I couldn't use their 'sqlite' function (answer: It's a new feature for 2016a and my machine has 2015b) installing a java plugin got me a database connection reasonably easily and then I could execute statements with 'fetch' well enough to feel reassured over plausibility.

Quote

Another option which is more standardized is HDF5 - I'm a fan of the h5labview library, but there are other alternatives as well. HDF is extremely flexible around the hierarchical relationship between the data and the storage format (so much so that it can get quite convoluted if you want it to), and is at least as fast as TDMS for reading/writing. Some advantages include the ability to read (or write) with arbitrary offsets/steps, even in multi-dimensional arrays, and the ability to add attributes at any level of the storage (file, object, data group, data, etc). Downsides are just getting your head around the options especially with data spaces (a mapping between your data array format and the data file format) - in some ways it's too flexible, although basic use is quite straightforward.

My only experience with HDF5 has been installing it, for use with other programs such as ParaView on Linux. I've generally found it to be a frustrating experience, no doubt because I choose all the wrong configure flags, but I suppose finding prebuild libraries and so on for Windows would be reasonably straightforward.

I'd guess the attributes are in the same style as TDMS properties? Data spaces sound useful from your description here but I'll have to take a look at HDF5's documentation in order to work out exactly what you mean. (Also, both SQlite and LabVIEW/Diadem have really detailed documentation so there's that to consider if HDF5's is less complete.)

Edited October 20, 2016 by Christian Butcher
Typo - not sure what 'CPI' intensive would be.

Christian Butcher · October 20, 2016

Doesn't look like I need to worry about the level of documentation for HDF5 either. Now just to read it.

GregSands · October 20, 2016

35 minutes ago, Christian Butcher said:

My only experience with HDF5 has been installing it, for use with other programs such as ParaView on Linux. I've generally found it to be a frustrating experience, no doubt because I choose all the wrong configure flags, but I suppose finding prebuild libraries and so on for Windows would be reasonably straightforward.

I'd guess the attributes are in the same style as TDMS properties? Data spaces sound useful from your description here but I'll have to take a look at HDF5's documentation in order to work out exactly what you mean. (Also, both SQlite and LabVIEW/Diadem have really detailed documentation so there's that to consider if HDF5's is less complete.)

Installing HDF5 on Windows is a breeze, no problems there. And there is plenty of documentation and examples online, but very little describing using it in LabVIEW. The question is whether its worth dealing with the learning curve which, in the end gets you somewhere much more powerful, but at the cost of time and complexity.

I should also have mentioned that h5labview is a wrapper around the HDF libraries, and only includes a subset of all that it can do - a pretty reasonable subset to be sure, and the developer seems quite open to ideas. I think the other LabVIEW option (Live HDF5) has similar restrictions in capability. One annoying thing is that both wrappers cannot be installed simultaneously due to filename conflicts, so it's hard to directly compare them.

And yes, HDF5 attributes are similar to TDMS properties, but more so - they can be arbitrary data themselves, not necessarily just strings. So it's easy to store, for example, a settings cluster as an attribute to its associated data.

ShaunR · October 20, 2016

7 hours ago, Christian Butcher said:

Is this difference due to transaction grouping

Yes.

7 hours ago, Christian Butcher said:

In the case of 400kHz, that's around 4 times higher that what I'd previously seen described in threads on this forum (around 100kHz) and what I calculated from your earlier description (0.5-1 MB/s -> ~65.5kHz-130kHZ assuming doubles inserted).

The actual performance is dependent on a number of factors, two of which are the version of SQLite used and how the binaries are compiled. The figure I quoted is a nominal value for SQLite in general rather than with the binaries I used since the intended message is that TDMS is orders of magnitude more performant for writing in the right circumstances. There is a benchmark included as an example with which you can ascertain the actual performance on your system with arbitrary columns and transaction numbers. You can download it from that page to try it as it is open source and free for non-commercial use.

7 hours ago, Christian Butcher said:

The comment on the bottom half of this image seems to exactly describe my worries. (Quoted below for forum readers unable/uninterested/unwilling to open the link and read the picture)

It is a single line SQL query so the complexity for decimation is trivial - unlike handling the data in LabVIEW with or without TDMS. I got bored after 1 million data points but if you do more (that VI is one of the examples), please post the results.

11 hours ago, Manudelavega said:

Yes but I'll need a bit of effort to put together something that can be understood easily, since right now it mostly consists of big Excel sheets with raw data. I'll try to post something soon-ish.

Sweet!

Edited October 20, 2016 by ShaunR

drjdpowell · October 20, 2016

16 hours ago, Manudelavega said:

And since we are dealing with files on disk: SSD drive!! Forget about HDD, way too slow, your users will have to wait several seconds each time they need to refresh the graphs.

Actually, at least up to the 2GB files I’ve tested, the SQLite file gets held in memory by Windows File Cache, so refreshing is fast (though the initial read will be slower with an HDD than an SSD, as will be writing).

ShaunR · October 20, 2016

22 minutes ago, drjdpowell said:

Actually, at least up to the 2GB files I’ve tested, the SQLite file gets held in memory by Windows File Cache, so refreshing is fast (though the initial read will be slower with an HDD than an SSD, as will be writing).

Only with Pragma Synchronous = OFF

Christian Butcher · October 20, 2016

A quick test using ShaunR's API and 'Speed Test' example VI with 100000 records, 4 columns and 10 iterations gave me (write) times on the order of 1000ms (per iteration, as approximately expected).

Creating a VI which called the 'Speed Test' twice in parallel (after modifying the speed test to be preallocated-reentrant and changing the path to accept a string control) gave two sets of ~1350ms, finishing at nearly the same time (when I first called it, before realising the test was non-reentrant, both VIs returned times of ~1000ms, but the calling VI took twice as long, with one set of indicators updating halfway through.)

Presumably if I check, I'll find some more VIs which are non-reentrant, which might account for the 30% increase in write time. However, the conclusion I'm drawing (perhaps wrongly) is that by writing to multiple (2,3, not 20,30) databases I can increase the total number of records I can INSERT in a given time.

This clearly isn't desirable when I want all the data to be in the same table, but perhaps I can write a master database with some less frequently written tables and then ATTACH single table databases containing data from the more high-throughput sensors? Here, tables can be distinguished by sensor source, which seems not too illogical to me. My hope (perhaps supported by my quick test) is that by opening parallel connections to different database files (i.e. distinct databases) I can write more quickly if needed. This would seem to be useful in the case that my data rates slightly exceed what I can INSERT into one database but not by so much that the system becomes entirely unworkable.

Is it then possible to select from a master database, with child databases ATTACHed, whilst separately writing to each database?

Apologies if I dropped off the deep end somewhere, and completely screwed up.

ShaunR · October 20, 2016

30 minutes ago, Christian Butcher said:

A quick test using ShaunR's API and 'Speed Test' example VI with 100000 records, 4 columns and 10 iterations gave me (write) times on the order of 1000ms (per iteration, as approximately expected).

Creating a VI which called the 'Speed Test' twice in parallel (after modifying the speed test to be preallocated-reentrant and changing the path to accept a string control) gave two sets of ~1350ms, finishing at nearly the same time (when I first called it, before realising the test was non-reentrant, both VIs returned times of ~1000ms, but the calling VI took twice as long, with one set of indicators updating halfway through.)

Presumably if I check, I'll find some more VIs which are non-reentrant, which might account for the 30% increase in write time. However, the conclusion I'm drawing (perhaps wrongly) is that by writing to multiple (2,3, not 20,30) databases I can increase the total number of records I can INSERT in a given time.

This clearly isn't desirable when I want all the data to be in the same table, but perhaps I can write a master database with some less frequently written tables and then ATTACH single table databases containing data from the more high-throughput sensors? Here, tables can be distinguished by sensor source, which seems not too illogical to me. My hope (perhaps supported by my quick test) is that by opening parallel connections to different database files (i.e. distinct databases) I can write more quickly if needed. This would seem to be useful in the case that my data rates slightly exceed what I can INSERT into one database but not by so much that the system becomes entirely unworkable.

Is it then possible to select from a master database, with child databases ATTACHed, whilst separately writing to each database?

Apologies if I dropped off the deep end somewhere, and completely screwed up.

SQlite is single writer, multiple readers as it does table level locking. If drjdpowell had followed through with the enumeration it would (or should) have been in there. The high level API in the SQLite API for LAbVIEW insulates you from the Error 5 hell and "busy" handling (when the DB is locked) that you encounter when trying simultaneous parallel writes with low level APIs. So simultaneous parallel writes is not appropriate usage.....sort of .

You can mitigate some of these operational aspects as you are attempting to do but in reality you are now just starting out the process of proving my rule of thumb:

On 10/19/2016 at 10:11 AM, ShaunR said:

If you want to search the data, use SQLite. If you want high speed bulk data written to disk then use TDMS. If you want high speed data that is searchable then use both with just-in-time post processing.

Edited October 20, 2016 by ShaunR

drjdpowell · October 20, 2016

1 hour ago, ShaunR said:

Only with Pragma Synchronous = OFF

No, seems to work fine without that, as long as one reading, not writing.

ShaunR · October 20, 2016

13 minutes ago, drjdpowell said:

No, seems to work fine without that, as long as one reading, not writing.

Reading, yes. Writing no. Incidentally. It's the same with INI files etc. You don't need to keep a list in memory or have lookup code for it. Just read and write directly to the the file. Not sure about other platforms, though.

Edited October 20, 2016 by ShaunR

drjdpowell · October 20, 2016

1 hour ago, ShaunR said:

If drjdpowell had followed through with the enumeration it would (or should) have been in there.

Sorry, what “enumeration”?

Sign In

Choice of logging format/style [TDMS, SQLite, some combination?]

Recommended Posts

Christian Butcher

smithd

Christian Butcher

hooovahh

smithd

Christian Butcher

ShaunR

drjdpowell

ShaunR

hooovahh

eberaud

ShaunR

GregSands

eberaud

Christian Butcher

Christian Butcher

GregSands

ShaunR

drjdpowell

ShaunR

Christian Butcher

ShaunR

drjdpowell

ShaunR

drjdpowell

Join the conversation

Similar Content

Cyth SQLite Logger

[CR] Cyth SQLite Logger

Looking for option to speed up my usage of SQLite

ANV Database Toolkit

[CR] SQLite Library 1 2 3 4 11

Browse

Activity

Important Information