What Database Toolkit do you use?

drjdpowell · September 22, 2016

What toolkits do people use for accessing databases? I’ve been using SQLite a lot, but now need to talk to a proper database server (MySQL or Postgres). I’ve used the NI Database Connectivity Toolkit in the (far) past, but I know there are other options, such as using ADO.NET directly.

What do people use for database connectivity? What would you recommend?

— James

hooovahh · September 22, 2016

We have some 3rd party software that generates Access databases. For those I use NI's database connectivity toolkit. It was the first I tried, it was free (or included) and it worked.

But for new databases that my software creates, I generate and read them using SQLite and your toolkit. We don't do much database work, and it has always been decentralized.

LogMAN · September 22, 2016

We use SQLite for all local configuration and MSSQL for public/shared data.

For SQLite we use your SQLite library (can't say it often enough: thank you very much for that great library!); for MSSQL we use the NI Database Connectivity Toolkit.

Big data like graphs (raw data) are generally stored in TDMS files, and in a few rare cases as binary files or even XML files.

Rolf Kalbermatter · September 22, 2016

Most of the database work we do is SQL Server, occasionally Oracle for specific customers, and we normally use our own Database Toolkit. The differences from the NI database toolkit are Express VI based configuration wizards for the queries and transparent support for multiple database drivers such as MS SQL Server, Oracle and MySQL (MS Access too, but that hasn't been used in ages, so I wouldn't vouch for its spotless operation at this point). In addition I have my own ODBC based API that I have used in the past. I'm still considering incorporating everything into a unified interface, likely based on a LabVIEW class interface, but the priority for that never seems to make it into the top 5.

smithd · September 22, 2016

mysql with the tcp connector (https://decibel.ni.com/content/docs/DOC-10453) so the cRIOs can talk to it for central data storage. For some queries (historical data) the db connectivity toolkit is faster, but mysql is slow as a historical database server anyway so I probably won't use it for that in the future -- it took a lot of tweaking and a lot of ram to get it to work at all.

I may end up using your sqlite library for configuration data on my next project but I haven't gotten around to checking that it supports all the OSs I need (definitely pharlap, maybe vxworks).

Tim_S · September 22, 2016

Traditionally I've used SQL Server for the database with built-in LabVIEW database functions. I've had too much "fun" with purchasing a copy of SQL Server for projects leaving the country (woo for the ever-changing international trade regulations :frusty: ). I've started exploring MariaDB as an alternative, which can also be set up in ODBC, so it's just slight differences in SQL statements that I need to worry about.

ShaunR · September 22, 2016

I used LabSQL when the DB toolkit was a paid addon. It was a lot faster.

drjdpowell · September 22, 2016

Anybody used Postgresql?

Jordan Kuehn · September 22, 2016

12 minutes ago, drjdpowell said:

Anybody used Postgresql?

I "helped" with that postgresql question a couple weeks back and got a feel for it. It was a bit of a learning curve, but didn't seem too bad.

Most of our active systems use an Access DB and I have a library of tools using NI DCT that I use. I've been playing around with SQLite in my free time and I like what I've seen for local machine applications.

Edited September 22, 2016 by Jordan Kuehn

smithd · September 23, 2016

3 hours ago, drjdpowell said:

Anybody used Postgresql?

I was under the impression that the main advantage of postgres was if you were willing to write modules for it to do fast custom processing on the database side (ie move the code to the data). If you just want a standard sql database I got the impression postgres was only ok.

Neil Pate · September 23, 2016

I also used LabSQL when I did not have access to the NI toolkit, this was many years ago.

JamesMc86 · September 23, 2016

I've worked with postgres with the NI toolkit in the past without any issues.

I can't remember specifics, but I remember finding it to be more feature rich than MySQL (partitioned tables were definitely one feature we were using).

drjdpowell · September 23, 2016

After studying documentation, my current plan is to use Postgresql with its libpq client dll. I think I can wrap libpq in a similar way to what I did with SQLite.

drjdpowell · October 14, 2016

Anybody using Postgres who would like to beta test my libpq-based library? It’s similar to my SQLite access library.

PQ LabVIEW.png

maxout · January 13, 2017

Hi Team,

I am looking for a toolkit/driver which would allow me to connect to a Server based MySQL and be able to interface with the database. Do you know any free toolkits that would allow me to do that?

I have installed XAMPP on my Win10 machine and set up a MySQL server. Next, I installed the Windows ODBC connector and tested the connection, which passed. Now I would like to connect a LabVIEW application to the hosted database.

It would be great if someone could help me figure out how I can connect and perform various read/write/modify tasks.

Thanks

Max.

joerghampel · January 13, 2017

I've been using ADO.NET in the last years for Windows applications, and the NI DCT before that. I did one project on RT with the MySQL TCP connector, but as mentioned by @smithd it was difficult getting it to perform well enough.

Recently, I've been working with SQLite (with your toolkit as well as directly using the DLL). Next will be to try and get SQLite running on real-time, on linux first, then Phar Lap and perhaps VxWorks - again like @smithd ;-)

smithd · January 24, 2017

On 10/14/2016 at 3:29 AM, drjdpowell said:

Anybody using Postgres who would like to beta test my libpq-based library? It’s similar to my SQLite access library.

I did some testing with postgres and holy crap is it fast*. I'd be interested in trying out your library if the offer still stands.

*for my use case, vs mysql, without any optimization

drjdpowell · January 24, 2017

Here you go. Afraid I've not been able to work on it recently, as other priorities keep intruding.

jdp_science_postgresql-0.1.1.8.vip

smithd · January 24, 2017

4 hours ago, drjdpowell said:

Here you go. Afraid I've not been able to work on it recently, as other priorities keep intruding.

jdp_science_postgresql-0.1.1.8.vip

Thanks, I'll give it a go

drjdpowell · January 25, 2017

Note, BTW, that I haven’t worked on INSERT speed yet (as my application doesn’t require it) and the example only inserts one row at a time. Better speed comes from multi-row INSERTs, and even faster is likely the COPY command, which I intend to support with the toolkit at some point.
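As a rough illustration of the single-row versus multi-row INSERT distinction, here is a minimal Python sketch. It uses the standard-library `sqlite3` module purely as a stand-in so it can run anywhere; it is not the libpq-based toolkit, and the table name `readings` and both function names are made up for the example. On a client/server database the gain comes mainly from issuing fewer statements and round trips, which an in-process SQLite database will not reproduce faithfully.

```python
import sqlite3

def insert_one_by_one(conn, rows):
    """Issue one INSERT statement per row."""
    for row in rows:
        conn.execute("INSERT INTO readings (id, t, val) VALUES (?, ?, ?)", row)

def insert_multi_row(conn, rows, chunk=300):
    """Issue one INSERT per chunk, with many rows in a single VALUES list.

    chunk=300 keeps 3 parameters x 300 rows under the 999-parameter limit
    of older SQLite builds; other databases have different limits.
    """
    for start in range(0, len(rows), chunk):
        part = rows[start:start + chunk]
        placeholders = ", ".join(["(?, ?, ?)"] * len(part))
        flat = [value for row in part for value in row]
        conn.execute(
            f"INSERT INTO readings (id, t, val) VALUES {placeholders}", flat
        )

# Hypothetical demo data: 1000 rows of (id, t, val).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, t REAL, val REAL)")
rows = [(i, i * 0.1, float(i)) for i in range(1000)]

with conn:  # one transaction around the whole batch
    insert_multi_row(conn, rows)

row_count = conn.execute("SELECT count(*) FROM readings").fetchone()[0]
print(row_count)
```

Both functions store the same rows; only the number of statements sent differs (1000 versus 4 for this demo data).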

Also, from my reading, I wouldn’t expect MySQL to be slower than Postgres (except possibly for complex queries), so I think there must be something wrong with your MySQL benchmark.

smithd · January 25, 2017

5 hours ago, drjdpowell said:

Also, from my reading, I wouldn’t expect MySQL to be slower than Postgres (except possibly for complex queries), so I think there must be something wrong with your MySQL benchmark.

I kind of read the opposite, but fair enough. The specific use case is for large time series data so I inserted data ~1000 rows at a time for 100s of GB of data. I used the same schema for both but otherwise did not touch any setting or enable partitioning on either database (to make this work at all with the mysql database on our 'production' system I spent several days reading through various methods for improving performance and implementing them -- limiting flush to disk, using >20 separate connections to insert data, partitioning, etc). I wanted to see the baseline, and mysql's out of box speed was embarrassingly slow once the data size (or the index) got larger than available ram. Postgres got slower but kept on going. I'm assuming they have better default management of the index, and it looks like they also have an interesting partitioning scheme.

drjdpowell · January 25, 2017

I'll be interested in learning your results, as I'm considering Postgres for a data-recording application.

smithd · February 6, 2017

So I took some time to do this right today. I applied the same basic optimizations to both mysql and postgres. Still need to find the right driver for cassandra, not going to happen today. For reference:

Both dbs use id (int) t (time) and val (dbl). I have more complex data but most is dbl or bool.

mysql installed as part of wampserver (easy 64-bit install). Used innodb as database type. Use several (5) connections using the TCP library (which I already know to be faster for small queries, especially those returning 0 rows). Set buffer pool size to 10G, bump log file size to 256M. Most critically, change innodb_flush_log_at_trx_commit to 2, so it flushes to disk slowly. This is the best configuration I've been able to find thus far, although I'm always welcome to hear of additional optimizations.

postgres installed standalone. Use settings here: http://pgtune.leopard.in.ua/ for windows/data warehouse/32G. This matches basically with what I've found elsewhere. For postgres I ran both your driver and the ODBC driver through dbconn toolkit. Yours was about 10% faster.

For the run I kept both on the same spinning disk (non-raid) and totally stopped each service while the other was running. Both applications were identical except the driver. I have one loop generating inserts of 500 elements and shoving that into a size-10 queue. I have a parallel for loop opening 5 connections to the db, beginning a transaction, writing 5 large inserts, committing the transaction, repeat. The 500 element insert is very simple, insert into dbldate (id, t, val) Values (.......). From what I understand, this should be the fastest way to insert data in both dbs short of using an intermediate file (which isn't my use case). Timestamp is fixed for each packet of 500, forcibly incremented by 100 ms for each iteration. id is set to the i terminal (id=0->499).
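The structure of that test loop can be sketched roughly as below, in Python rather than LabVIEW. This is only an outline of the pattern described (bounded queue of 10, five workers, five 500-row inserts per transaction); the `commit` callback is a hypothetical stand-in for "begin transaction, run the inserts, commit" on a real connection, and the `val` of 0.0 is a placeholder since the post does not say what values were written.

```python
import queue
import threading

ROWS_PER_INSERT = 500          # ids 0..499 in each packet
INSERTS_PER_TRANSACTION = 5    # five large inserts per commit
WORKERS = 5                    # one worker per database connection
QUEUE_SIZE = 10

def build_packet(iteration):
    """One packet: 500 rows sharing a timestamp that advances 100 ms per iteration."""
    t_ms = iteration * 100
    return [(i, t_ms, 0.0) for i in range(ROWS_PER_INSERT)]

def producer(q, n_packets):
    for k in range(n_packets):
        q.put(build_packet(k))   # blocks while the queue is full
    for _ in range(WORKERS):
        q.put(None)              # one stop marker per worker

def worker(q, commit):
    batch = []
    while True:
        packet = q.get()
        if packet is None:
            break
        batch.append(packet)
        if len(batch) == INSERTS_PER_TRANSACTION:
            commit(batch)        # stand-in for one database transaction
            batch = []
    if batch:
        commit(batch)            # flush a final partial transaction

def run(n_packets, commit):
    q = queue.Queue(maxsize=QUEUE_SIZE)
    threads = [threading.Thread(target=worker, args=(q, commit))
               for _ in range(WORKERS)]
    for th in threads:
        th.start()
    producer(q, n_packets)
    for th in threads:
        th.join()

# Demo with a fake commit that only counts rows.
committed = []
lock = threading.Lock()

def fake_commit(batch):
    with lock:
        committed.append(sum(len(packet) for packet in batch))

run(50, fake_commit)
total_rows = sum(committed)
print(total_rows)
```

Because the queue is bounded, a slow database shows up as the producer blocking on `put`, which is the back-pressure that turns into data loss in a real streaming system once upstream buffers fill.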

For both I measured two values: how many elements I had shoved into the queue divided by total execution time (this will obviously be off by up to 10/N, but N is huge), and I charted the execution time of each transaction of 5x500 elements.

Mysql: 22 ms/packet after 8738 packets. It started low, <10 ms, but this slowly rose. Most critically for a streaming app, there were HUGE jitter spikes of up to 11.5 seconds and a significant amount of noise in the 2 second range. These spikes were contributing to the slow growth in execution time which, being an average over the whole run, lags the instantaneous numbers. This matches my experience in production, where the db keeps up on average, but the spikes lead to data loss as buffers overflow. It's worth saying here that I'd bet it's possible to find and reduce the source of these spikes, but I'm at the end of my skills for optimization of the database.
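To see why an average over the whole run lags the instantaneous numbers, here is a small Python sketch. The durations are synthetic values invented for the illustration, not data from this benchmark, and the function names are hypothetical.

```python
def cumulative_mean(durations_ms):
    """Running average over the whole run: total time so far / packets so far."""
    means = []
    total = 0.0
    for n, d in enumerate(durations_ms, start=1):
        total += d
        means.append(total / n)
    return means

def jitter_summary(durations_ms, spike_threshold_ms):
    """Overall mean, worst case, and number of samples above a spike threshold."""
    return {
        "mean_ms": sum(durations_ms) / len(durations_ms),
        "max_ms": max(durations_ms),
        "spikes": sum(1 for d in durations_ms if d > spike_threshold_ms),
    }

# Synthetic run: 900 packets at 10 ms, then 100 packets at 30 ms.
durations = [10.0] * 900 + [30.0] * 100
running = cumulative_mean(durations)
summary = jitter_summary(durations, spike_threshold_ms=20.0)

print(running[-1])   # whole-run average at the end
print(durations[-1]) # what the last packet actually took
print(summary)
```

In this made-up run the final whole-run average is 12 ms while the most recent packets take 30 ms each, so the average alone hides both the recent slowdown and any spikes; charting per-transaction time is what exposes them.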

Postgres is a lot more cheerful. Time per packet was 10 ms after 6445 iterations (vs 11 ms for ODBC) and while there are a small number of large jitter spikes, they only go up to 1-1.4 seconds, which isn't anywhere near as bad. The 'noise' floor (which was 2 seconds for mysql) was about half a second for postgres. What's more, I'm not at the end of my rope for optimization, as I read there are ways to disable some of the safety checks to make it go faster (I think this is the same as what I did for mysql with "flush log at transaction commit").

I know this isn't a perfect test, but keep in mind it's also a reflection of the level of effort required for each. Mysql has always been the super easy to get up and running database, but for this particular use case postgres was just significantly easier to configure and configure well.

On the query side I didn't do as much work. I just generated a simple query (id=250, t between blah and blah+5). In mysql this took 5 ms (using heidisql). Postgres took significantly longer -- 500 ms (using pgadmin). I then generated a more complex query (id between x and y, val > 0.5, time between x and x+5) and this took about 1 sec with postgres and 750 ms with mysql.

smithd · February 8, 2017

small update: I did a more complex query and the results were less happy. Both tests ran to something on the order of 70 million rows in the table, and a given 10 minute period should have about 6000 entries per id (100 ms apart), for about 3 million queried rows. In both databases a query just for a specific ID and specific time range is pretty quick. However, this breaks down terribly if you select a time range + an id range, which is what you might do if you wanted to, say, see the average of all the values associated with system x. So what I ran was:

select avg(val), stddev(val), max(val), min(val) from dbldate where t between x and x+10 and id between 250 and 260, or something along those lines.
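For anyone wanting to try the shape of that query without a server, a minimal runnable version might look like the following Python sketch. It again uses SQLite as a stand-in, with the `dbldate (id, t, val)` schema from the earlier post; the `summarize` function is hypothetical. SQLite has no built-in `stddev`, so the sample standard deviation is computed in Python here, whereas MySQL and Postgres can do it in the query itself.

```python
import sqlite3
import statistics

WHERE = "WHERE t BETWEEN ? AND ? AND id BETWEEN ? AND ?"

def summarize(conn, t0, t1, id_lo, id_hi):
    """Aggregate val over a time window and an id range."""
    params = (t0, t1, id_lo, id_hi)
    avg, mx, mn = conn.execute(
        f"SELECT avg(val), max(val), min(val) FROM dbldate {WHERE}", params
    ).fetchone()
    vals = [v for (v,) in conn.execute(
        f"SELECT val FROM dbldate {WHERE}", params)]
    # Sample standard deviation; undefined for fewer than two values.
    sd = statistics.stdev(vals) if len(vals) > 1 else None
    return avg, sd, mx, mn

# Tiny made-up data set: three rows inside the window, two outside it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dbldate (id INTEGER, t REAL, val REAL)")
conn.executemany(
    "INSERT INTO dbldate (id, t, val) VALUES (?, ?, ?)",
    [(250, 0.0, 1.0), (251, 0.0, 2.0), (252, 0.0, 3.0),
     (100, 0.0, 100.0),    # id outside the range
     (251, 100.0, 50.0)],  # time outside the window
)
result = summarize(conn, 0.0, 10.0, 250, 260)
print(result)
```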

Mysql performed this query very slowly, about a minute for a warmed up database server. That said, it even did fine on the whole range of IDs (no where clause for ID) and didn't have much of a slowdown at all. I don't think I saw a query take more than 80 seconds. Note that I still think this is ridiculous for 3 million rows (let's guess it's something like 100 MB of data it has to process in the worst case I could imagine) but it's fantastic in comparison to postgres.

With postgres, I tried the query with no ID first -- just timestamp -- and gave up after maybe 20 or so minutes. I did some searches and found that running "vacuum analyze" might help, so I ran that. Still no go. I added 'where id between 250 and 260' back in and it took about 6 minutes to run. I even used the explain tool to understand the issue. It looks like it is using the index, but if you leave out the ID filter it doesn't do so -- it does a full table scan. So the concerning thing is that even in the case where it uses the index, it performs several times worse than mysql (which again is already bad). It's literally faster to select for each ID individually than to do a bulk query in postgres.

I'm not sure where to go from here, but again I thought I'd share.

drjdpowell · February 8, 2017

Remind me. What Primary Key do you use? Is it (id,time) or (time,id)?

Oh, and what hardware are you running on?
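The key-order question matters because a composite B-tree index can generally only be range-searched on its leading column. The Python sketch below shows the effect using SQLite's `EXPLAIN QUERY PLAN` as a stand-in; the Postgres and MySQL planners differ in detail, so treat this only as an illustration of the general principle, not as a diagnosis of the results above. The helper names `make_db` and `plan` are made up for the example.

```python
import sqlite3

def make_db(index_columns):
    """Empty dbldate table with one composite index in the given column order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dbldate (id INTEGER, t REAL, val REAL)")
    conn.execute(f"CREATE INDEX idx ON dbldate ({index_columns})")
    return conn

def plan(conn, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# A query that filters on time only, with no id condition.
TIME_ONLY = "SELECT avg(val) FROM dbldate WHERE t BETWEEN 10 AND 20"

plan_id_first = plan(make_db("id, t"), TIME_ONLY)
plan_t_first = plan(make_db("t, id"), TIME_ONLY)

print(plan_id_first)  # expected to report a full SCAN of the table
print(plan_t_first)   # expected to report an index SEARCH on t
```

With `(id, t)` the time-only query has nothing to search on and falls back to scanning the table, which is consistent with the full table scan reported when the ID filter was left out. With `(t, id)` the same query can use the index, but a query for one id over a long time span then has to walk every id in that span.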

Edited February 8, 2017 by drjdpowell
