
What Database Toolkit do you use?


Recommended Posts

What toolkits do people use for accessing databases?   I’ve been using SQLite a lot, but now need to talk to a proper database server (MySQL or Postgres).  I’ve used the NI Database Connectivity Toolkit in the (far) past, but I know there are other options, such as using ADO.NET directly.

What do people use for database connectivity?  What would you recommend?

— James


We have some 3rd party software that generates Access databases.  For those I use NI's database connectivity toolkit.  It was the first I tried, it was free (or included) and it worked.  

But for the new databases my software creates, I generate and read them using SQLite and your toolkit.  We don't do much database work, and it has always been decentralized.


We use SQLite for all local configuration and MSSQL for public/shared data.

For SQLite we use your SQLite library (I can't say it often enough: thank you very much for that great library!); for MSSQL we use the NI Database Connectivity Toolkit.

Large data sets such as graphs (raw data) are generally stored in TDMS files, and in a few rare cases as binary or even XML files.


Most of the database work we do is SQL Server, occasionally Oracle for specific customers, and we normally use our own Database Toolkit. The differences from the NI database toolkit are Express VI-based configuration wizards for the queries and transparent support for multiple database drivers such as MS SQL Server, Oracle and MySQL (MS Access too, but that hasn't been used in ages, so I wouldn't vouch for its spotless operation at this point). In addition, I have my own ODBC-based API that I have used in the past. I'm still considering incorporating everything into a unified interface, likely based on a LabVIEW class interface, but the priority for that never seems to make it into the top 5.


mysql with the tcp connector (https://decibel.ni.com/content/docs/DOC-10453) so the cRIOs can talk to it for central data storage. For some queries (historical data) the db connectivity toolkit is faster, but mysql is slow as a historical database server anyway so I probably won't use it for that in the future -- it took a lot of tweaking and a lot of ram to get it to work at all.

I may end up using your sqlite library for configuration data on my next project but I haven't gotten around to checking that it supports all the OSs I need (definitely pharlap, maybe vxworks).


Traditionally I've used SQL Server for the database with the built-in LabVIEW database functions. I've had too much "fun" with purchasing a copy of SQL Server for projects leaving the country (woo for the ever-changing international trade regulations :frusty:). I've started exploring MariaDB as an alternative, which can also be set up over ODBC, so it's just slight differences in SQL statements that I need to worry about.

12 minutes ago, drjdpowell said:

Anybody used Postgresql?  

I "helped" with that postgresql question a couple weeks back and got a feel for it.  It was a bit of a learning curve, but didn't seem too bad.

 

Most of our active systems use an Access DB and I have a library of tools using NI DCT that I use.  I've been playing around with SQLite in my free time and I like what I've seen for local machine applications.  

Edited by Jordan Kuehn
3 hours ago, drjdpowell said:

Anybody used Postgresql?  

I was under the impression that the main advantage of Postgres was if you were willing to write modules for it to do fast custom processing on the database side (i.e. move the code to the data). If you just want a standard SQL database, I got the impression Postgres was only OK.

  • 3 weeks later...
  • 2 months later...

Hi Team, 

I am looking for a toolkit/driver that would allow me to connect to a server-based MySQL database and interface with it. Do you know of any free toolkits that would allow me to do that?

I have installed XAMPP on my Win10 machine and set up a MySQL server. Next, I installed the Windows ODBC connector and tested the connection, which passed. Now I would like to connect a LabVIEW application to the hosted database.

It would be great if someone could help me figure out how I can connect and perform various read/write/modify tasks.

Thanks

Max. 


I've been using ADO.NET in recent years for Windows applications, and the NI DCT before that. I did one project on RT with the MySQL TCP connector, but as mentioned by @smithd it was difficult to get it to perform well enough.

Recently, I've been working with SQLite (with your toolkit as well as directly using the DLL). Next will be to try and get SQLite running on real-time, on linux first, then Phar Lap and perhaps VxWorks - again like @smithd ;-)

  • 2 weeks later...
On 10/14/2016 at 3:29 AM, drjdpowell said:

Anybody using Postgres who would like to beta test my libpq-based library?  It’s similar to my SQLite access library.

I did some testing with postgres and holy crap is it fast*. I'd be interested in trying out your library if the offer still stands.

 

*for my use case, vs mysql, without any optimization


Note, BTW, that I haven’t worked on INSERT speed yet (as my application doesn’t require it) and the example only inserts one row at a time.  Better speed comes from multi-row INSERTs, and even faster is likely the COPY command, which I intend to support with the toolkit at some point.
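Roughly, the three approaches look like this (a sketch only, using a made-up example_table; COPY is Postgres-specific):

-- one row per statement (what the current example does)
INSERT INTO example_table (id, t, val) VALUES (0, now(), 1.23);

-- multi-row INSERT: many rows in a single statement
INSERT INTO example_table (id, t, val) VALUES
  (0, now(), 1.23),
  (1, now(), 4.56);

-- COPY: bulk-load rows streamed from the client connection
COPY example_table (id, t, val) FROM STDIN WITH (FORMAT csv);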

Also, from my reading, I wouldn’t expect MySQL to be slower than Postgres (except possibly for complex queries), so I think there must be something wrong with your MySQL benchmark.

5 hours ago, drjdpowell said:

Also, from my reading, I wouldn’t expect MySQL to be slower than Postgres (except possibly for complex queries), so I think there must be something wrong with your MySQL benchmark.

I kind of read the opposite, but fair enough. The specific use case is for large time series data so I inserted data ~1000 rows at a time for 100s of GB of data. I used the same schema for both but otherwise did not touch any setting or enable partitioning on either database (to make this work at all with the mysql database on our 'production' system I spent several days reading through various methods for improving performance and implementing them -- limiting flush to disk, using >20 separate connections to insert data, partitioning, etc). I wanted to see the baseline, and mysql's out of box speed was embarrassingly slow once the data size (or the index) got larger than available ram. Postgres got slower but kept on going. I'm assuming they have better default management of the index, and it looks like they also have an interesting partitioning scheme.

  • 2 weeks later...

So I took some time to do this right today. I applied the same basic optimizations to both mysql and postgres. Still need to find the right driver for cassandra, not going to happen today. For reference:

Both DBs use id (int), t (time), and val (dbl). I have more complex data, but most of it is dbl or bool.

MySQL installed as part of WampServer (easy 64-bit install). Used InnoDB as the storage engine. Used several (5) connections via the TCP library (which I already know to be faster for small queries, especially those returning 0 rows). Set the buffer pool size to 10G and bumped the log file size to 256M. Most critically, changed innodb_flush_log_at_trx_commit to 2, so it flushes to disk lazily. This is the best configuration I've been able to find thus far, although I'm always open to hearing about additional optimizations.
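For reference, those settings correspond to something like this in my.ini / my.cnf (a sketch covering only the values mentioned above; the exact file location depends on the WampServer install):

[mysqld]
innodb_buffer_pool_size = 10G
innodb_log_file_size = 256M
# write the InnoDB log at each commit but flush it to disk only about once per second
innodb_flush_log_at_trx_commit = 2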

Postgres installed standalone. Used the settings from http://pgtune.leopard.in.ua/ for Windows / data warehouse / 32G, which basically matches what I've found elsewhere. For Postgres I ran both your driver and the ODBC driver through the DB Connectivity Toolkit; yours was about 10% faster.

For the run I kept both on the same spinning disk (non-raid) and totally stopped each service while the other was running. Both applications were identical except the driver. I have one loop generating inserts of 500 elements and shoving that into a size-10 queue. I have a parallel for loop opening 5 connections to the db, beginning a transaction, writing 5 large inserts, committing the transaction, repeat. The 500 element insert is very simple, insert into dbldate (id, t, val) Values (.......). From what I understand, this should be the fastest way to insert data in both dbs short of using an intermediate file (which isn't my use case). Timestamp is fixed for each packet of 500, forcibly incremented by 100 ms for each iteration. id is set to the i terminal (id=0->499).
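In SQL terms, the table and insert pattern are roughly as follows (a sketch; the dbldate table name and columns come from the description above, while the exact column types and values are assumed/illustrative):

CREATE TABLE dbldate (
  id  INT,               -- 0..499, from the loop's i terminal
  t   TIMESTAMP,         -- fixed per packet of 500, stepped 100 ms per iteration
  val DOUBLE PRECISION   -- measured value
);

BEGIN;
-- five of these per transaction, each statement carrying 500 rows
INSERT INTO dbldate (id, t, val) VALUES
  (0, '2016-11-28 12:00:00.000', 0.123),
  (1, '2016-11-28 12:00:00.000', 0.456);  -- ...same t for all 500 rows in the packet
COMMIT;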

For both I measured two values: how many elements I had shoved into the queue divided by total execution time (this will obviously be off by up to 10/N, but N is huge), and I charted the execution time of each transaction of 5x500 elements.

MySQL: 22 ms/packet after 8738 packets. It started low, <10 ms, but this slowly rose. Most critically for a streaming app, there were HUGE jitter spikes of up to 11.5 seconds and a significant amount of noise in the 2 second range. These spikes were contributing to the slow growth in execution time which, being an average over the whole run, lags behind the instantaneous numbers. This matches my experience in production, where the DB keeps up on average, but the spikes lead to data loss as buffers overflow. It's worth saying here that I'd bet it's possible to find and reduce the source of these spikes, but I'm at the end of my skills for optimizing the database.

Postgres is a lot more cheerful. Time per packet was 10 ms after 6445 iterations (vs 11 ms for ODBC), and while there are a small number of large jitter spikes, they only go up to 1-1.4 seconds, which isn't anywhere near as bad. The 'noise' floor (which was 2 seconds for MySQL) was about half a second for Postgres. What's more, I'm not at the end of my rope for optimization, as I read there are ways to disable some of the safety checks to make it go faster (I think this is equivalent to what I did for MySQL with "flush log at transaction commit").

I know this isn't a perfect test, but keep in mind it's also a reflection of the level of effort required for each. MySQL has always been the super-easy-to-get-up-and-running database, but for this particular use case Postgres was just significantly easier to configure, and to configure well.

On the query side I didn't do as much work. I just generated a simple query (id=250, t between blah and blah+5). In MySQL this took 5 ms (using HeidiSQL). Postgres took significantly longer -- 500 ms (using pgAdmin). I then generated a more complex query (id between x and y, val > 0.5, time between x and x+5), which took about 1 sec with Postgres and 750 ms with MySQL.
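Written out, those two queries look roughly like this (a sketch; time bounds left as placeholders):

-- simple query
SELECT t, val
FROM dbldate
WHERE id = 250 AND t BETWEEN <start> AND <start + 5>;

-- more complex query
SELECT id, t, val
FROM dbldate
WHERE id BETWEEN <x> AND <y> AND val > 0.5 AND t BETWEEN <start> AND <start + 5>;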


Small update: I did a more complex query and the results were less happy. Both tests ran to something on the order of 70 million rows in the table, and a given 10 minute period should have about 6000 entries (100 ms apart) per ID, for about 3 million queried rows. In both databases a query for just a specific ID and a specific time range is pretty quick. However, this breaks down terribly if you select a time range + an ID range, which is what you might do if you wanted to, say, see the averages of all the values associated with system x. So what I ran was:

select avg, stddev, max, min (val) where t between x and x+10 and id between 250 and 260 or something along those lines.
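Spelled out (a sketch; table name from the earlier posts, time bounds left as placeholders), that's roughly:

SELECT AVG(val), STDDEV(val), MAX(val), MIN(val)
FROM dbldate
WHERE t BETWEEN <start> AND <start + 10 minutes>
  AND id BETWEEN 250 AND 260;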

MySQL performed this query very slowly: about a minute for a warmed-up database server. In fact, it even did fine on the whole range of IDs (no WHERE clause for ID) and didn't have much of a slowdown at all; I don't think I saw a query take more than 80 seconds. Note that I still think this is ridiculous for 3 million rows (let's guess it's something like 100 MB of data it has to process in the worst case I could imagine), but it's fantastic in comparison to Postgres.

With Postgres, I tried the query with no ID first -- just the timestamp -- and gave up after maybe 20 or so minutes. I did some searches and found that running "vacuum analyze" might help, so I ran that. Still no go. I added 'where id between 250 and 260' back in and it took about 6 minutes to run. I even used the explain tool to understand the issue. It looks like it is using the index, but if you leave out the ID filter it doesn't do so -- it does a full table scan. So the concerning thing is that even in the case where it uses the index, it performs several times worse than MySQL (which again is already bad). It's literally faster to select each ID individually than to do a bulk query in Postgres.
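For reference, the commands in question look like this (the composite index at the end is just an untested guess at something that might help, not something I ran):

-- rebuild planner statistics
VACUUM ANALYZE dbldate;

-- show the chosen plan with actual timings
EXPLAIN ANALYZE
SELECT AVG(val), STDDEV(val), MAX(val), MIN(val)
FROM dbldate
WHERE t BETWEEN <start> AND <start + 10 minutes>
  AND id BETWEEN 250 AND 260;

-- untested guess: a composite index covering both filter columns
CREATE INDEX dbldate_id_t_idx ON dbldate (id, t);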

 I'm not sure where to go from here, but again I thought I'd share.



  • Similar Content

    • By kosist90
      Dear Community,
      let me present our new ANV Database Toolkit, which was recently released on vipm.io.
      A short introduction to the toolkit is posted at this link; it also describes the steps needed to start using the toolkit.
      ANV Database Toolkit helps developers design a LabVIEW API for querying various databases (MS SQL, MySQL, SQLite, Access). It lets you create VIs which can be used as an API with the help of a graphical user interface. When these VIs are used, the toolkit handles the connection to the database, relieving developers of that burden in their applications.
      It has the following features:
      • Simplifies handling of databases in LabVIEW projects
      • Allows you to graphically create API VIs for databases
      • Supports Read, Write, Update and Delete queries
      • Supports various database types (MS SQL, MySQL, SQLite, Access)
      The overall idea is that a developer can create a set of high-level API VIs for queries using a graphical user interface, without actually writing SQL queries. Those API VIs are used in the application and handle database communication in the background. Moreover, an SQL query can be applied to any of the supported database types; it is just a matter of selecting the database type. Changing the target database does not require changes to the API VI which executes the query.
      After installing the toolkit, a sample project is available which demonstrates the toolkit's capabilities for executing different types of queries.
      Note that in order to install the toolkit, VI Package Manager must be launched with Administrator privileges.
      This is a paid toolkit, and pricing is provided by quotation. There is a 30-day trial period during which you can try out the toolkit and decide whether it is helpful for your needs (we hope that it will be).
      In case of any feedback, ideas or issues, please do not hesitate to contact me directly here, at vipm.io, or by e-mail at info@anv-tech.com.
       
        
    • By Bryan
      TestStand Version(s) Used: 2010 thru 2016
      Windows (7 & 10)
      Database: MS SQL Server (v?)
      Note: The database connection I'm referring to is what's configured in "Configure > Result Processing" (or the equivalent location in older versions).
      Based on some issues we've been having with nearly all of our TestStand-based production testers, I'm assuming that TestStand opens the configured database connection when the sequence is run, and maintains that same connection for all subsequent UUTs tested until the sequence is stopped/terminated/aborted.  However, I'm not sure of this and would like someone to either confirm or correct this assumption. 
      The problem we're having:
      • Nearly all of our TestStand-based production testers have intermittently been losing their database connections, returning an error (usually after the PASS/FAIL banner). I'm not sure if it's a TestStand issue or an issue with the database itself.
      • The operator, seeing and only caring about whether it passed or failed, often ignores the error message and soldiers on, mostly ignoring every error message that pops up.
      • Testers at the next higher assembly that look for a passed record of the subassembly's serial number in the database will then fail their test because they can't find a passed record of the serial number.
      We've tried communicating with the operators to either let us know when the error occurs, re-test the UUT, or restart TestStand (usually the option that works), but it's often forgotten or ignored.
      The operators do not stop the test sequence when they go home for the evening/weekend/etc., so TestStand is normally left waiting for them to enter the next serial number of the device to test.  I'm assuming that the connection to the database is still open during this time.  If so, it's almost as though MS SQL has been configured to terminate idle connections, or, if something happens with the SQL Server, the connection isn't properly closed and re-established.
      Our LabVIEW-based testers don't appear to have this problem unless there really is an issue with the database server.  The majority of these testers, I believe, open > write > close their database connections at the completion of a unit test.
      I'm currently looking into writing my own routine to be called in the "Log to Database" callback which will open > write > close the database connection.  But, I wanted to check if anyone more knowledgeable had any insight before I spend time doing something that may have an easy fix.
      Thanks all!
    • By ATE-ENGE
      Background:
      I've been using LabVIEW for a few years for automation testing tasks and until recently have been saving my data to "[DescriptorA]\[DescriptorB]\[test_info].csv" files. A few months ago, a friend turned me on to the concept of relational databases. I've been really impressed by their response times, and I'm reworking my code, following the examples with the Database Connectivity Toolkit (DCT), to use "[test_info].mdb" with the Microsoft Jet OLE DB provider.
      However, I'm beginning to see the limitations of the DCT, namely:
      • No support for auto-incrementing primary keys
      • No support for foreign keys
      • Difficult to program stored procedures
      and I'm sure a few more that I don't know about yet.
      Now I've switched over to architecting my database in MySQL Workbench. Suffice it to say I'm a bit out of my depth, and I have a few questions that I haven't seen covered in tutorials.
       Questions (General):
      Using the Microsoft Jet OLE DB provider, I made a connection string "Data Source= C:\[Database]\[databasename.mdb]" in a .UDL file. However, the examples I've seen for connecting to MySQL databases use IP addresses and ports (see the connection-string sketch at the end of this post).
      • Is a MySQL database still a file? If not, how do I put it on my networked server \\[servername\Database\[file]? If so, what file extensions exist for databases, and what are the implications of each extension? I know of .mdb, but are there others I could/should be using (like .csv vs .txt)?
      My peers, who have more work experience than me but no experience with databases, espouse a 2GB limit on all files (I believe from the era of FAT16 disks). My current OLE DB database is about 200 MB in size, so 2GB will likely never happen, but I'm curious:
      • Do file size limits still apply to database files? If so, how does one have the giant databases that support major websites?
       Questions (LabVIEW Specific):
      I can install my [MainTestingVi.exe], which accesses the Jet OLE DB database, on a Windows 10 computer that is fresh out of the box. When I switch over to having a MySQL database, are there any additional tools that I'll need to install as well?
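      For reference, a file-based Jet connection string and a server-based MySQL ODBC connection string look roughly like this (a sketch only; the exact Driver name depends on which MySQL Connector/ODBC version is installed):
      Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\[Database]\[databasename.mdb]
      Driver={MySQL ODBC 5.3 ANSI Driver};Server=servername;Port=3306;Database=databasename;Uid=username;Pwd=password;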
    • By GregFreeman
      I think I have found a fundamental issue with the DB Toolkit Open Connection. It seems not to use connection pooling correctly. The reason I believe it's an issue with LabVIEW and ADODB ActiveX specifically is that the problem does not manifest itself using the ADODB driver in C#. This is better shown with examples. All I am doing in these examples is opening and closing connections and benchmarking the connection open time.
      Adodb and Oracle driver in LabVIEW.

       
      ADODB in C#
       
      // C# benchmark: open and close an ADODB connection 30 times, timing each iteration with Environment.TickCount.
      using System;
      using System.Diagnostics;

      namespace TestAdodbOpenTime
      {
          class Program
          {
              static void Main(string[] args)
              {
                  for (int i = 0; i < 30; i++)
                  {
                      ADODB.Connection cn = new ADODB.Connection();
                      int count = Environment.TickCount;
                      cn.Open("Provider=OraOLEDB.Oracle;Data Source=FASTBAW;Extended Properties=PLSQLRSet=1;Pooling=true;",
                              "USERID", "PASSWORD", -1);
                      cn.Close();
                      int elapsedTime = Environment.TickCount - count;
                      Debug.WriteLine("RunTime " + elapsedTime);
                  }
              }
          }
      }
      Output:
      RunTime 203
      RunTime 0
      RunTime 0
      RunTime 0
      RunTime 0
      RunTime 0
      RunTime 0
      RunTime 0
      RunTime 0
       
      Notice how nicely the times align between the LabVIEW code leveraging the .NET driver and the C# code using ADODB. The first connection takes a while to open; after that, connection pooling takes over nicely and the connect time is 0.
       
      Now cue the LabVIEW ActiveX implementation and every open connection time is pretty crummy and very sporadic. 
       
      One thing I happened to find out by accident while troubleshooting was that if I add a property node on the block diagram where I open a connection, and I don't close that reference, my subsequent connect times are WAY faster (between 1 and 3 ms). That is what leads me to believe this may be a bug in whatever LabVIEW does to interface with ActiveX.
       
      Has anyone seen issues like this before or have any idea of where I can look to help me avoid wrapping up the driver myself?
       
    • By GregFreeman
      I am running calls to various stored procedures in parallel, each with its own connection refnum. A few of these calls can take a while to execute from time to time. In critical parts of my application I would like Cmd Execute.vi to be reentrant. Generally I handle this by making a copy of the NI library and namespacing my own version. I can then make a reentrant copy of the VI I need and save it in my own library, then commit it to version control so everyone working on the project has it. But the library is password protected, so even a copy of it stays locked. I can't do a Save As on the VIs that I need to make a reentrant copy, nor can I add any new VIs to the library.
      Does anyone have any suggestions? I have resorted to taking NI's library, including it inside my own library, then basically rewriting the VIs I need by copying the contents of the block diagram of the VI I want to "save as" and pasting them into another VI.
