database What Database Toolkit do you use?

smithd · February 8, 2017

primary is (id, time) so I suppose it makes some sense if you don't give it any ID in the where clause it would have to do a table scan, but postgres is still taking 5 minutes even with an ID (or range of IDs). I dont know how postgres handles indices, but I think mysql just concatenates the values, so in this case it would be sorted by id then timestamp. That may also contribute to the slow insert performance as I'm inserting more as a function of time (so it may have to fiddle with the index quite a bit), but querying more as a function of ID.

Its worth mentioning I also tried to add an index to postgres which is a hash of just timestamp (no change in performance), but it occurs to me that if you're asking for a range of timestamps, it probably can't use that index.

Hw that its running on is a relatively dinky test server, 32 GB of ram and a single unraided spinning disk (relatively fast processor, but in this case the disk is the bottleneck). The reason I think this is an ok approximation is that this is for a long term data storage application, so tables will never fit entirely in ram and disk I/O will always be the bottleneck until purchasing several multi-tb SSDs becomes affordable.

Edited February 8, 2017 by smithd

drjdpowell · February 8, 2017

For comparison, I have a 2GB test SQLite database with 660k spectra, each of about 140 wavenumber (WN) readings, a total of 91M rows of Time, WN, Value. These are stored in a "Without ROWID" table with primary key (Time,WN) I select regions of interest (ROI) in WN and Time, and I average spectra over time intervals (every 100 seconds, say), then take some statistic like a maximum of an average over each interval. With WN ROI of about 30 points, and a time range of a few percent of the data, update time is about one second. Doing the full time range takes 30 seconds.

I don't understand why MySQL and Postgres are not able to get comparable results. But I would suggest trying (Time,id) as your primary key.

Notes:

-- the 2GB file is small enough to fit in the Windows File Cache, and so there is no disk access involved.

-- SQLite is only using a single CPU for this, so it doesn't matter how many processors my computer has

Sign In

database What Database Toolkit do you use?

Recommended Posts

smithd

drjdpowell

Join the conversation

Similar Content

ANV Database Toolkit

Re-establishing TestStand Database Connections: Does anyone know exactly how TestStand maintains its database connnection?

Using SQL Databases and LabVIEW exe

ADODB possible LabVIEW ActiveX Bug

Copy and Modify Database Toolkit

Browse

Activity

Important Information