Jump to content

What Database Toolkit do you use?


Recommended Posts

primary is (id, time) so I suppose it makes some sense if you don't give it any ID in the where clause it would have to do a table scan, but postgres is still taking 5 minutes even with an ID (or range of IDs). I dont know how postgres handles indices, but I think mysql just concatenates the values, so in this case it would be sorted by id then timestamp. That may also contribute to the slow insert performance as I'm inserting more as a function of time (so it may have to fiddle with the index quite a bit), but querying more as a function of ID.

Its worth mentioning I also tried to add an index to postgres which is a hash of just timestamp (no change in performance), but it occurs to me that if you're asking for a range of timestamps, it probably can't use that index.

Hw that its running on is a relatively dinky test server, 32 GB of ram and a single unraided spinning disk (relatively fast processor, but in this case the disk is the bottleneck). The reason I think this is an ok approximation is that this is for a long term data storage application, so tables will never fit entirely in ram and disk I/O will always be the bottleneck until purchasing several multi-tb SSDs becomes affordable. 

Edited by smithd
Link to comment

For comparison, I have a 2GB test SQLite database with 660k spectra, each of about 140 wavenumber (WN) readings, a total of 91M rows of Time, WN, Value.   These are stored in a "Without ROWID" table with primary key (Time,WN) I select regions of interest (ROI) in WN and Time, and I average spectra over time intervals (every 100 seconds, say), then take some statistic like a maximum of an average over each interval.  With WN ROI of about 30 points, and a time range of a few percent of the data, update time is about one second.  Doing the full time range takes 30 seconds.  

I don't understand why MySQL and Postgres are not able to get comparable results.  But I would suggest trying (Time,id) as your primary key.

Notes:

-- the 2GB file is small enough to fit in the Windows File Cache, and so there is no disk access involved.

-- SQLite is only using a single CPU for this, so it doesn't matter how many processors my computer has

Link to comment

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.