lavezza Posted September 25, 2009 Report Posted September 25, 2009 I'm curious to see how other people have dealt with this problem. (or would deal with it) Example: We have a system with 1000's of sensors, things like temps, pressures, switch status, etc. Some of these may be changing at 5Hz, some may only change a couple times a day. A third-party data system collects the readings at different rates. In addition, some of the sensors are smart and only send data when things change. So, we get packets over the network and we need to save them. Each packet will have a different number of sensors reporting and a timestamp. Obviously the packets contain both the data and the sensor names or else we won't know what we were looking at. What would be the best way to save this data that would allow quick access (users want to be able to call up a plot of old data quickly)? Conceptually, you can look at this as a big spreadsheet table with a timestamp column and a column for every sensor. Some of the sensor columns would be very sparsely populated. We looked into storing data in TDMS files, but while we can quickly pull out the data, it looses it's timestamp. In other words, if we saved 1000 packets, we would have 1000 timestamps and maybe 120 samples of sensor1. If I pull the 120 samples of sensor1 out, I don't see a way of getting the 120 timestamps that sync up with those readings. We could use a database, but I'm concerned about the speed. We'll have to do some testing. Has anyone faced this kind of problem? How have you handled it? Quote
MikaelH Posted September 25, 2009 Report Posted September 25, 2009 Hi I suggest a database, a fast one The table has to be optimized for high speed inserts and queries/searches. When I get data from a table to show a plot and it contains huge number of values, I let the database do some decimation of the data, i.e. Take the first 1000 values and convert it to 1 value, the next 1000 values will be value 2. For this decimate function I have 3 options (Max, Avg, Min) the database uses when converting 1000 values to 1. Also every night you can let the database create records in an aggregation table that contains some pre-calibrated values and is optimized for queries only. Use this table when the user doesn’t need live data. Another thing, I always save the raw data as well in the database in a Special table and then let the database do the Math to “Calibrate” the raw data to the real value that is stored in a separate table. This approach is good if you want traceability. E.g. After a sensor gets recalibrated, and you noticed that it has been using the wrong calibration constants. In that case you can just recalculate the old data so it becomes correct. Cheers, Mikael Quote
Rolf Kalbermatter Posted September 25, 2009 Report Posted September 25, 2009 I'm curious to see how other people have dealt with this problem. (or would deal with it) Example: We have a system with 1000's of sensors, things like temps, pressures, switch status, etc. Some of these may be changing at 5Hz, some may only change a couple times a day. A third-party data system collects the readings at different rates. In addition, some of the sensors are smart and only send data when things change. So, we get packets over the network and we need to save them. Each packet will have a different number of sensors reporting and a timestamp. Obviously the packets contain both the data and the sensor names or else we won't know what we were looking at. What would be the best way to save this data that would allow quick access (users want to be able to call up a plot of old data quickly)? Conceptually, you can look at this as a big spreadsheet table with a timestamp column and a column for every sensor. Some of the sensor columns would be very sparsely populated. We looked into storing data in TDMS files, but while we can quickly pull out the data, it looses it's timestamp. In other words, if we saved 1000 packets, we would have 1000 timestamps and maybe 120 samples of sensor1. If I pull the 120 samples of sensor1 out, I don't see a way of getting the 120 timestamps that sync up with those readings. We could use a database, but I'm concerned about the speed. We'll have to do some testing. Has anyone faced this kind of problem? How have you handled it? I don't think a classical database design with one single table for all sensors would be useful. Since you have so much variation in sample speed this would be an immense waste of space and performance. A historical database would be much more suited where every channel (sensor) is its own table with a timestamp, value and possibly status column. That allows to easily add and remove channels to the setup. Adding a new channel is simply creating a new table for it, removing a channel is simply not using it anymore. LabVIEW has the DSC add-on Toolkit which contains its own Citadel database (based on the Logos engine that comes from their Lookout package). This is a highly optimized historical database and there also exists an ODBC driver interface to it so you can query the database also from non LabVIEW applications. You could also devise your own historical database design but I would bet it is hard to beat the Citadel database in such a way. Rolf Kalbermatter Quote
rkanders Posted September 26, 2009 Report Posted September 26, 2009 I don't think a classical database design with one single table for all sensors would be useful. Rolf Kalbermatter Why wouldn't a single table with say SensorID, TimeStamp, Data fields work? That approach actually scales a bit better because database does not have to be modified to add another sensor. On my PC that approach can insert about 900 records per second. Quote
Rolf Kalbermatter Posted September 26, 2009 Report Posted September 26, 2009 Why wouldn't a single table with say SensorID, TimeStamp, Data fields work? That approach actually scales a bit better because database does not have to be modified to add another sensor. On my PC that approach can insert about 900 records per second. A good historical database can easily top those 900 data samples per second and the querying of data is also quite a bit easier since the indexing over the sensor ID is not necessary. With a historical database where each channel is in fact its own table with a timestamp, value and status column (and no need to have relational humbug and such, so this can be highly optimized too) the organization is much more straightforward and the logging of data is very fast. The only difficulty here is the querying of combined data, as you need to do a similar thing to a join statement when wanting to query multiple channels at the same time, but that is a task that can be overseen fairly well since the structure of the tables is very simple. With your approach querying does only get more complicated as far as the database engine is concerned although for you as user you will not see much of a difference. Adding a new channel in a historical database is simply adding a new table and that has no influence whatsoever on the other already logged data. For the end user it won't make a big difference other than that a historical database can be much more optimized for the task at hand than solving this with a generic super duper relational, object oriented and whatever else database engine. And yes the select statement will look a little different . Rolf Kalbermatter Quote
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.