Leaderboard
Popular Content
Showing content with the highest reputation on 02/06/2017 in all areas
- 
	So I took some time to do this right today. I applied the same basic optimizations to both mysql and postgres. I still need to find the right driver for cassandra, which is not going to happen today. For reference: both dbs use id (int), t (time), and val (dbl). I have more complex data but most is dbl or bool. mysql was installed as part of wampserver (easy 64-bit install). I used innodb as the database type, with several (5) connections using the TCP library (which I already know to be faster for small queries, especially those returning 0 rows). I set the buffer pool size to 10G and bumped the log file size to 256M. Most critically, I changed innodb_flush_log_at_trx_commit to 2, so it flushes to disk slowly. This is the best configuration I've been able to find thus far, although I'm always happy to hear of additional optimizations. postgres was installed standalone, using the settings from http://pgtune.leopard.in.ua/ for windows/data warehouse/32G. This basically matches what I've found elsewhere. For postgres I ran both your driver and the ODBC driver through the dbconn toolkit. Yours was about 10% faster. For the run I kept both dbs on the same spinning disk (non-raid) and totally stopped each service while the other was running. Both applications were identical except for the driver. I have one loop generating inserts of 500 elements and shoving them into a size-10 queue. I have a parallel for loop opening 5 connections to the db, each beginning a transaction, writing 5 large inserts, committing the transaction, and repeating. The 500 element insert is very simple: insert into dbldate (id, t, val) Values (.......). From what I understand, this should be the fastest way to insert data in both dbs short of using an intermediate file (which isn't my use case). The timestamp is fixed for each packet of 500 and forcibly incremented by 100 ms for each iteration. id is set to the i terminal (id=0->499). 
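For anyone who wants to try the same load pattern outside LabVIEW, here is a rough Python sketch of the setup described above: one producer building 500-row inserts into a bounded queue of 10, and 5 workers that each wrap 5 inserts in a transaction and time it. The `%s` placeholder style, the `execute_txn` callback, the start date, and the constant 0.0 values are all assumptions for illustration; the original was built in LabVIEW and its details may well differ.

```python
import queue
import threading
import time
from datetime import datetime, timedelta

ROWS_PER_INSERT = 500   # ids 0..499, as in the post
INSERTS_PER_TXN = 5     # 5 large inserts per transaction
QUEUE_DEPTH = 10        # size-10 queue between producer and writers
WORKERS = 5             # one worker per db connection


def build_insert(ts, values):
    """Return (sql, params) for one multi-row INSERT sharing a single timestamp."""
    # "%s" placeholders are an assumption; the marker depends on the driver.
    placeholders = ", ".join(["(%s, %s, %s)"] * len(values))
    sql = f"INSERT INTO dbldate (id, t, val) VALUES {placeholders}"
    params = [p for i, v in enumerate(values) for p in (i, ts, v)]
    return sql, params


def producer(q, packets, start):
    """Queue one insert per packet, stepping the timestamp by 100 ms each time."""
    for n in range(packets):
        ts = start + timedelta(milliseconds=100 * n)
        q.put(build_insert(ts, [0.0] * ROWS_PER_INSERT))  # blocks when queue is full
    for _ in range(WORKERS):
        q.put(None)  # one stop marker per worker


def worker(q, execute_txn, timings):
    """Group queued inserts into transactions and record how long each takes."""
    batch = []
    while True:
        item = q.get()
        if item is not None:
            batch.append(item)
        # Flush on a full batch, or on shutdown if anything is left over.
        if batch and (item is None or len(batch) == INSERTS_PER_TXN):
            t0 = time.perf_counter()
            execute_txn(batch)  # hypothetical: begin, run each insert, commit
            timings.append(time.perf_counter() - t0)
            batch = []
        if item is None:
            return


def run(packets, execute_txn):
    """Run the pipeline and return the per-transaction execution times in seconds."""
    q = queue.Queue(maxsize=QUEUE_DEPTH)
    timings = []
    threads = [
        threading.Thread(target=worker, args=(q, execute_txn, timings))
        for _ in range(WORKERS)
    ]
    for t in threads:
        t.start()
    producer(q, packets, datetime(2017, 2, 6))
    for t in threads:
        t.join()
    return timings
```

Charting the returned `timings` list (rather than only looking at the overall average) is what exposes jitter spikes in a streaming workload.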
For both I measured two values: how many elements I have shoved into the queue divided by total execution time (this will obviously be off by up to 10/N, but N is huge), and the execution time of each transaction of 5x500 elements, which I charted. Mysql: 22 ms/packet after 8738 packets. It started low, <10 ms, but this slowly rose. Most critically for a streaming app, there were HUGE jitter spikes of up to 11.5 seconds and a significant amount of noise in the 2 second range. These spikes were contributing to the slow growth in execution time which, being an average over the whole run, lags behind the instantaneous numbers. This matches my experience in production, where the db keeps up on average, but the spikes lead to data loss as buffers overflow. It's worth saying here that I'd bet it's possible to find and reduce the source of these spikes, but I'm at the end of my skills for optimization of the database. Postgres is a lot more cheerful. Time per packet was 10 ms after 6445 iterations (vs 11 ms for ODBC) and while there are a small number of large jitter spikes, they only go up to 1-1.4 seconds, which isn't anywhere near as bad. The 'noise' floor (which was 2 seconds for mysql) was about half a second for postgres. What's more, I'm not at the end of my rope for optimization, as I read there are ways to disable some of the safety checks to make it go faster (I think this is the same as what I did for mysql with "flush log at transaction commit"). I know this isn't a perfect test, but keep in mind it's also a reflection of the level of effort required for each. Mysql has always been the super easy to get up and running database, but for this particular use case postgres was just significantly easier to configure and configure well. On the query side I didn't do as much work. I just generated a simple query (id=250, t between blah and blah+5). In mysql this took 5 ms (using heidisql). Postgres took significantly longer -- 500 ms (using pgadmin). 
I then generated a more complex query (id between x and y, val > 0.5, time between x and x+5) and this took about 1 sec with postgres and 750 ms with mysql.
	1 point
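To make the two query shapes concrete, here is a small sketch that uses Python's built-in sqlite3 purely as a stand-in for mysql/postgres. The integer-millisecond timestamps, the (id, t) primary key, the synthetic values, and the function names are assumptions for illustration, and timings from sqlite say nothing about either of the databases benchmarked above.

```python
import sqlite3


def make_db(packets=100, ids=500):
    """Build an in-memory stand-in table with the post's (id, t, val) layout."""
    conn = sqlite3.connect(":memory:")
    # Assumption: t stored as integer milliseconds, (id, t) as the primary key.
    conn.execute(
        "CREATE TABLE dbldate (id INTEGER, t INTEGER, val REAL, PRIMARY KEY (id, t))"
    )
    rows = (
        (i, n * 100, ((7 * i + n) % 10) / 10.0)  # synthetic values in 0.0..0.9
        for n in range(packets)                   # one packet every 100 ms
        for i in range(ids)                       # ids 0..499 per packet
    )
    with conn:  # load everything inside a single transaction
        conn.executemany("INSERT INTO dbldate (id, t, val) VALUES (?, ?, ?)", rows)
    return conn


def simple_query(conn, point_id, t_start, t_end):
    """One id over a time window (the 'id=250, t between blah and blah+5' case)."""
    return conn.execute(
        "SELECT t, val FROM dbldate WHERE id = ? AND t BETWEEN ? AND ?",
        (point_id, t_start, t_end),
    ).fetchall()


def complex_query(conn, id_lo, id_hi, min_val, t_start, t_end):
    """A range of ids, a value threshold, and a time window."""
    return conn.execute(
        "SELECT id, t, val FROM dbldate "
        "WHERE id BETWEEN ? AND ? AND val > ? AND t BETWEEN ? AND ?",
        (id_lo, id_hi, min_val, t_start, t_end),
    ).fetchall()


if __name__ == "__main__":
    db = make_db()
    print(len(simple_query(db, 250, 2000, 7000)), "rows for the simple query")
    print(len(complex_query(db, 100, 199, 0.5, 2000, 7000)), "rows for the complex query")
```

The simple query can be answered from the (id, t) key alone, while the `val > 0.5` filter in the complex one has to inspect every row in the id/time range, which is one plausible reason the two cases behave so differently.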
- 
	Thanks. Here's a more developed subVI that I've used successfully inside a Pre-build action. I use it to set a Build CC symbol to be equal to the Build Name, so different builds can enable different code.
	Set CC Symbol.vi
	1 point
- 
	AllObjs[] uses the Z order, not the tabbing order. My understanding is that this is because it can include objects which can't be tabbed to, like decorations, so they're not part of the tabbing order. The same applies to SelectionList[] properties (although those also group items you selected together in a higher hierarchy).
	1 point
- 
	I haven't tried it just now and my experience with trees is limited, but a quick look shows two possible options:
	- The Top Left Visible Cell property, combined with the Number of Rows property, which will definitely work, but will require some computation to determine what it should be.
	- The Open/Close.Ensure Visible method, which should probably work.
	1 point
- 
	Well that's okay, I felt like doing some improvements on the image manipulation code. Attached is an improved version that supports ico and tif files and lets you select an image from within the file. For ico files it basically grabs the one image you select (with Image Index), makes an array of bytes that is an ico file with only that image in it, and then displays it in the picture box. For tif files there is a .NET method for selecting the image, which for some reason doesn't work on ico files. Edit: Updated to work with tifs as well.
	Image Manipulation With Ico and Tif.zip
	1 point
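For readers curious about the ico trick described above, here is a minimal Python sketch of the same idea: read the ICO directory, pick one entry, and emit a new single-image ICO byte array. It is not the attached code (which is LabVIEW/.NET); the function name is hypothetical, and the layout used is the standard ICO header (6 bytes) followed by 16-byte directory entries.

```python
import struct

# ICO header: reserved (0), type (1 = icon), number of images.
ICONDIR = struct.Struct("<HHH")
# Directory entry: width, height, colour count, reserved, planes, bit count,
# size of the image data in bytes, offset of the image data in the file.
ICONDIRENTRY = struct.Struct("<BBBBHHII")


def extract_single_icon(ico_bytes, index):
    """Return the bytes of a valid ICO file containing only image `index`."""
    reserved, kind, count = ICONDIR.unpack_from(ico_bytes, 0)
    if reserved != 0 or kind != 1:
        raise ValueError("not an ICO file")
    if not 0 <= index < count:
        raise IndexError("image index out of range")

    entry_pos = ICONDIR.size + index * ICONDIRENTRY.size
    entry = list(ICONDIRENTRY.unpack_from(ico_bytes, entry_pos))
    size, offset = entry[6], entry[7]
    image = ico_bytes[offset:offset + size]

    # In the new file the image data sits right after the one remaining entry.
    entry[7] = ICONDIR.size + ICONDIRENTRY.size
    return ICONDIR.pack(0, 1, 1) + ICONDIRENTRY.pack(*entry) + image
```

The resulting byte array can then be handed to whatever image loader is in use, which only ever sees a one-image icon file.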
- 
	I've been looking into this recently as well. The other TDMS alternative is HDF5, which was discussed around here not long ago. For my situation I'm responsible for two conceptual customers: people who want to maintain the machine, and people who want to see the immediate output of the machine. I'm thinking HDF5 is a good fit for the immediate output data set, as it is easily movable (flat file), can contain lots of data types (even images), and is cross-platform enough that anyone can open the data from python or matlab or whatever. The other customer (long-term maintenance) requires more of a data warehousing scheme to store basically everything, potentially with compaction of older data (i.e. average over a day, a week, a month). This is feasible with flat files but it seems very unwieldy, so I've also been looking into time series databases. Here is essentially the evaluation I've made so far (which is inconclusive but may be helpful):
	Basic DBs, requiring significant client dev work. The advantage is that they are all old, heavily used, and backed by large organizations.
	- mysql can provide this, but it doesn't seem to be great at it. What we've tried is that each row is (ID, value, timestamp) where (id, timestamp) is the unique primary key. What I've found is that complex queries basically take forever, so any analysis requires yanking out data in small chunks.
	- Postgres seems to handle this structure (way, way, way) better based on a quick benchmark, but I need to evaluate more.
	- Cassandra seemed like it would be a better fit, but I had a lot of trouble inserting data quickly. With an identical structure to mysql/postgres, cassandra's out-of-box insert performance was the slowest of the three. Supposedly it should be able to go faster. The slow speed could also be due to the driver, which was a tcp package off the tools network of less than ideal quality. There is an alternative called Scylla which I believe aims to be many times faster with the same query language/interface, but I haven't tried it.
	More complete solutions:
	- Kairos DB seems cool. It's a layer on top of cassandra, where they've presumably done the hard work of optimizing for speed. It has a lot of nice functions built in, including a basic web UI for looking at queries. I ran this in a VM since I don't have a linux machine, but it was still quite fast. I need to do a proper benchmark vs the above options.
	- InfluxDB seemed like a good platform (they claim to be very fast, although others claim they are full of crap), but their longevity scares me. Their 1.0 release is recent, and it sounds like they rebuilt half their codebase from scratch for it. I read various things on the series of tubes which make me wonder how long the company will survive. Could very well just be doom and gloom though.
	- Prometheus.io only supports floats, which is mostly OK, and allows for tagging values with strings. They use levelDB, which is a google key-value storage format that's been around a few years. However, it's designed as a polling process which monitors the health of your servers by periodically fetching data from them. You can push data to it through a 'push gateway' but as far as the overall product goes, it doesn't seem designed for me.
	- Graphite, from what I read, is limited to a fixed size database (like you want to keep the last 10M samples in a rolling buffer) and expects data to be timestamped at a fixed interval. This is partially described here.
	- opentsdb: InfluxDB's benchmarks show it as slower than cassandra. It has to be deployed on top of hbase or hadoop and reading through the set-up process intimidated me, so I didn't investigate further, but I think the longevity checkmark is hit with hadoop given how much of the world uses it.
	- Heroic: same as above, except it requires cassandra, elasticsearch, and kafka, so I never got around to trying to set it up.
	This may also help, I found it during my search: https://docs.google.com/spreadsheets/d/1sMQe9oOKhMhIVw9WmuCEWdPtAoccJ4a-IuZv4fXDHxM/edit
	Long story short, we don't need this half of the requirement right now, so we're tabling the decision until later when we have a better idea of a specific project's needs. For example, some of the tools have automatic compression and conversion over time, while others are geared more towards keeping all data forever. I don't know right now which is going to be the better fit.
	1 point
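As a small illustration of the "compaction of older data" idea mentioned above (average over a day, a week, a month), here is a minimal Python sketch that collapses raw (timestamp, value) samples into fixed-width averaged buckets. The function name, the use of plain seconds for timestamps, and the choice of a simple mean are assumptions; none of the databases listed necessarily compact data this way.

```python
from collections import defaultdict


def compact(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Returns a list of (bucket_start, mean_value) pairs sorted by time.
    """
    sums = defaultdict(lambda: [0.0, 0])  # bucket_start -> [running total, count]
    for ts, val in samples:
        bucket = int(ts // bucket_seconds) * bucket_seconds
        acc = sums[bucket]
        acc[0] += val
        acc[1] += 1
    return [(start, total / n) for start, (total, n) in sorted(sums.items())]


if __name__ == "__main__":
    DAY = 86400
    raw = [(0, 1.0), (10, 3.0), (DAY, 5.0)]
    print(compact(raw, DAY))
```

Running the same function again over its own output with a larger bucket (a week, a month) gives a coarser tier, with the caveat that averaging averages weights each bucket equally rather than each original sample.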
- 
	Here's the example code for setting the CCSymbols programmatically.
	CCExample.zip
	1 point
