Don't use KairosDB if you care for performance, use Cassandra directly.
I've been working with it and it's nothing but problems. My advice is to use Cassandra directly and be done with it
The first problem with KairosDB is that the documentation is awful. To be fair, Cassandra has some issues there as well though if you do work out how to do things in Cassandra, what it's good at and apply it to appropriate problems then I can't say I really have any reason to complain.
Documentation with KairosDB is very vague and confusing about a lot of things such as how do tags work (does a match have to have all of the tags specified or only one), vague about performance and some parts I think might just be wrong.
Smells are present in the API itself with things such as rather than using a type for alignment, having strange use of mutually exclusive keys implying type.
It's not fast and it doesn't optimise things on top of Cassandra. It makes an effort to do this but it fails. The only thing it might be faster than is CSV or flat files. It's not Cassandra fast.
Its query performance in anything but the most trivial of circumstances is absolutely appalling. It makes terrible use of Cassandra and you're liable to be able to do much better.
It doesn't fully use indexes and just has one very simple crude schema that tries to do everything (it can't performance wise). It's three tables each with three columns, practically what you get if you search for blogs with tutorials on how to use Cassandra for time series. Almost everything is a blob (a couple are text). Some columns appear unused, they're just noise, junk data.
If you use tags it has to scan all the tag values. It takes all the rows from Cassandra for a metric then scans the tags. If you do a query for the temperature over time in New York, it'll load every possible tag combination for the temperature metric from the database. This isn't lightweight either, it has to deserialize all the values using a custom scheme.
It'll not only load combinations you don't need but will also load rows up to three weeks worth of tag combinations more than it needs. This isn't even getting at datapoints yet. They binary encode the values making it impossible to use things such as aggregate functions, instead KairosDB is replacing Cassandra for this and reimplementing the wheel. That means it must send every row, every data point back to KairosDB which must then aggregate those data points itself.
If you use tags, then you will probably have big problems. If you use a lot of metrics then you might have problems. If you use aggregates or group by then you will probably have big problems.
If you want something for very simple cases or where you don't mind terrible performance, instability in some scenarios, losing data (for a long time until it was fixed it was just dropping data), etc but only want ease of use then you might get by with KairosDB but if you're serious about your usage and big data, don't touch it with a barge pole.
It also has a lot of problems with junk data. I've found databases 99.99% full of data put there by the KairosDB reporter. After the hell of clearing it all out (Cassandra really isn't really suited to mass deletes), turning off the reporter as the instructions state, it's still populating the database with GB of data. I cleared it all out, when and checked Cassandra to find that just a few days later over half the data was KairosDB's reporting junk.
There's an public ticket from IBM about this (oddly they're using H2, not meant for production) you can find with a search and lots of people concerned about read performance which can get pretty bad in KairosDB.
Data bloat can get hairy where it also stores all tag values and metric names ever inserted. There's not a brilliant set of strategies available for managing garbage collection of this data. I've implemented my own solution and it doesn't even need to use that table for anything any of my use cases with the exception of being able to know the metrics that are in the system without a full table scan.
I replaced the KairosDB daemon with a client in my language that just connects to Cassandra directly, taking KairosDB out of the picture which is on average faster without optimizations. In the cases where it doesn't do as well I'm fairly certain that it's down to quirks in the Cassandra driver for the programming language I'm using. That language is much slower than Java and loses the benefit of not being a daemon, yet it's still faster on average, just porting is enough. Not being a daemon means it can't use resource pools so easily. It's not multi-threaded either but it's still faster. Profiling also seems to point to the Cassandra driver which probably needs a bit of tuning. It's very apparent KairosDB does very poorly at caching despite using a lot of memory and has a very poor ability to expose any opportunity to optimise.
When I add a very basic cache for the first phase lookup my script always performs much faster for all of my benchmarks and stress tests. Ultimately I'll need to fix the schema in Cassandra to have full performance but even before then, adding a cache for data points will speed things up a thousand times as well as substantially reduce resource usage. Simply fiddling the queries for one data case will allow it to only fetch the rows it needs in the first stage.
If I can use the KairosDB schema better than KairosDB does in a thousand lines of code then why am I using KairosDB at all? If I take it out entirely then I can also create a far better suited schema to my use cases.
postgresql (perhaps with timescaledb), MySQL, Cassandra and MongoDB in my experience can all do not too badly for a range of use cases as long as you know their ins and outs, they're relatively flexible. From what I understand (I've only used it a little) things like Graphite will be very specific to certain use cases so you'd need to evaluate it for that. KairosDB suffers a similar problem though it exposes features it just can't handle internally while giving the impression of having at least a few decent features. Others, I don't know anything. If in doubt always check the source code to ensure it's sane and search for disaster stories.