Lowering the memory footprint of InfluxDB

Hi everyone. I’m running InfluxDB 1.8 in a Debian LXC container. I store measurements from BigClown devices in this DB (temperatures, weather conditions etc.). As these sensors publish data on each change (e.g. temperature gets published on every 0.1 K change), it adds up to a large amount of data over a few years. Unfortunately, I didn’t invest in the 2 GB Omnia, so I’m stuck with 1 GB of RAM. InfluxDB alone takes over 100 MB when this database is loaded, and it sometimes does some internal reorganization of indexes, during which it needs even more. That sometimes triggers the OOM killer (which usually kills InfluxDB, but not always).

I was thinking about ways to lower the memory footprint, so I spent some time searching the internet for something that could implement a kind of gradual forgetting of historical data. I haven’t found anything that would be nicely integrated into InfluxDB (except running a cron script that reads the whole DB, applies the retention policy, saves the result in a new DB and then switches the databases). My thinking is that I don’t really need to know about every 0.1 K temperature change that happened a year ago; I’m more interested in some hour- or day-based aggregate (e.g. a mean of all measurements). Has anybody tried to implement something like that?

I found the docs page https://docs.influxdata.com/influxdb/v1.8/query_language/manage-database/#create-retention-policies-with-create-retention-policy , but that only seems to be helpful if I wanted to delete all data older than some limit.
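If I went that way, I guess it would boil down to something like this (the database name is mine; the 90-day duration is just an arbitrary example):

    -- keep only the last 90 days of raw data in the default retention policy
    CREATE RETENTION POLICY "raw_90d" ON "bigclown" DURATION 90d REPLICATION 1 DEFAULT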

I also found that using a TSI index could actually help with memory usage (https://www.influxdata.com/blog/how-to-overcome-memory-usage-challenges-with-the-time-series-index/), but the article says it is more helpful when you work with a lot of time series, which I do not; I have about 20 of them. Has anyone tried that?
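From what I can tell, switching would mean changing the index type in the config and rebuilding the index for existing shards, roughly like this (the paths are what I’d expect on Debian, so treat them as a guess):

    # in /etc/influxdb/influxdb.conf:
    #   [data]
    #     index-version = "tsi1"
    # then, with influxd stopped, run as the influxdb user:
    influx_inspect buildtsi -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal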

Thanks for the interesting InstallFest talk, @mprudek. Maybe you would have some related comments? :slight_smile:


Hi,

I’m not very familiar with InfluxDB 1.8, as the first version we really started to use was 2.0. Nevertheless, I think the mechanisms are still quite similar, just maybe a bit easier to use in version 2.0.

So the first thing, which you already mentioned, would be to set a retention policy on that overly precise raw data. If you don’t want to store it forever, there’s no other option than to forget it after maybe 1w/1mo.

In parallel, you store a separate copy of the same data, aggregated in some way; you can e.g. store a mean value for every hour/day. In version 2.0 this is done using periodic tasks written in the Flux language; in the previous version you have to use Kapacitor (or some cron job) for the same goal.

As a result, you would run a task, e.g. every hour, that calculates the mean value over the last hour and stores it in a different DB/bucket.
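In 1.x there are also continuous queries built in, which should cover the same idea without Kapacitor; a rough sketch (all names are placeholders, adjust to your database):

    -- a retention policy that keeps the downsampled data forever
    CREATE RETENTION POLICY "long_term" ON "bigclown" DURATION INF REPLICATION 1

    -- every hour, write the hourly mean of every field of every measurement into it
    CREATE CONTINUOUS QUERY "cq_hourly_mean" ON "bigclown"
    BEGIN
      SELECT mean(*) INTO "bigclown"."long_term".:MEASUREMENT
      FROM /.*/
      GROUP BY time(1h), *
    END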

Using TSI with roughly 20 series really wouldn’t be a big help (it’s used by default in 2.0, btw).

My general advice would be to migrate to 2.0 because it is both inevitable and IMHO easier to use.

Hi,

I used to have some performance problems with InfluxDB on a Raspberry Pi, and since that is also an armv7 device, it could be related.

I solved my issues by restricting InfluxDB’s memory usage and concurrency, and by disabling its internal monitoring feature, which was reported as problematic in a bug report. With those modifications I haven’t had any performance issues with InfluxDB on the Raspberry Pi. Before that I had also tried shorter retention periods, but after the config changes I removed them again, as there was no longer any reason to have them.

I documented my configuration changes here: https://github.com/nikobockerman/ruuvitags-raspberrypi/tree/master/influxdb#initial-creation
And the suggestion to disable internal monitoring was given here: https://github.com/influxdata/influxdb/issues/9475#issuecomment-375773775
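Roughly, the kind of settings involved look like this in /etc/influxdb/influxdb.conf (the option names are from the 1.x config; the values below are only illustrative, my actual settings are in the repo linked above):

    [data]
      # cap the in-memory write cache so it flushes to disk sooner
      cache-max-memory-size = "200m"
      cache-snapshot-memory-size = "25m"
      # limit parallel compactions (0 means: use up to 50% of the CPU cores)
      max-concurrent-compactions = 1

    [coordinator]
      # limit how many queries may run at the same time
      max-concurrent-queries = 5

    [monitor]
      # disable the _internal monitoring database (see the issue linked above)
      store-enabled = false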

I don’t know if any of those configuration changes will actually help you, but it might be worth trying.


Thank you @mprudek, I’ll definitely have a look at version 2.0, but that will have to wait. Storing the downsampled data in a separate measurement or database was one thing I was thinking of, but then I’d lose the nice property of all the data being in one database and accessible through the same Grafana dashboards.

@maneth You made my day! After disabling monitoring and dropping the _internal database, InfluxDB went from 150 MB to 30 MB of RAM! CPU load also nicely decreased. I have free RAM again, hooray!
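For the record, on top of setting store-enabled = false in the [monitor] section of the config, the cleanup was just this one command (using the influx CLI):

    influx -execute 'DROP DATABASE "_internal"'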


This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.