Statistics are abysmally slow with large databases #251
Alternatively, I might look into building a separate statistics builder application.
[RELEASES: 31,721,867] [FILES: 522,209,821] Regenerated every hour on an NVMe disk; it took only 10-15 minutes.
So from what I can see, the query for the statistics (for sqlite3) is built in `magnetico/pkg/persistence/sqlite3.go`, lines 447 to 459 at commit 9c2183b.
I built myself a query as it would be run for today, using one of the arbitrary date formats:

```sql
SELECT strftime('%Y-%m-%d', discovered_on, 'unixepoch') AS dT
     , sum(files.size)             AS tS
     , count(DISTINCT torrents.id) AS nD
     , count(DISTINCT files.id)    AS nF
FROM torrents, files
WHERE torrents.id = files.torrent_id
  AND discovered_on >= 1626180197 -- this is one week ago
  AND discovered_on <= 1626784997 -- this is today
GROUP BY dT;
```

And indeed it is quite slow on a 34 GB database.
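For reference, one standard way to check where the time goes (this diagnostic is mine, not something from the thread itself) is SQLite's EXPLAIN QUERY PLAN:

```sql
-- Prefix the statistics query with EXPLAIN QUERY PLAN to see whether
-- SQLite does a full scan of the files table or uses an index for the join.
EXPLAIN QUERY PLAN
SELECT strftime('%Y-%m-%d', discovered_on, 'unixepoch') AS dT
     , sum(files.size)             AS tS
     , count(DISTINCT torrents.id) AS nD
     , count(DISTINCT files.id)    AS nF
FROM torrents, files
WHERE torrents.id = files.torrent_id
  AND discovered_on >= 1626180197
  AND discovered_on <= 1626784997
GROUP BY dT;
```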
It seems to read the entire table, so I tried adding an index:

```sql
CREATE INDEX torrent_file_index ON files (torrent_id);
```

But that didn't change the query plan at all. I then also replaced the code at `magnetico/pkg/persistence/sqlite3.go`, lines 159 to 166 at commit 9c2183b.
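One untested idea, purely my assumption rather than something tried above: since the plain index on `torrent_id` didn't change the plan, a covering index might let the aggregates be answered from the index alone, without touching the `files` table rows at all:

```sql
-- Untested sketch: a covering index on (torrent_id, size). SQLite stores
-- the rowid in every index entry, so if files.id is the INTEGER PRIMARY
-- KEY (i.e. an alias for the rowid), this index would also cover
-- count(DISTINCT files.id).
CREATE INDEX IF NOT EXISTS files_torrent_id_size ON files (torrent_id, size);
```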
At least when using an SQLite database, generating statistics becomes really slow once the database reaches a large size.
This is for 12 hours: [screenshot of the run, which took about 7 minutes]
Generating it for 12 months is probably going to take longer than I'd like to keep my computer on.
Seven minutes for 12 hours of statistics, including the high I/O usage that comes with it, seems awfully long. Maybe we should start periodically generating statistics into another table (or even a new database) each hour, and also find a way to generate hourly statistics from all the old data; a rough sketch of that idea follows below.
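A minimal sketch of what that pre-aggregation could look like. The `stats_hourly` table and its columns are made up for illustration; magnetico has no such table today:

```sql
-- Hypothetical rollup table; the name and columns are illustrative only.
CREATE TABLE IF NOT EXISTS stats_hourly (
    hour       INTEGER PRIMARY KEY,  -- unix epoch, truncated to the hour
    n_torrents INTEGER NOT NULL,     -- torrents discovered in that hour
    n_files    INTEGER NOT NULL,     -- files belonging to those torrents
    total_size INTEGER NOT NULL      -- total size of those files, in bytes
);

-- Run once per hour for the hour that just closed (:hour_start is a bound
-- parameter). Statistics pages would then read stats_hourly instead of
-- re-aggregating the full torrents/files join on every request.
INSERT OR REPLACE INTO stats_hourly (hour, n_torrents, n_files, total_size)
SELECT :hour_start
     , count(DISTINCT torrents.id)
     , count(DISTINCT files.id)
     , coalesce(sum(files.size), 0)
FROM torrents
JOIN files ON files.torrent_id = torrents.id
WHERE torrents.discovered_on >= :hour_start
  AND torrents.discovered_on <  :hour_start + 3600;
```

Backfilling the old data would then just be a one-off loop over past hours running the same statement.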
I'm aware that users usually don't have as large a database as I do (41 GiB currently), and that magnetico is currently explicitly not designed to scale, but I would still leave this open for consideration.