Slow query to retrieve transactions for account

sderuiter · April 8, 2022, 3:16pm

I’ve followed the guidance to query the transaction log to get recent transactions for an account, but query times are very slow for finalisers and large bakers.

I’m using 2 queries (count and get):

Count:

SELECT count(ati.account) 
FROM ati LEFT JOIN summaries ON ati.summary = summaries.id 
WHERE ati.account = %s 
AND (summaries.summary#>>'{Left,type,type}' IN ('accountTransaction')
OR summaries.summary#>>'{Right,tag}'  IN ('BakingRewards'))

Get

SELECT
encode(ati.account, 'hex'), 
encode(summaries.block, 'hex'),
summaries.timestamp,
summaries.height,
CAST(summaries.summary AS text)
FROM ati LEFT JOIN summaries ON ati.summary = summaries.id
WHERE ati.account = %s
AND (summaries.summary#>>'{Left,type,type}' IN ('accountTransaction')
OR summaries.summary#>>'{Right,tag}'  IN ('BakingRewards'))
ORDER BY summaries.timestamp DESC
LIMIT %s
OFFSET %s

Both queries filter out BlockRewards and FinalizerRewards, but still these queries can take 30+ seconds for a finaliser account (granted: such an account has 2m+ transactions in total).

Is there a way to speed up these queries? Or any other suggestions for improvement?

td202 · April 8, 2022, 4:17pm

I have not tested this myself, so I cannot say for sure, but it might help to create a GIN index on the summaries.summary column and then use operators such as @> that are supported by the index.

To create the index:

CREATE INDEX summarygin ON summaries USING gin (summary)

To rewrite the query (first one):

SELECT count(ati.account)
FROM ati LEFT JOIN summaries ON ati.summary = summaries.id
WHERE ati.account = %s
AND (summaries.summary@>'{"Left":{"type":{"type":"accountTransaction"}}}'
OR summaries.summary@>'{"Right":{"tag":"BakingRewards"}}')

Note that the index may slow down the node that is inserting the values in the database.

abizjak · April 8, 2022, 5:05pm

FWIW we have tried with this index, and various other kinds of indices on both the summary column, or a combination of columns.

It does not appear that any of that works reliably, at least not if we want to support all the queries the wallet-proxy needs. The behaviour is also highly unpredictable in the sense that adding an index, such as the one suggested by Thomas above may improve one query, while slowing down others.

The correct solution would be to redesign the tables to support the queries you want. In particular having separate tables for transactions and rewards and join them when needed.

As a temporary solution, until we replace the current transaction logging setup, you could add triggers to postgresql that would insert into these separate tables upon insertion by the node. Insertions would be a bit slower, but in practice this is inconsequential with the load that exists at the moment, and then specific queries, such as filtering out rewards would be much faster.

sderuiter · April 8, 2022, 5:10pm

Are you referring to this logger? I have this running since today. Does that use a different / better table structure?

abizjak · April 8, 2022, 5:23pm

No, it does not (yet). That is planned as a next step. For now it just replicates (almost) the logging that is done by the node.

The idea is that the logger can be much more easily adapted to specific needs, e.g., if specific queries are needed, then it can be adapted to log data in a specific format to support those queries.

sderuiter · April 11, 2022, 7:48pm

Are you requesting feedback for this new logger? As I’ve tried this on two machines, but it silently fails after 150K blocks or so, with no error logged. It also seems to slow down before it stops? Are there any logs to help debug this?

abizjak · May 5, 2022, 6:13pm

Sorry for the silence, but I just had not had time to look into this.

There was an embarrasing bug in the recent version that I introduced. It is now fixed fixed on main.

If you have feedback on the fixed version you are very welcome. But could you provide it in the repository instead as issues? It’s easier to keep track.