Node timing out

Hello,

Our node times out daily.
No error or warning is logged by the node.
When the timeout starts, it continues until the node is restarted.
We are running on the recommended hardware, and the server itself stays responsive while the timeouts are occurring.

Any idea what we can check, or whether this is a known issue?

Hello ed-umb,

Could you share your log files so we can investigate further what the issue could be?

Best regards, Zoltán

Hi,

When you say that it times out, what exactly do you mean? That GRPC requests to the node are timing out?

You could perhaps set your node log level to DEBUG, which will produce more log information that may be useful for debugging the issue.
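
To tell a network-level problem apart from a gRPC-level one, a quick TCP check along these lines might help. This is only a sketch: the host `localhost` and port `20000` are assumptions, so replace them with your node's gRPC address. If the TCP connection succeeds while gRPC calls still time out, the problem is more likely inside the node's gRPC handling than in the network.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


if __name__ == "__main__":
    # Assumed address; adjust to wherever your node's gRPC API is listening.
    host, port = "localhost", 20000
    state = "reachable" if tcp_reachable(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state} over TCP")
```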

Hi, I have exported the logs with the log level set to debug. How can I share them with you?
Thanks,

We are also getting this error

According to the documentation, the default for CONCORDIUM_NODE_GRPC2_MAX_CONCURRENT_REQUESTS is 100, and it seems odd that we would reach this threshold.

You could perhaps share the relevant parts of the logs on pastebin.

Do you have any other applications that are connecting to the node’s GRPC?

Hi,
the default CONCORDIUM_NODE_GRPC2_MAX_CONCURRENT_REQUESTS of 100 is a good value if you run your node for staking or for sending a few transactions. I would personally increase it if you plan to use the node as a backend for GRPC queries.
In that case you could try increasing the following variables:

      CONCORDIUM_NODE_GRPC2_MAX_CONNECTIONS=800
      CONCORDIUM_NODE_GRPC2_MAX_CONCURRENT_REQUESTS=300
      CONCORDIUM_NODE_GRPC2_MAX_CONCURRENT_STREAMS=300
      CONCORDIUM_NODE_GRPC2_MAX_CONCURRENT_REQUESTS_PER_CONNECTION=30
      CONCORDIUM_NODE_GRPC2_KEEPALIVE_INTERVAL=15
      CONCORDIUM_NODE_GRPC2_KEEPALIVE_TIMEOUT=5

Keep in mind that these values are bounded by the file descriptor limit configured on your OS. If you set them too high, you might run out of file descriptors (the default on Ubuntu is 1024) and your node might stop working.
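
As a rough way to check this, the sketch below compares the open-file limit with the connection count from the values above. The `reserved` allowance of 256 descriptors for the database, peer sockets and log files is my own assumption, not a figure from the documentation. Note also that the script reports the limit of its own process, so it needs to run in the same environment as the node (same user, container or service) to be meaningful.

```python
import resource


def fd_headroom(soft_limit: int, max_connections: int, reserved: int = 256) -> int:
    """Descriptors left after counting the gRPC connections plus a reserve.

    `reserved` is an assumed allowance for everything else the node keeps
    open; tune it to what you observe on your system.
    """
    return soft_limit - (max_connections + reserved)


if __name__ == "__main__":
    # Soft and hard limits on open files for the current process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        print("No soft limit on open files")
    else:
        left = fd_headroom(soft, max_connections=800)
        print(f"soft={soft} hard={hard} headroom={left}")
        if left < 0:
            print("MAX_CONNECTIONS=800 may not fit under the current limit")
```

With the Ubuntu default of 1024 and the assumed reserve, 800 connections would already leave negative headroom, so the limit would have to be raised before applying those settings.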

The documentation has additional information about what those variables represent.
Let me know if you have further questions.

Dears,
I have uploaded the logs here: 20240512_concordium-node.log - Google Drive

Can you point to a particular time covered by the logs where you experience the timeout issue? I see various calls to the GRPC, but none that look like they’re timing out. The last one seems to be:

{"log":"2024-05-10T23:39:27.526744731Z: DEBUG: request; method=POST uri=http://concordium.umb.network:20000/concordium.v2.Queries/InvokeInstance version=HTTP/2.0\n","stream":"stderr","time":"2024-05-10T23:39:27.526832455Z"}
{"log":"2024-05-10T23:39:27.526760222Z: DEBUG: started processing request\n","stream":"stderr","time":"2024-05-10T23:39:27.526845216Z"}
{"log":"2024-05-10T23:39:27.527590032Z: DEBUG: finished processing request latency=0 ms\n","stream":"stderr","time":"2024-05-10T23:39:27.527647294Z"}
{"log":"2024-05-10T23:39:27.527610393Z: DEBUG: end of stream stream_duration=0 ms status=0\n","stream":"stderr","time":"2024-05-10T23:39:27.527656515Z"}
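
If it helps to narrow down the time window, a small script along these lines could scan the file for slow requests. It is only a sketch and assumes every line has the shape shown above: one JSON object per line with `log` and `time` fields, and a `finished processing request latency=N ms` message. The function name and the 1000 ms threshold are arbitrary choices of mine.

```python
import json
import re
import sys
from typing import Iterable, Iterator, Tuple

# Matches the "finished processing request" message seen in the excerpt above.
LATENCY_RE = re.compile(r"finished processing request latency=(\d+) ms")


def slow_requests(lines: Iterable[str], threshold_ms: int = 1000) -> Iterator[Tuple[str, int]]:
    """Yield (time, latency_ms) for requests at or above `threshold_ms`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        match = LATENCY_RE.search(str(entry.get("log", "")))
        if match and int(match.group(1)) >= threshold_ms:
            yield str(entry.get("time", "")), int(match.group(1))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python slow_requests.py 20240512_concordium-node.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for when, latency in slow_requests(handle):
            print(f"{when}  latency={latency} ms")
```

Requests that never finish will not show up here, since they produce no "finished" line; for those, the approximate time at which the timeouts started is still the most useful thing to share.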