Node timing out

ed-umb · May 9, 2024, 2:47pm

Hello,

Our node times out daily.
No error or warning is logged by the node.
When the timeout starts, it continues until the node is restarted.
Our hardware meets the recommended requirements, and the server itself stays responsive while the timeouts are occurring.

Any idea what we can check, or whether this is a known issue?

Zoltan · May 10, 2024, 6:40am

Hello ed-umb,

Could you share your log files so we can investigate what the issue might be?

Best regards, Zoltán

td202 · May 10, 2024, 9:11am

Hi,

When you say that it times out, what exactly do you mean? Are GRPC requests to the node timing out?

You could perhaps set your node log level to DEBUG, which will produce more log information that may be useful for debugging the issue.

s.giovacchini · May 13, 2024, 12:43pm

Hi, I have exported the logs with the log level set to DEBUG. How can I share them with you?
Thanks,

s.giovacchini · May 13, 2024, 3:01pm

We are also getting this error

According to the documentation, the default for CONCORDIUM_NODE_GRPC2_MAX_CONCURRENT_REQUESTS is 100, and it seems odd that we would reach this threshold.

td202 · May 15, 2024, 9:12am

You could perhaps share the relevant parts of the logs on pastebin.

Do you have any other applications that are connecting to the node’s GRPC?

fb1010 · May 15, 2024, 1:54pm

Hi,
a CONCORDIUM_NODE_GRPC2_MAX_CONCURRENT_REQUESTS value of 100 is a good number if you run your node for staking or for sending some transactions. I would personally increase it if you plan to use the node as a backend for GRPC queries.
In that case you could try increasing the following variables:

      CONCORDIUM_NODE_GRPC2_MAX_CONNECTIONS=800
      CONCORDIUM_NODE_GRPC2_MAX_CONCURRENT_REQUESTS=300
      CONCORDIUM_NODE_GRPC2_MAX_CONCURRENT_STREAMS=300
      CONCORDIUM_NODE_GRPC2_MAX_CONCURRENT_REQUESTS_PER_CONNECTION=30
      CONCORDIUM_NODE_GRPC2_KEEPALIVE_INTERVAL=15
      CONCORDIUM_NODE_GRPC2_KEEPALIVE_TIMEOUT=5

Keep in mind that these values are tied to the file descriptor limits set on your OS. If you set them too high, you might run out of file descriptors (the default on Ubuntu is 1024) and your node might stop working.
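
To make the file descriptor point concrete, here is a rough Python sketch of the arithmetic. It assumes each GRPC connection costs about one descriptor and sets aside a hypothetical reserve of 256 for the node's own database files, logs, and peer sockets; neither figure comes from the Concordium documentation. `current_soft_limit` reads the limit of the process running the script, which can differ from the node's own limit under Docker or systemd, so treat the result only as a starting point.

```python
import resource


def fd_headroom(soft_limit: int, max_connections: int, reserved: int = 256) -> int:
    """Descriptors left after budgeting one per GRPC connection plus a reserve.

    `reserved` is an assumed allowance for the node's own files and peer
    sockets, not a documented figure. A negative result means the budget
    does not fit under the limit.
    """
    return soft_limit - max_connections - reserved


def current_soft_limit() -> int:
    """Soft RLIMIT_NOFILE of *this* process (Unix only), not of the node."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    limit = current_soft_limit()
    if limit == resource.RLIM_INFINITY:
        print("no soft limit on open files")
    else:
        # 800 is the CONCORDIUM_NODE_GRPC2_MAX_CONNECTIONS value suggested above.
        print(f"soft limit {limit}, headroom {fd_headroom(limit, 800)}")
```

Under these assumptions, a soft limit of 1024 with 800 connections leaves a negative headroom, which suggests raising the open-files limit before raising the connection cap that far.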

The documentation has additional information about what those variables represent.
Let me know if you have further questions.

s.giovacchini · May 16, 2024, 2:40pm

Hi all,
I have uploaded the logs here: 20240512_concordium-node.log - Google Drive

td202 · May 24, 2024, 12:31pm

Can you point to a particular time covered by the logs where you experience the timeout issue? I see various calls to the GRPC, but none that look like they’re timing out. The last one seems to be:

{"log":"2024-05-10T23:39:27.526744731Z: DEBUG: request; method=POST uri=http://concordium.umb.network:20000/concordium.v2.Queries/InvokeInstance version=HTTP/2.0\n","stream":"stderr","time":"2024-05-10T23:39:27.526832455Z"}
{"log":"2024-05-10T23:39:27.526760222Z: DEBUG: started processing request\n","stream":"stderr","time":"2024-05-10T23:39:27.526845216Z"}
{"log":"2024-05-10T23:39:27.527590032Z: DEBUG: finished processing request latency=0 ms\n","stream":"stderr","time":"2024-05-10T23:39:27.527647294Z"}
{"log":"2024-05-10T23:39:27.527610393Z: DEBUG: end of stream stream_duration=0 ms status=0\n","stream":"stderr","time":"2024-05-10T23:39:27.527656515Z"}
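
For anyone wanting to scan a log like this for slow calls, here is a minimal Python sketch. It assumes every line is a Docker JSON log entry shaped like the four above, and that completed requests always carry a `finished processing request latency=N ms` message with an integer `N`; both are inferred from this excerpt only. The function names and the 1000 ms threshold are illustrative choices.

```python
import json
import re

# Matches the "finished processing request" message seen in the excerpt above.
LATENCY_RE = re.compile(r"finished processing request latency=(\d+) ms")


def request_latencies(lines):
    """Yield (docker_timestamp, latency_ms) for each finished-request entry.

    Lines that are not JSON objects, or that carry other messages, are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        match = LATENCY_RE.search(entry.get("log", ""))
        if match:
            yield entry.get("time", ""), int(match.group(1))


def slow_requests(lines, threshold_ms=1000):
    """Return the entries whose latency is at or above threshold_ms."""
    return [(ts, ms) for ts, ms in request_latencies(lines) if ms >= threshold_ms]
```

It could be run over the uploaded file with something like `slow_requests(open("20240512_concordium-node.log"))`. If that returns nothing for the period when clients saw timeouts, the delay is more likely happening before the request reaches the handler, for example at a connection or concurrency limit.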