agent: Continue to retry indefinitely (#3599)

When the woodpecker server is not reachable (eg. for update,
maintenance, agent connection issue, ...) for a long period of time, the
agent tries continuously to reconnect, without any delay. This creates
**several GB** of logs in a short period of time.

Here is a sample line, repeated indefinitely:

```
{"level":"error","error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp x.x.x.x:xxx: connect: connection refused\"","time":"2024-04-07T17:29:59Z","message":"grpc error: done(): code: Unavailable"}
```

It appears that the [backoff
package](https://pkg.go.dev/github.com/cenkalti/backoff/v4#BackOff),
after a certain amount of time, returns `backoff.Stop` (-1) instead of a
valid delay to wait. It means that no more retry should be made, [as
shown in the
example](https://pkg.go.dev/github.com/cenkalti/backoff/v4#BackOff). But
the code doesn't handle that case and takes -1 as the next delay.
This led to continuous retry with no delay between them and creates a
huge amount of logs.

[`MaxElapsedTime` default is 15
minutes](https://pkg.go.dev/github.com/cenkalti/backoff/v4#pkg-constants),
passed this time, `NextBackOff` returns `backoff.Stop` (-1) instead of
`MaxInterval`.
This commit sets `MaxElapsedTime` to 0, [to avoid `Stop`
return](https://pkg.go.dev/github.com/cenkalti/backoff/v4#ExponentialBackOff).
This commit is contained in:
nemunaire 2024-04-09 02:24:19 +02:00 committed by GitHub
parent d0e63375fa
commit 8e45ddd58b
No known key found for this signature in database
GPG key ID: B5690EEEBB952194

View file

@ -53,6 +53,7 @@ func (c *client) Close() error {
func (c *client) newBackOff() backoff.BackOff { func (c *client) newBackOff() backoff.BackOff {
b := backoff.NewExponentialBackOff() b := backoff.NewExponentialBackOff()
b.MaxElapsedTime = 0
b.MaxInterval = 10 * time.Second //nolint: gomnd b.MaxInterval = 10 * time.Second //nolint: gomnd
b.InitialInterval = 10 * time.Millisecond //nolint: gomnd b.InitialInterval = 10 * time.Millisecond //nolint: gomnd
return b return b