Why Do My Apps "Fail Readiness Check" and Get Killed?

Apr 15, 2023

The Gigalixir readiness check fails when port 4000 isn’t open.
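To see what "port 4000 is open" means in practice, here is a minimal Python sketch of a TCP connect test. It is an illustration only: the helper name `port_is_open`, the host, and the 1-second timeout are assumptions, and this is not necessarily how Gigalixir implements its probe.

```python
import socket

def port_is_open(host="127.0.0.1", port=4000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("port 4000 open:", port_is_open())
```

Running this against your app locally while it boots shows how early the port starts accepting connections compared with when the app is actually able to serve requests.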

We often find that an app’s HTTP endpoint starts listening before the app is fully ready: the port is open, but supervisors, workers, and other processes are still starting up.

As soon as we see the port open, we send you traffic. If your app isn’t quite ready for it, requests on port 4000 can back up. Then, if readiness checks fail for a short period of time, we restart the app, assuming it’s unhealthy.

We have a few parameters we can tweak on our side to fit your needs. For example, we can delay the first check of port 4000 or lengthen how long checks may fail before we restart.

The readiness probe indicates whether the application running in the container is ready to accept requests. Here are our default settings for the readiness probe:

Parameter	Description	Default Value
initialDelaySeconds	Number of seconds between container start and probe start to allow for services to initialize	0
periodSeconds	Interval between readiness probes, in seconds	3
timeoutSeconds	Timeout for probe responses, in seconds	1
successThreshold	The number of consecutive success results needed to switch probe status to “Success”	1
failureThreshold	The number of consecutive failed results needed to switch probe status to “Failure”	3
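The two thresholds in the table can be read as counters of consecutive results. The Python sketch below models that reading. It assumes a replica starts out not ready and that any result of the opposite kind resets the counter; the function name `probe_states` is hypothetical, and the platform's real probe logic may differ in detail.

```python
def probe_states(results, success_threshold=1, failure_threshold=3):
    """Return the readiness state after each probe result (True = probe passed)."""
    ready = False  # assumption: a replica is not ready until a probe succeeds
    successes = failures = 0
    states = []
    for passed in results:
        if passed:
            successes += 1
            failures = 0  # a success breaks any run of failures
            if successes >= success_threshold:
                ready = True
        else:
            failures += 1
            successes = 0  # a failure breaks any run of successes
            if failures >= failure_threshold:
                ready = False
        states.append(ready)
    return states

if __name__ == "__main__":
    # Two isolated failures are tolerated; three in a row flip the state.
    print(probe_states([True, False, False, True, False, False, False]))
```

With the default values, the replica in this model stays ready through the first two failures and is only marked not ready on the third consecutive failure.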

We employ an initialDelaySeconds of 30 seconds on some of our applications that have lots of supervisors.

The drawback is that deployments take longer, as we won’t even look at port 4000 until 30 seconds into the startup.

This means that if you have 3 replicas, the minimum deploy time is 90 seconds (3 × 30 seconds, with replicas started one after another). However, in the grand scheme of things, that’s really not that bad.
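The arithmetic behind that figure can be sketched in a few lines of Python. The function name `min_deploy_seconds` is hypothetical, and the model assumes replicas are started strictly one after another, each waiting the full initial delay before its first check.

```python
def min_deploy_seconds(replicas, initial_delay_seconds):
    """Lower bound on deploy time when replicas start one at a time."""
    # Each replica waits the full initial delay before it can be checked,
    # so the delays add up across replicas.
    return replicas * initial_delay_seconds

if __name__ == "__main__":
    print(min_deploy_seconds(3, 30))  # 3 replicas, 30-second delay each
```

This is a lower bound only: actual deploys also include build, boot, and probe time on top of the delay.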

Setting the failureThreshold to a higher value might be helpful, but can mean you leave a “bad” replica running longer.
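To get a feel for that trade-off, the Python sketch below estimates how long a replica that has just become unhealthy can keep running before enough consecutive probes have failed. It is a rough model built only on the table's parameters: the function name `approx_seconds_to_flag` is hypothetical, and it assumes the replica breaks right after a successful probe and that each failing probe takes the full timeout to report.

```python
def approx_seconds_to_flag(period_seconds=3, failure_threshold=3, timeout_seconds=1):
    """Rough upper bound on the time until failure_threshold probes have failed."""
    # failure_threshold probes must run, one every period_seconds,
    # and the last one may take up to timeout_seconds to report.
    return failure_threshold * period_seconds + timeout_seconds

if __name__ == "__main__":
    print(approx_seconds_to_flag())                      # default settings
    print(approx_seconds_to_flag(failure_threshold=10))  # a more tolerant setting
```

Under these assumptions the defaults flag a bad replica in roughly 10 seconds, while a failureThreshold of 10 lets it run for about 31 seconds.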

If you want Gigalixir Support to change any of these settings for you, write to us about your goals at any time.