Reliable Garden (DevOps tool) on CI
I worked on a project migrating an existing platform to the cloud. The platform consisted of 20+ services and infrastructure components including databases, message queues, blob storage, and observability tools. It featured an e2e-test suite for functional testing, executed on GitLab CI by spinning up the entire platform using docker-compose and running JUnit tests.
During migration, we replaced the centralized docker-compose setup with a distributed configuration using Garden v0.13.
Challenge
Our e2e testing setup involved a Garden project that:
- Pulled projects from different Git repositories
- Started required services in a dedicated Kubernetes (K8s) namespace
- Built a Docker image with the test suite and executed it inside the same K8s namespace
- Collected results
- Either shut down the application (on success) or kept it running for investigation (on failure)
While this seemed ideal for Garden, the CI pipeline’s deployment stage faced several challenges:
- Overloading GitLab with repository pulls
- Overloading the K8S API
- Excessive memory consumption (up to 500MB per Helm process)
- Extended pipeline duration due to repeated Helm dependency updates
Solution
Garden clearly lacked the functionality to throttle the number of parallel requests, as well as retries for failed operations. Investigation revealed that Garden downloads its tools into its .garden
directory and executes them while parsing their output. I could add the missing functionality by creating proxy shell scripts for these tools. I decided to go with retries for Git operations and throttling the number of Helm processes. As a result, the process would look like this:
graph TD
    A[CI Job Start] --> C[Replace Tools with Proxies]
    C --> D[Garden Deploy]
    D --> E[Git proxy]
    D --> F[Helm proxy]
    E -->|Retry Logic| E
    F -->|Process Limiting| F
    E --> G[Git]
    F --> H[Helm]
Implementation
1. Git Proxy (fake-git.sh)
Any debug output is commented out, because Garden tries to parse the tool's output and fails on unexpected lines.
#!/bin/bash
function retry {
  local retries=$1
  shift
  local count=0
  until "$@"; do
    local exit=$?
    local wait=$((2 ** count))
    count=$((count + 1))
    if [ $count -lt $retries ]; then
      #echo "Retry $count/$retries exited $exit, retrying in $wait seconds..."
      sleep $wait
    else
      #echo "Retry $count/$retries exited $exit, no more retries left."
      return $exit
    fi
  done
  return 0
}

retry 5 /usr/bin/git "$@"
The fake-git.sh script implements exponential backoff retry logic, attempting operations up to 5 times before failing.
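The backoff behavior can be sanity-checked locally against a command that fails a couple of times before succeeding. The flaky helper below is purely hypothetical, invented for the demonstration; the retry function is the same logic as in fake-git.sh:

```shell
#!/bin/bash
# Same retry logic as in fake-git.sh
function retry {
  local retries=$1
  shift
  local count=0
  until "$@"; do
    local exit=$?
    local wait=$((2 ** count))
    count=$((count + 1))
    if [ $count -lt $retries ]; then sleep $wait; else return $exit; fi
  done
  return 0
}

STATE=$(mktemp)
# Hypothetical flaky command: fails on the first two calls, succeeds afterwards.
flaky() {
  local n
  n=$(wc -l < "$STATE")
  echo x >> "$STATE"
  [ "$n" -ge 2 ]
}

retry 5 flaky && echo "succeeded after $(wc -l < "$STATE" | tr -d ' ') attempts"
# → succeeded after 3 attempts (with 1s + 2s backoff sleeps in between)
```

The waits grow as 2^count, so a command retried 5 times waits at most 1 + 2 + 4 + 8 = 15 seconds in total before giving up.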
2. Helm Proxy (fake-helm.sh)
#!/bin/bash
#echo "In FAKE_HELM! Args: $@"

# Limit concurrent Helm processes to 8 (based on observed memory usage)
function canRun() {
  # Here helm~ is the original helm downloaded by Garden
  COUNT="$(ps aux | grep helm~ | \
    grep -v grep | awk '{print $2}' | \
    wc -l | awk '{$1=$1;print}')"
  if [[ ${COUNT} -lt 8 ]]; then
    #echo "running $(basename $0)"
    return 0
  fi
  #echo "$(basename $0) hit the process limit, waiting..."
  return 1
}

# Initial random delay (0-9 seconds) to prevent concurrent checks
sleep $((RANDOM % 10))
until canRun
do
  sleep 1
done

helmArgs="$@"
# Skip unnecessary package refreshes
if [[ $helmArgs == *"dependency update"* ]]; then
  #echo "FAKE_HELM: adding --skip-refresh"
  helmArgs="$helmArgs --skip-refresh"
fi
# $helmArgs is intentionally unquoted so it splits back into separate arguments
exec "$(dirname "$0")/helm~" $helmArgs
The fake-helm.sh script:
- Limits concurrent Helm processes to 8 (chosen based on our memory constraints: 8 * 500MB = 4GB max usage)
- Skips redundant package refreshes
Limiting the number of concurrent commands is achieved with the canRun function, which checks the number of currently running helm processes and returns 0 if more processes can be started, or 1 if the limit has been reached. To make this work we also needed to introduce a random initial delay before the check; otherwise all processes would simply pass it at the same time. Of course, this logic does not give a 100% guarantee of staying under the defined limit, but it is good enough to throttle parallel execution and make the pipeline stable.
Finally, to omit unnecessary package refreshes, fake-helm.sh adds the --skip-refresh flag to the helm dependency update command, because the refresh only needs to happen once at the beginning of the pipeline.
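The argument rewriting can be sketched in isolation. The rewrite function below is a hypothetical stand-in that only does the string matching and flag appending, without exec-ing the real binary:

```shell
#!/bin/bash
# Isolated sketch of the argument rewriting in fake-helm.sh: append
# --skip-refresh only to "dependency update" invocations.
rewrite() {
  local args="$*"
  if [[ $args == *"dependency update"* ]]; then
    args="$args --skip-refresh"
  fi
  echo "$args"
}

rewrite dependency update mychart    # → dependency update mychart --skip-refresh
rewrite upgrade --install my-release # → upgrade --install my-release
```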
3. Pipeline Configuration
Here is what our deployment job looked like in GitLab CI:
deploy-application:
  stage: deploy
  image:
    name: gardendev/garden:0.13-buster
    pull_policy: if-not-present
  resource_limits:
    memory: 5Gi
    cpu: "2"
  timeout: 1h
  script:
    # Add extra Helm repositories
    - garden tools kubernetes.helm -- repo add bitnami https://charts.bitnami.com/bitnami
    # Single repositories update
    - garden tools kubernetes.helm -- repo update
    - ./prepare-fake-commands.sh || exit 1
    - garden deploy -l 4
  artifacts:
    paths:
      - .garden/error.log
The only missing piece is the prepare-fake-commands.sh script, which replaces the original tools with our proxy scripts:
#!/bin/bash
set -e # Exit on any error

helmLocation=$(garden tools kubernetes.helm --get-path)
echo "Garden Helm location: $helmLocation"
mv "$helmLocation" "${helmLocation}~"
echo "Moved original Garden Helm to: ${helmLocation}~"
cp ./fake-helm.sh "$helmLocation"
echo "Copied fake-helm.sh to: ${helmLocation}"
cp ./fake-git.sh /garden/git
echo "Copied fake-git.sh to: /garden/git"
ln -s "$(garden tools kubernetes.kubectl --get-path)" /garden/kubectl
echo "Linked Garden kubectl to: /garden/kubectl"
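The swap itself can be rehearsed without Garden at all. The sketch below simulates it in a scratch directory: the dummy helm script, the directory, and the --extra flag are all illustrative assumptions standing in for the real Garden tool path and --skip-refresh:

```shell
#!/bin/bash
set -e
# Simulate the tool swap in a scratch directory instead of Garden's .garden dir
# (the dummy "helm" and the --extra flag are illustrative only).
dir=$(mktemp -d)
printf '#!/bin/bash\necho real-helm "$@"\n' > "$dir/helm"
chmod +x "$dir/helm"
printf '#!/bin/bash\nexec "$(dirname "$0")/helm~" "$@" --extra\n' > "$dir/fake-helm.sh"

# The same steps prepare-fake-commands.sh performs:
mv "$dir/helm" "$dir/helm~"          # preserve the original next to the proxy
cp "$dir/fake-helm.sh" "$dir/helm"   # install the proxy under the original name
chmod +x "$dir/helm"

"$dir/helm" upgrade   # → real-helm upgrade --extra
```

Because the proxy resolves helm~ via dirname "$0", callers keep invoking the original path unchanged and transparently go through the wrapper.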
Conclusion
This is a story with a happy ending. The improvements significantly enhanced pipeline reliability while keeping resource consumption under control. It also highlights an important lesson: modern tools, despite their promise, may not be fully mature. When applying them to your specific use case, be prepared to:
- Understand the tool’s internal workings
- Create workarounds that respect the tool’s design
- Monitor and measure the impact of your solutions
Remember: sometimes the simplest solution - like a shell script wrapper - can bridge the gap between a tool’s current capabilities and your production needs.