Reliable Garden (DevOps tool) on CI

I worked on a project migrating an existing platform to the cloud. The platform consisted of 20+ services and infrastructure components including databases, message queues, blob storage, and observability tools. It featured an e2e-test suite for functional testing, executed on GitLab CI by spinning up the entire platform using docker-compose and running JUnit tests.

During migration, we replaced the centralized docker-compose setup with a distributed configuration using Garden v0.13.

Challenge

Our e2e testing setup involved a Garden project that:

  1. Pulled projects from different Git repositories
  2. Started the required services in a dedicated Kubernetes (K8s) namespace
  3. Built a Docker image with the test suite and executed it inside the same K8s namespace
  4. Collected results
  5. Either shut down the application (on success) or kept it running for investigation (on failure)
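
To make the flow concrete, here is roughly what it boils down to in Garden commands. This is only a sketch: the action name e2e-tests and the environment name ci are illustrative, not the real names from the project.

garden deploy --env ci           # steps 1-2: fetch remote sources and start the services in the namespace
garden test e2e-tests --env ci   # steps 3-4: build the test-suite image, run it in the same namespace, collect results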

While this seemed ideal for Garden, the CI pipeline’s deployment stage faced several challenges:

  • Overloading GitLab with repository pulls
  • Overloading the Kubernetes API
  • Excessive memory consumption (up to 500MB per Helm process)
  • Extended pipeline duration due to repeated Helm dependency updates

Solution

It was obvious that Garden lacked the functionality to throttle the number of parallel requests or to retry failed operations. Investigation revealed that Garden downloads its tools into the .garden directory and executes them while parsing their output. I could add the missing functionality by creating proxy shell scripts for these tools. I decided to go with retries for Git operations and throttling of the number of Helm processes. As a result, the process would look like this:

graph TD
    A[CI Job Start] --> C[Replace Tools with Proxies]
    C --> D[Garden Deploy]
    D --> E[Git proxy]
    D --> F[Helm proxy]
    E -->|Retry Logic| E
    F -->|Process Limiting| F
    E --> G[Git]
    F --> H[Helm]
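
The swap is possible because Garden can print where it stores each tool it manages, so the downloaded binaries can be renamed and replaced in place. For example (the exact paths depend on the Garden installation):

garden tools kubernetes.helm --get-path     # prints the path of the Helm binary Garden downloaded
garden tools kubernetes.kubectl --get-path  # same for kubectl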

Implementation

1. Git Proxy (fake-git.sh)

Any debug output is commented out, because Garden parses the tool's output and fails on unexpected lines.

#!/bin/bash

function retry {
  local retries=$1
  shift

  local count=0
  until "$@"; do
    exit=$?
    wait=$((2 ** $count))
    count=$(($count + 1))
    if [ $count -lt $retries ]; then
      #echo "Retry $count/$retries exited $exit, retrying in $wait seconds..."
      sleep $wait
    else
      #echo "Retry $count/$retries exited $exit, no more retries left."
      return $exit
    fi
  done
  return 0
}

retry 5 /usr/bin/git "$@"

The fake-git.sh script implements exponential backoff retry logic, attempting operations up to 5 times before failing.
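
A quick way to check the proxy is to call it directly: a successful command behaves exactly like plain git, and the retries only kick in on failures (the URL below is deliberately invalid):

./fake-git.sh --version                          # succeeds on the first attempt
./fake-git.sh ls-remote https://invalid.example  # fails, retried 5 times with 1, 2, 4 and 8 second pauses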

2. Helm Proxy (fake-helm.sh)

#!/bin/bash

#echo "In FAKE_HELM! Args: $@"

# Limit concurrent Helm processes to 8 (based on observed memory usage)
function canRun() {
  # Here helm~ is the original helm downloaded by Garden
  COUNT="$(ps aux | grep helm~ | \
           grep -v grep | awk '{print $2}' | \
           wc -l | awk '{$1=$1;print}')"
  if [[ ${COUNT} -lt 8 ]]; then
      #echo "FAKE_HELM: ${COUNT} helm processes running, proceeding."
      return 0
  fi
  #echo "FAKE_HELM: limit reached (${COUNT} helm processes), waiting."
  return 1
}

# Initial random delay (0-9 seconds) to prevent concurrent checks
sleep $((RANDOM % 10))

until canRun
do
 sleep 1
done

helmArgs=("$@")

# skip unnecessary package refreshes
if [[ "$*" == *"dependency update"* ]]; then
    #echo "FAKE_HELM: adding --skip-refresh"
    helmArgs+=(--skip-refresh)
fi

exec "$(dirname "$0")"/helm~ "${helmArgs[@]}"

The fake-helm.sh script:

  • Limits concurrent Helm processes to 8 (chosen based on our memory constraints: 8 * 500MB = 4GB max usage)
  • Skips redundant package refreshes

Limiting the number of concurrent commands is achieved with the canRun function, which checks the number of currently running Helm processes and returns 0 if more processes can be started, or 1 if the limit has already been reached. To make this work we also had to introduce a random delay before the check, otherwise all processes would simply pass it at the same time. Of course, this logic does not give a 100% guarantee of staying under the defined limit, but it is good enough to throttle parallel execution and keep the pipeline stable.

Finally, to omit unnecessary package refreshes, fake-helm.sh adds the --skip-refresh flag to every helm dependency update command, because the refresh needs to happen only once, at the beginning of the pipeline.
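
Put together, a dependency-update call issued by Garden is rewritten on the fly. Assuming a chart at ./charts/my-service (a made-up path), the effective invocation looks like this:

# What Garden invokes (through the proxy placed on the Helm path):
helm dependency update ./charts/my-service
# What actually runs after the proxy appends the flag (helm~ is the original binary):
helm~ dependency update ./charts/my-service --skip-refresh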

3. Pipeline Configuration

And here is what our deployment job looked like in GitLab CI:

deploy-application:
  stage: deploy
  image:
    name: gardendev/garden:0.13-buster
    pull_policy: if-not-present
  resource_limits:
    memory: 5Gi
    cpu: "2"
  timeout: 1h
  script:
    # Add extra Helm repositories
    - garden tools kubernetes.helm -- repo add bitnami https://charts.bitnami.com/bitnami
    # Single repository index update for the whole pipeline
    - garden tools kubernetes.helm -- repo update
    - ./prepare-fake-commands.sh || exit 1
    - garden deploy -l 4
  artifacts:
    paths:
      - .garden/error.log

The only missing part is the prepare-fake-commands.sh script, which replaces the original tools with our proxy scripts:

#!/bin/bash

set -e  # Exit on any error

helmLocation=$(garden tools kubernetes.helm --get-path)
echo "Garden Helm location: $helmLocation"

mv "$helmLocation" "$helmLocation~"
echo "Moved original Garden Helm to: ${helmLocation}~"

cp ./fake-helm.sh "$helmLocation"
echo "Copied fake-helm.sh to: ${helmLocation}"

cp ./fake-git.sh /garden/git
echo "Copied fake-git to: /garden/git"

ln -s "$(garden tools kubernetes.kubectl --get-path)" /garden/kubectl
echo "Created link of garden kubectl to: /garden/kubectl"

Conclusion

This is a story with a happy ending. My improvements significantly enhanced pipeline reliability while keeping resource consumption under control. However, the experience highlights an important lesson: modern tools, despite their promise, may not be fully mature. When applying them to your specific use case, be prepared to:

  • Understand the tool’s internal workings
  • Create workarounds that respect the tool’s design
  • Monitor and measure the impact of your solutions

Remember: sometimes the simplest solution - like a shell script wrapper - can bridge the gap between a tool’s current capabilities and your production needs.