Why is Building Pipelines Different from Software Development?

It Doesn’t Have to Be! Simplify Your CI/CD Workflow with Dagger

--

Introduction

CI/CD pipelines are essential for automating software integration and deployment: they ensure that code changes are automatically tested, integrated, and deployed to production with minimal manual intervention, ideally in a fully automated way.

However, building and managing pipelines is not easy. In this blog, we will address some of the most common pain points of pipeline development and see how to improve them.

This content will be valuable for Developers, Architects, DevOps Specialists, or anyone curious about how to improve building and maintaining CI/CD pipelines.

Challenges of building and running a CI/CD pipeline

The main challenge comes from the fact that CI/CD pipeline development and lifecycle management are treated differently from software development practices.

Nowadays, pipelines are mostly written in YAML. Large configuration files instruct pipeline runners hosted on services like GitLab or GitHub how to interpret a pipeline workflow file and what actions to perform.

👉 Read more about YAML file structure using Azure DevOps as an example.

Individual actions are wrapped in YAML tasks or steps, which are often bash scripts executed in the runner’s environment.

Here is an example build job with multiple steps. Actions are used to call specialized steps; by default, the runner executes the commands provided in the run block.

build:
  runs-on: ubuntu-22.04
  steps:
    - name: Checkout
      uses: actions/checkout@v4

    - name: Setup Python
      uses: actions/setup-python@v5
      with:
        python-version: ${{ env.PYTHON_VERSION }}

    - name: Setup Hatch
      run: pipx install hatch==1.7.0

    - name: Set Default PyPI Project Version
      if: env.PYPI_VERSION == ''
      run: echo "PYPI_VERSION=v0.0.0+$(date -d@$(git show -s --format=%ct) +%Y%m%d%H%M%S)-$(git rev-parse --short=12 HEAD)" >> $GITHUB_ENV

    - name: Set PyPI Project Version
      run: hatch version ${{ env.PYPI_VERSION }}

    - name: Build Sdist and Wheel
      run: hatch build

    - name: Upload Sdist and Wheel to GitHub
      uses: actions/upload-artifact@v4
      with:
        name: dist
        path: "dist/*"
        if-no-files-found: error
        retention-days: 1

The main issue is that, while the commands can be executed locally, there is no guarantee that they will execute the same way in the runner’s environment.

In other words, we cannot guarantee full reproducibility of the pipeline.

Since we cannot reproduce the pipeline locally, testing happens in the runner’s environment. The only way to debug the workflow is to add print statements or log to a file, then dig through the runner’s logs to see what went wrong (spoiler alert: most of the time it’s a missing comma or something equally trivial).

This typically results in a commit history where the pipeline doesn’t work, but the only way to be sure it will work is to let it run and check for errors.

What if pipelines could be… just code

The good news is that they can! Pipelines can be just code working in a standardized way, who knows, maybe even using docker under the hood, but more on that later.

Why would we want pipelines to be written in a programming language?

  • creating and running tests with actual testing frameworks
  • locally debugging and testing each pipeline step
  • pipeline code versioning and releases
  • linting and code support (including various copilots) inside an IDE or text editor
  • using programming language facilities: async calls, functions, data structures, and more

What’s the deal with docker

I mentioned earlier that having docker could be nice. This is because running pipelines binds us to proprietary runners offered by 3rd-party vendors such as GitHub and GitLab, and tools like Jenkins likewise mandate their own syntax and integration points.

Wouldn’t it be cool if we could run the whole pipeline anywhere with reproducibility guarantees? For this to happen, we need a standard way of executing pipeline jobs and steps/tasks. This is where docker comes in.

Instead of running all pipeline steps in a proprietary runner format, we just need one step to hook into the runner’s execution environment and let the containerized environment do the rest.

What is dagger and how can it help

Sadly, dagger has nothing to do with beautiful blades and more to do with DAGs (directed acyclic graphs).

So, less of a blade and more of a graph. As the dagger website (https://dagger.io/) puts it:

Transform your Messy CI Scripts into Clean Code

Powerful, programmable open source CI/CD engine that runs your pipelines in containers — pre-push on your local machine and/or post-push in CI

Let’s convert a pipeline

We are going to convert an actual Python pipeline from one of my projects, killercoda-cli, a simple Python CLI that helps with writing killercoda.com scenarios. The goal is to convert just the right amount of steps and introduce dagger gradually into the project.

The pipeline builds the CLI, runs tests, and enables a manual push to the PyPI registry.

https://github.com/Piotr1215/killercoda-cli/blob/main/.github/workflows/ci.yml

First things first: prerequisites

Before we start, we need to install the dagger CLI and docker (podman and nerdctl would work too).

I’m using Linux, so curl will do just fine (with a modified path to save the binary to):

curl -L https://dl.dagger.io/dagger/install.sh | BIN_DIR=/usr/local/bin sh

The installation script instructs me how to add completion; executing

dagger completion zsh > /usr/local/share/zsh/site-functions/_dagger

and after reloading .zshrc, tab completion works just fine.

Adding dagger to the project

Running dagger init --sdk=python pulled the dagger image, created a dagger directory, and generated a dagger.json file.

dagger
├── pyproject.toml
├── requirements.lock
├── sdk
│   ├── codegen
│   │   ├── pyproject.toml
│   │   ├── requirements.lock
│   │   └── src
│   │       └── codegen
│   │           ├── cli.py
│   │           ├── generator.py
│   │           ├── __init__.py
│   │           └── __main__.py
│   ├── LICENSE
│   ├── pyproject.toml
│   ├── README.md
│   └── src
│       └── dagger
│           ├── client
│           │   ├── base.py
│           │   ├── _core.py
│           │   ├── gen.py
│           │   ├── _guards.py
│           │   ├── __init__.py
│           │   └── _session.py
│           ├── _config.py
│           ├── _connection.py
│           ├── _engine
│           │   ├── conn.py
│           │   ├── download.py
│           │   ├── __init__.py
│           │   ├── progress.py
│           │   ├── session.py
│           │   └── _version.py
│           ├── _exceptions.py
│           ├── __init__.py
│           ├── log.py
│           ├── _managers.py
│           ├── mod
│           │   ├── _arguments.py
│           │   ├── cli.py
│           │   ├── _converter.py
│           │   ├── _exceptions.py
│           │   ├── __init__.py
│           │   ├── _module.py
│           │   ├── _resolver.py
│           │   ├── _types.py
│           │   └── _utils.py
│           ├── py.typed
│           └── telemetry
│               ├── attributes.py
│               └── __init__.py
└── src
    └── main
        └── __init__.py

12 directories, 42 files
The generated dagger.json file contains:

{
  "name": "killercoda-cli",
  "sdk": "python",
  "source": "dagger",
  "engineVersion": "v0.11.7"
}

Build Environment Container

Let’s add a function to src/main/__init__.py to create a development environment for building and testing our project.

This function builds an environment with all the dependencies required by my application.
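As a minimal sketch using the dagger Python SDK (the base image and the pip-based install are assumptions; the real project’s dependencies may differ), such a function can look like this:

import dagger
from dagger import dag, function, object_type


@object_type
class KillercodaCli:
    @function
    def build_env(self, source: dagger.Directory) -> dagger.Container:
        """Build a container with the project source and its dependencies installed."""
        return (
            dag.container()
            # assumption: python:3.11-slim as the base image
            .from_("python:3.11-slim")
            # mount the project source at /src and work from there
            .with_directory("/src", source)
            .with_workdir("/src")
            # assumption: the project and its dependencies install via pip
            .with_exec(["pip", "install", "-e", "."])
        )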

We can run it with dagger call build-env --source=. to build an image.

💡Notice the kebab-case naming convention in the CLI, build_env becomes build-env

Earlier, we discussed local debugging and testing. Well, this is a container, so we should be able to drop into it with a shell!

➜ dagger call build-env --source=. terminal --cmd=bash
root@grt1fshu1uc6c:/src# ls -lah
total 2.1M
drwxr-xr-x 14 root root 4.0K Jun 13 16:21 .
drwxr-xr-x 1 root root 4.0K Jun 13 16:23 ..
-rw-rw-r-- 2 root root 1.3M May 26 20:39 .aider.chat.history.md
-rw-rw-r-- 2 root root 23K May 26 20:39 .aider.input.history
drwxr-xr-x 2 root root 4.0K May 26 20:39 .aider.tags.cache.v3
-rw-r--r-- 2 root root 52K Jun 13 14:00 .coverage
-rw-rw-r-- 2 root root 116 Feb 10 19:36 .coveragerc
drwxrwxr-x 9 root root 4.0K Jun 13 16:21 .git
drwxrwxr-x 3 root root 4.0K Feb 10 11:44 .github
-rw-rw-r-- 2 root root 3.4K Feb 10 11:34 .gitignore
drwxr-xr-x 6 root root 4.0K Jun 13 12:55 .pytest_cache
drwxrwxr-x 4 root root 4.0K Jun 13 13:58 .venv-test
-rw-rw-r-- 2 root root 1.1K Feb 10 11:39 LICENSE.txt
-rw-rw-r-- 2 root root 5.4K Jun 13 16:17 README.md
drwx------ 2 root root 4.0K Jun 13 14:51 _media
drwxrwxr-x 2 root root 4.0K May 31 11:38 assets
-rw-rw-r-- 2 root root 0 Jun 13 14:44 costam.log
-rw-rw-r-- 2 root root 6.4K May 26 14:42 coverage.xml
drwxr-xr-x 4 root root 4.0K Jun 13 14:45 dagger
-rw-r--r-- 2 root root 102 Jun 13 12:40 dagger.json
drwxrwxr-x 2 root root 4.0K Feb 10 20:21 dist
drwxrwxr-x 3 root root 4.0K May 31 11:32 killercoda_cli
-rw-rw-r-- 2 root root 686K Jun 13 14:47 output.txt
-rw-rw-r-- 2 root root 2.5K Jun 13 14:15 pyproject.toml
drwxrwxr-x 2 root root 4.0K May 31 11:08 temp_template
drwxrwxr-x 4 root root 4.0K Jun 13 14:03 tests
root@grt1fshu1uc6c:/src#

Being able to inspect the container locally is a game changer. Remember, the same container will run on a remote runner VM.

Running Tests

One more function for running tests; notice how we execute the build_env function before running the tests.
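Sketched with the same hedges as before, the test function can reuse build_env and run the suite inside that container (pytest as the test runner is an assumption):

@function
async def test(self, source: dagger.Directory) -> str:
    """Run the test suite inside the build environment and return its output.

    A method of the same KillercodaCli object type as build_env.
    """
    return await (
        self.build_env(source)
        # assumption: pytest is the project's test runner
        .with_exec(["pytest", "-v"])
        .stdout()
    )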

➜ dagger call test --source=.
Current directory: /tmp/test_generate_assets/killercoda-assets
Source directory: /tmp/test_generate_assets/killercoda-assets
Generating assets from template: https://github.com/Piotr1215/cookiecutter-killercoda-assets
Output directory: /tmp/test_generate_assets
Assets generated successfully.

The tests pass, however the output is a bit sparse. Earlier we set up tab completion, so let’s see if there are any flags that can help us: dagger call test --source=. <TAB>

➜ dagger call test --source=. --debug
--debug -- show debug logs and full verbosity
--json -- Present result as JSON
--mod -- Path to dagger.json config file for the module or a directory containing that file. Either local path (e.g. "/path/to/some/dir") or a github repo (e.g. "github.com/dagger/dagger/path/to/some/subdir")
--output -- Path in the host to save the result to
--progress -- progress output format (auto, plain, tty)
--silent -- disable terminal UI and progress output
--verbose -- increase verbosity (use -vv or -vvv for more)

The --debug option gives us full run logs with maximum verbosity, great!

Build & Publish

The last two functions are publish and build. Publish uses the awesome ttl.sh service, which allows publishing short-lived images (max 24h) and is great for testing. Build performs a multi-stage build and gets our application ready for deployment.
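Here is a hedged sketch of both: build reuses the build environment to produce distributables and copies them into a slim runtime image, and publish pushes the result to ttl.sh under a random short-lived tag (the base image and install commands are assumptions; hatch matches the original YAML pipeline):

import random

@function
def build(self, source: dagger.Directory) -> dagger.Container:
    """Multi-stage build: create dist artifacts in the build env, copy them into a runtime image."""
    dist = self.build_env(source).with_exec(["hatch", "build"]).directory("/src/dist")
    return (
        dag.container()
        # assumption: slim Python base for the runtime stage
        .from_("python:3.11-slim")
        .with_directory("/dist", dist)
        # install the built wheel into the final image
        .with_exec(["sh", "-c", "pip install /dist/*.whl"])
    )

@function
async def publish(self, source: dagger.Directory) -> str:
    """Publish the built image to ttl.sh under a random, short-lived tag."""
    return await self.build(source).publish(
        f"ttl.sh/killercoda-cli-{random.randrange(10 ** 8)}"
    )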

➜ dagger call publish --source=.
ttl.sh/killercoda-cli-15186746@sha256:6d2e22543154fff996e3d829450953981c60aba6f161477e5aa3e00d3faaa2cb

Integrate with GitHub Actions

The integration with GitHub Actions is easy: just add the action to the YAML workflow and call any dagger function.

- name: Hello
  uses: dagger/dagger-for-github@v5
  with:
    verb: call
    args: publish --source=.
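For context, a more complete job might look like this (a sketch; the job name and runner are illustrative, and triggers and secrets are omitted):

jobs:
  dagger:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Run Tests with Dagger
        uses: dagger/dagger-for-github@v5
        with:
          verb: call
          args: test --source=.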

Closing Thoughts

Integrating dagger into my GitHub Actions workflow wasn’t super easy, but it was way easier than dealing with pure YAML. The ability to adapt only parts of the workflow is great: no need for all-or-nothing rewrites; small, incremental steps are just fine.

From a high level, the diagram below shows how working with dagger looks locally and in a remote environment.

(diagram: dagger workflow)

A big win, in my opinion, is that if I decide to move to GitLab tomorrow, I can do it much more easily. Instead of migrating the whole pipeline, I keep it as is and migrate only the entry point.

With the right amount of abstraction and clever reuse of the docker engine, dagger is a very strong choice for any pipeline. The community can collaborate on functions and capture best practices in modules, which are available on the daggerverse.
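For example, functions from a community module can be listed and called directly by module reference (the module path below is illustrative):

# list the functions a remote module exposes
dagger functions -m github.com/shykes/daggerverse/hello

# call one of them without installing anything locally
dagger call -m github.com/shykes/daggerverse/hello hello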

Give dagger a try and let me know about your experiences.

Thanks for taking the time to read this post. I hope you found it interesting and informative.

🔗 Connect with me on LinkedIn

🌐 Visit my Website

📺 Subscribe to my YouTube Channel
