
How to Self-Host GitHub Actions Runners - Part 2

By Matt Matheson | October 4, 2023
CI Analytics

In the previous post in this series, I reviewed a few reasons you might want to host your own GitHub Actions Runners. I covered price, customization, and caching, the factors that eventually motivated us at Trunk.io to host our own runners.

In this guide, I’ll describe the infrastructure we use to host the GitHub Actions Runners that power all of our CI and provide the code and commands necessary to do it yourself.

Actions Runner Controller

The Actions Runner Controller (ARC) is an open-source project for self-hosting GitHub Actions Runners on Kubernetes. It functions as a Kubernetes operator, automating the management of self-hosted runners within a cluster. The operator pattern in Kubernetes is designed to emulate the tasks performed by a human managing a service. Simply put, ARC takes over the tasks that would otherwise require manual intervention for managing a cluster of self-hosted runners.

When we started writing code at Trunk.io, we used GitHub-hosted runners for all our CI jobs. At the time, GitHub's hosted Linux runners only had 2 CPU cores and 7 GB of RAM. Building and testing our C++ CLI quickly became slow on these small machines, so we began to self-host machines with beefier specs. Build times dropped precipitously, and everyone at Trunk.io rejoiced. We started with five dedicated VMs, but as the team grew we needed to add more.

This typically happened when someone posted in our #engineering Slack channel about their pull requests waiting for available runners, after which I would dust off my “add a new runner” runbook, manually provision another VM, and register the runner with GitHub so it could start taking new jobs. The process usually took about 30 minutes, not counting the cost of context switching from writing code to operating the runners. This “operator” role also involved investigating why a runner had gone offline; a simple restart was usually the fix, but that, too, would take another 30 minutes.

ARC aims to automate all of that work. Instead of needing a human to allocate a new VM, wait for it to boot, install the runner software, and register the runner, that happens automatically. The controller can also replace runners that have crashed or had other failures by automatically spinning up a new one.

Prerequisites

For this tutorial, you will need:

- An AWS account, with credentials configured for the AWS CLI
- Terraform
- kubectl
- Helm
- A GitHub organization and a repository where you have admin access
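
A quick way to confirm the command-line tools are installed before you start is to print their versions (the exact output will vary with your setup):

$ terraform version
$ aws --version
$ kubectl version --client
$ helm version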

Step 1: Provision your Kubernetes Cluster

If you’re following this guide, I’ll assume you plan to use these actions runners in a professional setting. Throughout the guide, I follow best practices to create a maintainable, reproducible runner cluster on AWS EKS using Infrastructure as Code with Terraform. This is not an in-depth guide to setting up EKS with Terraform, but it will get you started.

A quick word of caution: AWS EKS clusters cost ~$0.10 per hour, so you may incur charges by running this tutorial. The cost should be a few dollars at most, but be sure to delete your infrastructure promptly to avoid additional charges.

In your terminal, clone the example repository for this tutorial:

$ git clone https://github.com/trunk-io/github-actions-runner-tutorial.git

Change into the repository directory:

$ cd github-actions-runner-tutorial

The Terraform configuration defines a new VPC to provision the cluster and uses the public EKS module to create the required resources, including auto-scaling groups, security groups, and IAM roles and policies.

Open the main.tf file to review the module configuration. The eks_managed_node_groups parameter configures the node groups.

eks_managed_node_groups = {
  one = {
    name = "actions-runner"

    instance_types = ["m5d.xlarge"]

    min_size     = 1
    max_size     = 20
    desired_size = 3
  }
}

The actions-runner group is responsible for operating our actions runners. When ARC identifies the need for more machines, it uses this group's configuration to deploy new nodes. I've chosen m5d.xlarge instances, which give me more resources than the default GitHub-hosted Ubuntu runner and include Instance Storage. That's ideal for CI workers: the jobs don't require persistent storage, and the local instance store gives faster disk access than network-attached volumes. I've also set the capacity type to SPOT, which cuts costs by up to 67% compared to on-demand pricing. Finally, the max_size value limits the number of nodes Kubernetes will start, providing a lever to control expenses.
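
The capacity type isn't visible in the excerpt above; in the terraform-aws-eks module it is set with the capacity_type argument on the node group, roughly like this (a sketch - check main.tf in the tutorial repo for the exact placement):

one = {
  name           = "actions-runner"
  instance_types = ["m5d.xlarge"]

  # Run this node group on Spot instances instead of On-Demand capacity
  capacity_type = "SPOT"

  min_size     = 1
  max_size     = 20
  desired_size = 3
}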

Initialize this Terraform configuration by running the following:

$ terraform init
Initializing the backend...

...

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Once Terraform finishes the initialization process, apply the Terraform configuration. Be patient; this can take up to 10 minutes.

$ terraform apply

...

Apply complete! Resources: 63 added, 0 changed, 0 destroyed.

Outputs:

cluster_endpoint = "https://12345.gr7.us-west-2.eks.amazonaws.com"
cluster_name = "arc-tutorial-IKQYD53K"
cluster_security_group_id = "sg-12345"
region = "us-west-2"

Now that you have provisioned your Kubernetes cluster, you need to set up kubectl to interact with it. Note the region and cluster_name outputs from the previous step; you will use them to configure kubectl.

Run this command to retrieve access credentials for your cluster and configure kubectl:

$ aws eks --region $(terraform output -raw region) update-kubeconfig \
    --name $(terraform output -raw cluster_name)

Verify that kubectl is configured correctly by using it to request information about the cluster:

$ kubectl cluster-info

Kubernetes control plane is running at https://12345.sk1.us-west-2.eks.amazonaws.com
CoreDNS is running at https://12345.sk1.us-west-2.eks.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

Congratulations! You have a working Kubernetes cluster, and are ready to install ARC to start provisioning GitHub Actions Runners.

Step 2: Install Cert-Manager

The Helm chart for actions-runner-controller will provision a Certificate using the cert-manager.io/v1 API, so install the cert-manager resources with the following command:

$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.2/cert-manager.yaml
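
Before moving on, you can confirm the cert-manager pods came up; they are created in the cert-manager namespace, and you should see the cert-manager, cainjector, and webhook pods reach a Running state:

$ kubectl get pods -n cert-manager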

Step 3: Authentication

ARC needs access to the GitHub API to do its job. ARC supports both personal access token and GitHub App authentication. For the purposes of this guide, we'll use GitHub App authentication: the credentials belong to the app rather than to a person, so your team can keep operating the runners even when you're unavailable.

Follow GitHub's documentation to create a new GitHub App under your organization. Under "Where can this GitHub App be installed?" select Only on this account.

Once you've created a GitHub App, navigate to Private keys in the app settings and Generate a private key. This key will be used to authenticate ARC during its installation. Make sure to note down the app ID, which is also available in the app settings, as you'll need it later.

Step 4: Install the Runner Scale Set Controller

A Runner Scale Set is a group of homogeneous runners that can be assigned jobs from GitHub Actions. Auto-scaling solutions like ARC manage the number of active runners. In this guide, we'll use Helm to install a Runner Scale Set Controller, which can manage multiple Runner Scale Sets.

Install the Runner Scale Set Controller chart with this command:

$ helm install arc --namespace "arc-system" --create-namespace \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller

Verify that your controller is running using the following command:

$ kubectl get pods -n arc-system

NAME                                     READY   STATUS    RESTARTS   AGE
arc-gha-rs-controller-7c5554b9f7-jn8jx   1/1     Running   0          13m

Step 5: Configure a Runner Scale Set

After setting up the controller, we must configure a Runner Scale Set for a specific repository. The Runner Scale Set adds a listener pod that monitors GitHub job events and adds and removes runners as necessary. To authenticate the listener, install the GitHub App you created in the organization that owns the repository. You'll find the install ID in the URL bar, in the format https://github.com/settings/installations/{id}. Authenticating against the GitHub API requires the private key and app ID from Step 3, plus this install ID.

Choose your Helm release name carefully! It determines how jobs are assigned to runners in this Scale Set. I'll use the name arc-runner-set, which means that when I write a workflow and want it to use runners in this Scale Set, I'll set the runs-on key to arc-runner-set.

Create a new Runner Scale Set using this command:

$ helm install arc-runner-set --namespace arc-runners --create-namespace \
    --set githubConfigUrl="https://github.com/{organization}/{repository}" \
    --set githubConfigSecret.github_app_id="{app_id}" \
    --set-file githubConfigSecret.github_app_private_key={path/to/private/key} \
    --set githubConfigSecret.github_app_installation_id="{install_id}" \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set

Verify the install:

$ kubectl get pods -n arc-system

NAME                                     READY   STATUS    RESTARTS   AGE
arc-gha-rs-controller-7c5554b9f7-jn8jx   1/1     Running   0          13m
arc-runner-set-754b578d-listener         1/1     Running   0          7m22s

The new listener pod authenticates with GitHub to get notifications about new and completed jobs.
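
If you want to watch it work, you can tail the listener's logs, substituting the listener pod name from your own kubectl get pods output:

$ kubectl logs -f -n arc-system arc-runner-set-754b578d-listener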

Step 6: Test the Runner Scale Set

To test the new Runner Scale Set, we'll create a simple GitHub Actions Workflow in the repository we specified in Step 5. This workflow can be run manually (using the workflow_dispatch trigger), and specifies arc-runner-set as the runs-on value for the job.

Here is an example workflow file:

name: Actions Runner Controller Demo
on:
  workflow_dispatch:

jobs:
  Explore-GitHub-Actions:
    runs-on: arc-runner-set
    steps:
      - run: echo "🎉 This job uses runner scale set runners!"
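
Commit the workflow to the repository's default branch, then trigger it from the Actions tab or, if you have the GitHub CLI installed, from your terminal:

$ gh workflow run "Actions Runner Controller Demo"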

Once you trigger the workflow, you can see a new pod created to run the job:

$ kubectl -n arc-runners get pod

NAME                                READY   STATUS    RESTARTS   AGE
arc-runner-set-s4c9s-runner-7lwst   1/1     Running   0          4s

and you can see the same pod name in the logs for the Job:

Runner name: 'arc-runner-set-s4c9s-runner-7lwst'

Maintenance and Monitoring

Getting an autoscaling cluster of self-hosted runners running is a big achievement - congratulations! But getting the compute in place is just one part of administering a good CI system. Speed, reliability, and cost are constant concerns for engineering organizations of any size, and when CI gets slow or unreliable, the whole team grinds to a halt. To keep your team's velocity high, you need a way to track job wait time - the amount of time a job waits to be scheduled onto a runner. Tracking this metric helps you tune your autoscaling cluster so it can handle the load you put on it.
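
As a minimal sketch of what that measurement can look like, assuming you have the GitHub CLI (gh) installed and authenticated, the commands below list recent workflow runs for a repository and print an approximate queue time for each job in a run (started_at minus created_at). Replace the {organization}, {repository}, and {run_id} placeholders with your own values:

# List the IDs of the 20 most recent workflow runs
$ gh api "repos/{organization}/{repository}/actions/runs?per_page=20" \
    --jq '.workflow_runs[].id'

# For one run, print each job's name and the seconds it spent waiting for a runner
$ gh api "repos/{organization}/{repository}/actions/runs/{run_id}/jobs" \
    --jq '.jobs[] | "\(.name): \((.started_at | fromdateiso8601) - (.created_at | fromdateiso8601))s queued"'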

We built Trunk.io CI Analytics to help you monitor and optimize your CI workflows. Receive real-time alerts when a pull request introduces changes that slow down your test workflows. Easily investigate and resolve flakiness with before-and-after metrics that validate your fixes. Gain valuable insights into resource usage that can help you cut costs without compromising test throughput. Best of all, enabling CI Analytics takes only a single click.

Try it yourself or request a demo. Get started for free - CI Analytics is free for the first 5 users.