CUE.works infrastructure
This repo contains and orchestrates a system which manages most CUE-owned GitHub Organisations.
The system uses GitHub Actions ("GHA") to coordinate Terraform invocations that
manage the state of the GitHub orgs' entities such as org members, repo
settings, outside collaborators
and more.
It can be easily extended to manage any GitHub entity type supported by
the github Terraform provider
and potentially any entity type supported by
any Terraform provider.
CUE is the system's source of truth, which is primarily stored in the config
package.
There are Quickstart guides available for some common tasks:
- Creating a new repo
- Deleting a repo
- Onboarding a new employee
- Granting an outside collaborator access to specific repos
- Adding an org to the system
There is a basic guide to customising this repo and setting up your GitHub Actions environment.
There are additional Quickstart guides for the following tasks, but these processes are untested and should only be performed by experienced users of this system, and with an eye on validating the process and promoting its entry to the list above:
The GitHub org that this repo belongs to is deliberately left unmanaged by the system, so that we can never accidentally cause the system to lock ourselves out of this repo, unable to undo the change.
The system manages other orgs via a GitHub machine user account. That account is an owner of any org it manages. We have chosen not to manage that machine user's membership in any org via this system - again, so that we can never inadvertently remove the machine user from any org and lock ourselves out of that org.
Operations
Overview
This repo is a CUE module whose config package defines the configuration of
all GitHub orgs managed by the system. All CUE files in the config/ directory
contribute to the package. The files' purposes and resulting CUE
structure are documented in this README.
CUE is the source of truth both for the GitHub orgs being managed by Terraform and the GHA jobs that automate the management. However neither Terraform nor GHA accept CUE as input, so we serialise their respective config files before they execute. We store Terraform's config files in the repo (instead of exporting them at GHA runtime) so we can have confidence in their content and presence, and to ensure they are reviewable as part of any change.
We shield each GitHub org from all the others (and shield the operator from making high-impact errors) by managing each org via its own Terraform config file and Terraform invocation, and by storing each org's state file in a dedicated Terraform Cloud ("TFC") Workspace. TFC is used solely to store Terraform state files and to provide a locking mechanism preventing parallel Terraform invocations from interfering with each other. It is not used for its ability to invoke Terraform.
Changes are made to the repo's contents via GitHub Pull Requests ("PR"s) from
feature branches onto the main branch. Only commits to the main branch
trigger a terraform apply invocation, and that branch is protected from
direct pushes. Commits must be added via PRs.
Terraform invocations run inside GHA, with each org having a dedicated
terraform plan job during PR testing, and a dedicated terraform apply job
after each PR merge.
Terraform's generated config files live inside _operations/github/ in a
directory named after the org being managed. This per-org directory is
Terraform's working directory, and is where Terraform initialises its provider
plugins before invocations.
Changes to files inside _operations/ are made by Makefile recipes, CUE
_tool scripts, and Terraform's lockfile management - all of which are
triggered by the operator and never by CI itself.
The contents of files inside _operations/ are controlled via the config
package. Changes inside _operations/, config/ and .github/ must be
committed by the operator. No manual edits of files inside _operations/
are needed, and such edits are guarded against by an early CI assertion that
the committed contents of config/ produces exactly the committed contents of
_operations/.
Making A Content Change In One Or More Orgs
After making a change in config/ that updates one or more org's resources or
introduces new resources, you must:
- test the changes with
make test-config - generate static Terraform configuration with
make generate - submit the changes as a PR
- validate the per-org
terraform planoutput posted to your PR as comments - have the PR reviewed and approved
- merge the PR
Test The Changes
With a version of CUE installed that understands required fields, run
make test-config
Generate Static Terraform Configuration
Run
make generate
Submit The Changes
On a non-main branch, commit all changed files, including but not limited to
all changes under the config/, _operations/, and .github/ directories.
Add all changes in a single commit. Use a commit prefix such as:
- changes to a single org:
org/cue-foo: add the X field to Y - changes across multiple orgs:
org/cue-{foo,bar}: make Y do X - changes across all orgs:
org/*: change all X to Y- e.g. global template changes
If your commit doesn't update, create, or destroy any Terraform resources (e.g. it's a CUE refactor, a documentation update, or a whitespace change), inform the CI tests that they should assert that no-op-ness by including this string on a separate line in your commit message body:
TERRAFORM-PLAN-NO-OP-REQUIRED
Open a draft PR that would add your changes to the main branch in this repo.
Validate The Per-Org terraform plan
The CI test jobs will finish by running terraform plan once for each managed
org. Each terraform plan's output is posted to your PR as a separate comment.
Check that Terraform is proposing to create, update or destroy the resources that you expect.
If your PR passes the GitHub Actions CI tests for every org, mark the PR as ready and request a review from someone on the team.
Have The PR Reviewed And Approved
This repo has a CODEOWNERS file which enforces that certain
files' changes must be reviewed by specific people. In general, PRs may be
reviewed by anyone with access to the repo. @myitcv has the most information
about the system, and in an emergency you can reach out to @jpluscplusm for
support.
The main branch's protection rules require an approval on a PR's latest
commit before that PR's commit(s) can be added to the branch.
Merge The PR
After the PR has been approved ensure that any other PR merges have fully run to completion in the repo's GitHub Actions.
Merge the PR yourself, using the "Rebase and merge" strategy.
Make Targets
There is a single Makefile at the top of the repo. This Makefile is used by
developers, operators, and CI to perform specific tasks. The Makefile is
written in the "phony" task runner model, not the traditional make a.specific.file.exist model - all tasks run unconditionally, without checking
any file modification times. The Makefile includes other Makefiles (in
/ci/make/) which contain the messy implementation details of how CI jobs turn
terraform plan output into nice PR comments.
Running make [help] lists the runnable targets alongside a brief description
of what each target does. These targets are listed here with slightly more
detail than at the CLI.
Parameters should be provided as Makevars (make <target> PARAM=value).
Providing them as envvars may work, but has not been tested.
| Target | Parameters | Purpose |
|---|---|---|
test-config |
cue vet the config package |
|
clean |
Remove all .gitignored files | |
generate |
Recreate all generated files in the repo | |
trim |
Run cue-trim on the non-GitHub-Actions portions of the unified config (because something about our workflows confuses trim, which has been filed as an upstream bug) |
|
lockfile_upgrade |
Upgrade Terraform providers to the latest versions permitted by config/manifest.cue |
|
lockfile_hash |
Place a full set of platform-specific hashes in terraform's lock file, without re-locking versions | |
check_clean_working_tree |
Assert that all git's tracked files are unchanged from their latest committed state, and no untracked files exist | |
help |
Show abbreviated help text |
Upgrading Terraform Providers
Click to open this paragraph if you have prior experience of upgrading Terraform providers in a different system
In vanilla Terraform setups you would run terraform init with its -upgrade
flag in a directory that contains Terraform's config.
Do NOT do that here!
Don't run terraform init -upgrade in any specific org's _operations/*
directory.
Doing so would desync that org from all the other orgs (in terms of provider versioning) and would require unpicking symlinks (cf. https://github.com/hashicorp/terraform/issues/32707)
Instead, read on for details of how to use Makefile targets to upgrade Terraform providers in this system.
The versions of providers we use are constrained by
Terraform's constraint syntax
in the version manifest at config/manifest.cue, under the CUE path
versions: terraform: providers: [_]:
config/manifest.cue is under your control, but the provider versions that
will be used are not decided by this file's contents on every Terraform
invocation.
Instead, the versions are locked at the specific versions
visible inside _operations/github/.terraform_lockfile/.terraform.lock.hcl.
This file is static until an operator chooses to upgrade and lock provider versions.
_operations/github/.terraform_lockfile/.terraform.lock.hcl is managed by
Terraform, and is modified by the following Make targets (which invoke
Terraform, which must be installed):
make lockfile_upgrade: upgrades to the latest versions permitted by the constraints inconfig/manifest.cuemake lockfile_hash: records a hash of each currently selected provider's installation files, without upgrading versions
Runnng make lockfile_upgrade also invokes lockfile_hash. Neither target
requires TFC or GitHub API tokens to be available on the local machine, and can
be run by anyone with an appropriate version of Terraform installed.
Both of these commands might modify
_operations/github/.terraform_lockfile/.terraform.lock.hcl.
Changes to this file must be committed and PR'd.
It is STRONGLY recommended to PR & merge provider version upgrades ahead of configuration changes which require the upgraded versions so that the version upgrade can be verified as being a no-op change in isolation.
Running Terraform Locally
This system is intended to run Terraform solely in CI, and not on an operator's machine. The initial setup required for CI is documented separately.
If you do need to run Terraform locally then the system requires 2 environment variables to be set, containing credentials for its 2 different backend systems: Terraform Cloud ("TFC") and GitHub.
However, before doing this, consider if you really need to run Terraform locally, or if you could instead achieve what you need via CI. The system currently does not have a concept of "read-only" access to either TFC or GitHub, so possessing the credentials required for Terraform invocation means your local machine is a signficantly elevated risk for the organisation.
If you make the choice to run Terraform locally, protect the credentials:
- minimise the duration that the credentials are present on your local machine
- remove them from your environment after each session in which you use them
- don't make shadow copies of them in your local secret-management software
- don't store them on your filesystem for any period of time
- don't commit them to this repo!
Terraform Cloud
Terraform Cloud ("TFC") is a managed service hosted by Hashicorp. We use it to store our Terraform state files, and to provide locking primitives for access to those state files.
Expose a TFC API token via the environment variable
TF_TOKEN_app_terraform_io.
In mid-2023, these tokens look like
<random-characters>.atlasv1.<random-characters>.
GitHub API
The GitHub resources resources managed by the system are accessed via the Github API.
Expose an appropriately-permissioned GitHub API token via the environment
variable TF_VAR_provider_github_token.
In mid-2023, these tokens
look like
ghp_<random-characters>.
Managing Existing GitHub Resources
Importing existing GitHub resources so they can be managed by Terraform is a fiddly and potentially risky operation.
It is documented separately.
Layout
This Repo
-
ci/: CI related files -
CODEOWNERS: A file configuring a GitHub feature that ensures that changes to certain files are reviewed by specific people -
config/: user-modifiable configuration filesorg.*.cue: per-org resources and org-level deviations from global defaults
-
generate_tool.cue: a CUE script which writes the systems' generated files -
.github/workflows/: Generated GHA workflow files -
internal/: CUE files which don't have a place as part of theconfigpackage, but haven't yet been externalised from this repo-
schemata/providers/github/: manually-generated schemata which validate our configuration of resources managed by thegithubTerraform provider -
schemata/terraform/: manually-generated, minimally-viable schemata for various provider-agnostic components of Terraform's input, including top-level config and resource meta arguments
-
-
Makefile: the dev- and CI-facing task runner. See Make Targets for a list of its targets and the tasks it performs -
_operations/: the root directory for all generated files (except GitHub Actions workflow files, which must live in.github/workflowsand cannot be symlinks)-
github/cue-*/: a per-GitHub-org working directory for Terraform invocations, each containing the generatedconfig.tf.jsonfile for the GitHub org the directory is named after -
github/.terraform_lockfile/: the working directory used bymake lockfile_upgradeoperations, containing only those parts of our Terraform configuration which affect provider dependency version selection.terraform.lock.hcl: The Terraform-controlled lockfile which contains the exact versions of providers the system uses. Controlled by a Make target, and documented elsewhere in this README
-
The Main CUE Package
The high-level shape of the config package is described in
schema.cue.
Each github.org.[OrgName=string]: {} struct represents a managed GitHub org
named OrgName.
Inside this struct the org's config lives in the config field, whose shape is
also described in schema.cue, in the #terraform_input
definition.
Each github.actions.workflow.[Filename=string]: {} struct that exists results
in a separate GHA workflow file being written into
.github/workflows/<Filename>.yml
Each target.terraform.github.org.[OrgName=string]: {} struct is a dynamic
copy of the same path without the target.terraform prefix. A copy exists
for each managed org.
This struct contains the content that ultimately gets serialised into
_operations/github/<OrgName>/config.tf.json, but with one difference:
resource names are translated in order to meet a specific set of Terraform
requirements, which are different from CUE's struct name constraints. The
translation rules live inside target.terraform.#Identifier.adapt, defined in
config/target.terraform.cue along with their documentation.
Quickstart Guides
After following any quickstart section, also follow the content change section to submit and apply your changes.
Creating A New Repo
Prerequisites
- You want to create a repo with name
<REPO-NAME>in org<ORG-NAME> - You know if it will be a public or private repo
Configuration
Configure the new repo in config/org.<ORG-NAME>.cue, with the following path
and minimum concrete configuration:
github: org: <ORG-NAME>: config: {
resource: {
github_repository: {
<REPO-NAME>: {
visibility: "public" | "private"
}
}
}
}
Other options are available, as per the schema in
internal/schemata/providers/github/repository.cue. Global defaults,
if any, are visible in config/defaults.cue.
Now read the content change section.
Deleting A Repo
Prerequisites
- A public or private repo exists that you wish to delete entirely
- The repo is managed by this system
- The repo is named
<REPO-NAME>, and it exists in the<ORG-NAME>org
Configuration
In the config package, in the file dedicated to <ORG-NAME>'s resources,
find the struct
github: org: <ORG-NAME>: config: resource
Inside that struct, find and delete the github_repository: <REPO-NAME>
struct.
Alow find and delete any references to the repo in other structs such as:
_non_org_member_access: collaborator_non_org_member_access: botgithub_repository_collaborators
You now need to PR and merge these change, but the CI job after the merge onto the main branch will fail. This is expected, and is a safety precaution.
After reading and following the content change section, return to this guide and continue from this point.
Your PR passed all its tests, you merged it, and the terraform apply job on
the main branch failed. This is because this system has not been granted
permission to delete any repos, so it can't accidentally destroy content
irreversibly.
Find someone who has control of an org-owner account in the <ORG-NAME> org.
Ask them to navigate to the repo's settings page and delete the repo manually.
After they have done this, re-trigger the CI job on the main branch and
observe your commit's status going green.
Onboarding A New Employee
Prerequisites
- You want to grant a new company employee read and write access to all public and private repos across all orgs
- You know their GitHub login
Configuration
Add the new employee to config/employees.cue as an #Employee struct inside
company.employees.
Now read the content change section.
Granting An Outside Collaborator Access To Specific Repos
Prerequisites
- You want to grant a 3rd party read, write, or triage access to a specific public or private repo
- You know their GitHub login is
<USER-NAME> - You know the repo name is
<REPO-NAME>and it already exists inside org<ORG-NAME>
Configuration
Configure the access in config/org.<ORG-NAME>.cue with the following path and
minimum concrete configuration:
github: org: <ORG-NAME>: config: {
resource: {
_non_org_member_access: {
collaborator: {
<USER-NAME>: {
<REPO-NAME>: "pull" | "push" | "triage"
}
}
}
}
}
Now read the content change section.
Adding An Org To The System
Prerequisites
- You want to give the system the ability to create GitHub entities inside a
GitHub org called
<ORG-NAME> - The GitHub org already exists
- or you can create it via their web UI
- You want the org to adopt our
org-level default settings
- or you know which of the org's settings deviate from our defaults, and what their values are
- The machine user account
cue-terraform-github-config-experiment-controllerhas been manually invited into the org as an owner- ... and the machine user has logged in and accepted the invitation via the GitHub web UI
- You have created a dedicated "CLI-driven workflow" workspace in Terraform
Cloud for the org
via their web UI
- The workflow is the "CLI-driven" flavour
- The workspace is named
org_<ORG-NAME> - You have tagged it, after creation, via the workspace's main page, with
these tags:
service:githuborg:<ORG-NAME>
- You have switched the workspace's "Execution Mode" to "Local" via the workspace's Settings page
Configuration
Create the file config/org.<ORG-NAME>.cue with the following minimum content:
package config
github: org: <ORG-NAME>: {}
If there are any
org-level settings
that deviate from our defaults in config/defaults.cue, place them at this
path:
github: org: <ORG-NAME>: config: {
resource: {
github_organization_settings: self: {
...
}
}
}
The controlling schema is at
internal/schemata/providers/github/organization_settings.cue
Adopting this config will change the existing org-level settings to our
defaults (or your overrides) without showing you the existing values in a
PR's terraform plan output.
Now read the content change section.
UNTESTED Quickstart Guides
These guides have never been tested, and are purely indicative of the process that might be followed. If you complete any of the following, please validate its process carefully and consider promoting it to the main Quickstart section if it's robust enough.
Renaming A Repo
NB THIS PROCESS HAS NOT BEEN TESTED. So long as it sits in this section of the documentation ("UNTESTED Quickstart Guides") please exercise extreme caution when following this process, and add any updates/changes/better-words that you feel would help the next reader.
Prerequisites
- a repo named
<REPO-1>exists in the org<ORG-NAME> - a repo named
<REPO-2>does not exist in that same org - you want to rename
<REPO-1>to<REPO-2>, inside the same org - you accept that the outside collaborators with access to
<REPO-1>will get re-invited to collaborate on the repo after the renaming
Configuration
In the config package, in the file dedicated to <ORG-NAME>'s resources,
find the struct
github: org: <ORG-NAME>: config: resource
Inside that struct, find the github_repository: <REPO-1> struct.
Change the path element <REPO-1> to <REPO-2>.
Also find and change any references to the repo's name in other structs such as:
_non_org_member_access: collaborator_non_org_member_access: bot
The github_repository_collaborators struct cannot be updated in place, so a
destroy/create cycle will be performed on that resource. This will re-invite
any collaborator or bot accounts to collaborate on <REPO-2>, and will require
them to accept the invitation before access is re-granted.
Add a new struct:
github: org: <ORG-NAME>: config: moved
moved is an ordered list of { from: string, to: string } tuples that
enables Terraform to track identifier renames over time. You may well be
establishing the first such element of the struct, and the struct
itself, as we haven't used this Terraform feature previously.
from and to must contain the Terraform-visible identifiers of the old
and renamed repo, respectively. This means that they must reflect the name changes
that are performed by target.terraform.#Identifier.adapt{} - which, among
other things, changes periods into underscores.
Both from and to must be strings, not CUE-resolved references. Neither
should refer to a resource's CUE #tfref convenience field (that normally
contains the Terraform-visible identifier for a resource, as a string) because
at least one of the before- and after-the-renaming CUE structs will be missing
from your config.
Now read the content change section, paying very close attention to the mid-PR Terraform plan output, telling you what will be created, updated in place, update with a destroy/recreate, and destroyed entirely. Be very, very sure that you understand what Terraform is proposing to do.
Moving A Repo Across Orgs
NB THIS PROCESS HAS NOT BEEN TESTED and is only sketched out as an indicator for an experienced user of this system to use as a starting point. So long as it sits in this section of the documentation ("UNTESTED Quickstart Guides") please exercise extreme caution when adapting this process, and add any updates/changes/better-words that you feel would help the next reader.
This process is a lightweight concept for how transferring a repo across orgs
might work. It will require admin-level involvement halfway through, as it
leans on the UI-based "transfer a repo" feature that's only available to repo
admins. This is because there doesn't appear to be support for cross-org
transfers in
the github Terraform provider.
- Move the config defining the repo between the 2 orgs'
config/org.*.cuefiles andresourcestructs - Move (or delete) the config of any resources that are dependent on the repo
(such as access granted via
_non_org_member_accessbetween the 2 orgs'resourcestructs - Raise a PR containing these changes in line with the content change section, but do not merge the PR
- Observe that the
terraform planoutput posted to the PR as comments shows that the repo will be deleted from one org and created in the other. Again, do not merge the PR - Ensure that all your colleagues are aware that from this point onwards,
until further notice, you must have an exclusive lock on the repo, and
they must not open, sync, or merge any PRs, or otherwise cause a
terraforminvocation to ocurr - Use a GitHub account (probably an org-owner, or one that has, at minimum, repo-admin permissions on the repo in question and the ability to create repos in the receiving org) to perform the transfer via the repo's settings UI
- Re-run the PR's checks. Observe that the donating org's plan no longer tries to delete the repo (though it will be proposing deletion of various linked resources). Observe that the receiving org's plan still shows it is going to try and create the repo. As above, do not merge the PR
- Follow the relevant parts of the separate doc on managing existing github
resources, making adjustments where
neccessary when your situation differs from that assumed by the document
- in particular, you must perform all local
terraforminvocations whilst your PR'd branch is checked out, and your litmus test for success is not "the PR shows a no-op change". You won't raise a second PR, and you won't mark your PR as a no-op withTERRAFORM-PLAN-NO-OP-REQUIRED(because the donating org will have some resources deleted) - all the warnings in that doc about the risks involved still hold
- all the warnings in that doc about the "critical region" still hold
- in particular, you must perform all local
- When you have performed the import, re-run your PR's checks. Observe that:
- the donating org will still have some deletions. Understand and agree with each
- the receving org will have some creations
- but critically not the
github_repositoryresource - understand and agree with each
- but critically not the
- If the above point is not 100% the case, reach out for support immediately.
- Merge the PR, and verify that the changes performed after the merge match those proposed in the PR comments
- Advise your colleagues that the critical region has finished.