Hello, world
The Fargate/Terraform tutorial I wish I had
In my last way-too-long, way-too-technical, seriously-nobody-cares technical post, I wrote about serverless functions. The main benefit of serverless functions, I wrote, is that you can deploy code to production without having to worry about keeping a server online, secure, and up-to-date. But a secondary benefit of serverless functions is also their main trade-off: they’re just functions. Computer scientists might call them pure functions, because the outputs of serverless functions are usually entirely dependent on their inputs and nothing else. You could also call them stateless, because they don’t retain any artifacts or side effects from any one invocation. (The runtimes from AWS and Google fudge this somewhat, but let’s pretend.) This trade-off makes the code simpler to understand and to debug.
In many cases, as it was for Louvre, the trade of state for simplicity is well worth it. But other times, it’s worth it to have a more stateful system. An API might want to store database connections for reusability, or maintain in-memory caches for speed, or simply maintain a counter for the purpose of rate-limiting. And that’s where AWS Fargate comes in.
Fargate is sort of the best of both worlds. Like its predecessor, it’s a way of launching containers on AWS while maintaining visibility on the container after it launches. But unlike its predecessor, the EC2 launch type, Fargate doesn’t require you to pre-allocate and maintain an instance on which to run your container. With Fargate, you simply get to define your container and launch it.
Or at least that’s the promise. AWS, however, is complicated, and launching a Fargate service using the console is no mean feat. You have to use at least five different AWS services, in a specific order, and that’s not including any databases or other integrations you might want to use. The console never really tells you where to start. It never tells you where to go next. Sometimes the information the console gives you is just plain wrong. If you mess up, you might be able to fix it. But if not, you might have to start from scratch.
That’s why infrastructure configuration languages like Terraform are so appealing. You simply define your infrastructure once, in code. Then you run a program which uses that configuration to build your infrastructure. If you mess up, or want to try something new, you can simply blow it all up and rest assured that it’s just as easy to recreate it. Best of all, all infrastructure changes can now be peer-reviewed and committed to version control, a requirement in highly-regulated environments and a plus everywhere else.
But the promise of Terraform is a little too good to be true, and that’s because Terraform has to play by the rules of your cloud provider. Terraform will build whatever infrastructure you tell it to, but you still have to know what you want. With AWS, and newer services like Fargate in particular, this isn’t always clear. So while you’ll see a lot of Terraform in this tutorial, this is really a tutorial on how to set up a Fargate service.
Here’s the goal: we’re going to try to spin up a Fargate service, using Terraform and as minimal a configuration as we can get away with. I’ll show the code step by step below, but at the end of this article I’ll provide a link to a Github repository with all of the Terraform necessary to start a Fargate service, with a few minimal changes.
(If you’ve read this far and find yourself wanting to run from the room screaming, thanks for sticking with me for this long! I’ll write about something more interesting next time, I promise.)
The setup
Let’s start with the app. The core building blocks of Fargate services are Docker containers, and the whole point of Docker, or containerization in general, is that the host operating system no longer has to care about what sort of app is in the container (and vice versa!). So as a demo, I wrote a quick Go app (natch), but it could easily be a Node app, a Rails app or even just a webserver serving static files (don’t actually do this last one – there are better ways to solve that particular problem).
Here’s our application:
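A minimal sketch of such an app might look like this; the endpoint paths, the port (8080), and the use of the free sunrise-sunset.org API are assumptions of this sketch rather than guarantees about the exact code:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

// healthCheck gives the load balancer something cheap to poll.
func healthCheck(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "ok")
}

// sun proxies a request to the sunrise-sunset.org API and streams
// the JSON response back to the caller.
func sun(w http.ResponseWriter, r *http.Request) {
	lat := r.URL.Query().Get("lat")
	lng := r.URL.Query().Get("lng")
	date := r.URL.Query().Get("date")

	url := fmt.Sprintf("https://api.sunrise-sunset.org/json?lat=%s&lng=%s&date=%s", lat, lng, date)
	resp, err := http.Get(url)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()

	w.Header().Set("Content-Type", "application/json")
	io.Copy(w, resp.Body)
}

func main() {
	http.HandleFunc("/", healthCheck)
	http.HandleFunc("/sun", sun)

	log.Println("listening on :8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```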
Hopefully, it’s pretty straightforward. We’re creating an HTTP server and exposing two endpoints on it. One of them is just a health check; the other queries an API for the sunrise and sunset times for a particular location on a particular day. The whole app is less than 100 lines of code, but it’s doing two things that we’d want a typical API to do: listen for requests and act as a gateway to make requests to an upstream service.
Next, in order to deploy it on Fargate, we need to define a Docker container for – or Dockerize – our app. Here’s the Dockerfile which makes that happen:
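A sketch of what such a two-stage Dockerfile might look like; the Go and Alpine image versions are assumptions:

```dockerfile
FROM golang:1.17-alpine AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /sun-api .

# Copy the compiled binary into a much smaller, locked-down runtime image.
FROM alpine:3.14
COPY --from=build /sun-api /sun-api
USER nobody
EXPOSE 8080
ENTRYPOINT ["/sun-api"]
```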
We’re using two stages in this Dockerfile. The first stage, which starts on line 1, builds the application. The second stage, starting on line 7, copies the built application into a slimmer and less permissive environment.
To make things a little easier, I took the liberty of building the Docker image and pushing it to a public Docker repository on Github. This saves us the couple of steps required to create a private repository and push an image to it. The service we’re about to create can just use the image from the public repository.
I promise this isn’t some half-witted attempt to get you to install malicious code in a Docker container in your AWS account. But if you’d prefer to be cautious, you can create the image yourself and upload it to any Docker repository you control. (For simplicity, make sure it’s public for now.) Here’s an example of how to do it on Docker Hub:
- Create a public repository called sun-api on Docker Hub.
- Make sure you’re logged into your Docker account on the CLI by running docker login.
- Grab the two files above and put them in a directory together (not your $GOPATH). Run go mod init sun-api (any module name will do); this should create a go.mod file (and a go.sum file if the app pulls in any dependencies).
- Run docker build -t <your_docker_username>/sun-api:latest . (the trailing dot is part of the command).
- Run docker push <your_docker_username>/sun-api:latest
Keep an eye out for when we use this image name later on in the tutorial and replace my image’s URL in the image path with <your_docker_username>/sun-api:latest.
(By the way, creating a private ECR repo to push Docker images to and making your service pull from that repo isn’t hard. The part that’s a bit of a pain is actually pushing your image from your machine to the ECR repo. So I’m opting to skip it. But I’ll give you the Terraform for creating the ECR repo as well, if you want to do that in the future.)
Now that our app is ready to deploy, let’s start writing some Terraform. Add these lines to a file named config.tf in your directory:
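A minimal sketch of that config; the region and the 3.x provider version are assumptions, and the bucket name matches the example discussed below:

```hcl
terraform {
  required_version = "~> 1.0.5"

  # Store Terraform's state file in S3 rather than on your laptop.
  backend "s3" {
    bucket  = "terraform"
    key     = "terraform.tfstate"
    region  = "us-east-1"
    profile = "tfuser"
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}

provider "aws" {
  region  = "us-east-1"
  profile = "tfuser"
}
```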
Both of the blocks in this file contain a line that says profile = "tfuser". This tells Terraform how to authenticate with your AWS account. You’ll need to set this up manually: under IAM in the AWS Console, select Users in the left-hand nav, then find the Add User button. The username should be tfuser, and make sure the checkbox labeled “programmatic access” is checked. On the next screen, make sure to add the AdministratorAccess policy.
After creating your user, you should see a screen with an Access Key ID and Secret Access Key. Copy those values into a file named ~/.aws/credentials, with the following format:
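Something like this, with your own keys substituted in (the section name has to match the tfuser profile we told Terraform about):

```ini
[tfuser]
aws_access_key_id = <your_access_key_id>
aws_secret_access_key = <your_secret_access_key>
```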
Terraform reads every file ending in .tf in the same directory as part of the same workspace, so we can split up our code into meaningful files. We combined most of our config into one file, but if things ever get more complicated, we can split out this config into a provider.tf, backend.tf and versions.tf, for example.
Our backend block under terraform is telling Terraform we’re going to put the state file in an S3 bucket called terraform, in a file named terraform.tfstate. You’ll probably need to change the bucket name to something more unique; since S3 bucket names are globally unique, there’s a really good chance someone is already using the name terraform. Create a bucket with the name you picked in the S3 console (the default, totally private settings are what you want), then set that as your bucket name in your backend block.
Next, from the command line and in the same directory as your config.tf file, run terraform init. You should see a success message that looks something like this:
|
|
If that’s what you see, great! If not, make sure your tfuser user has the appropriate AWS permissions and verify that Terraform is installed correctly on your machine (if you run terraform version, you should see something along the lines of Terraform v1.0.5).
You may have also noticed that Terraform created a file called .terraform.lock.hcl in your directory. If you’re using version control, this file is like a package-lock.json or go.sum and is safe to commit.
The service
Our end goal is to create a Fargate ECS service. So let’s start by creating that and see where we get. From the Terraform documentation, it seems like we want to create an aws_ecs_service. Let’s add an aws_ecs_service resource block, with the required fields filled out as well as we can. (Paste this into a new file called ecs.tf.)
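A first pass might be as simple as this (the resource label and service name are placeholders of my choosing):

```hcl
resource "aws_ecs_service" "sun_api" {
  name            = "sun-api"
  task_definition = ""
}
```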
It turns out there are only two absolutely required fields, so our first iteration is pretty simple. We’re supplying the name of the service, which is arbitrary. We don’t know what the task_definition is yet, so we’ll just use an empty string for now. This is obviously not going to be our final solution, but let’s run a plan and see where we are.
|
|
This is just a plan, meaning we haven’t actually made any changes to our AWS environment yet. But it’s always a good idea to inspect the plan output to make sure Terraform is doing what we expect it to do. In this case, the only thing that seems off is the launch_type. Terraform is saying it will be “known after apply,” which means it’ll use whatever AWS defaults to. We want to ensure it’s FARGATE, so let’s add that line:
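With that line added, the sketch from above becomes:

```hcl
resource "aws_ecs_service" "sun_api" {
  name            = "sun-api"
  task_definition = ""
  launch_type     = "FARGATE"
}
```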
And here’s the resulting output:
|
|
This seems too easy, but let’s run an apply anyway, just to see what happens:
|
|
As we suspected, that config wasn’t all we needed. (We haven’t even specified the image yet!) But it’s a good example of the difference between running terraform plan and terraform apply. terraform plan validates your config to make sure the syntax is valid, that any variables being referenced are defined, and that the required fields are populated. Even though a plan might be valid, however, Terraform doesn’t have much of an idea what AWS will say when it tries to execute the plan.
In this case, the aws_ecs_service documentation specifies that task_definition should be: “The family and revision (family:revision) or full ARN of the task definition that you want to run in your service.” It’s a good reminder that while Terraform helps us define our infrastructure, it doesn’t guarantee that the infrastructure we define will even run, much less meet best practices.
The good news is this: we know what to fix! Now that we’ve gone through one iteration of the code/plan/apply troubleshooting cycle, I’ll move a little faster. Let’s add these blocks to the ecs.tf file:
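Here’s a sketch of what that might look like. The image path is a placeholder for wherever you pushed your image, the port matches the app sketch above, and the CPU and memory values are the smallest sizes Fargate supports:

```hcl
resource "aws_ecs_task_definition" "sun_api" {
  family = "sun-api"

  # Container definitions are plain JSON under the hood; jsonencode lets us
  # write them as HCL and still reference other resources later.
  container_definitions = jsonencode([
    {
      name      = "sun-api"
      image     = "<your_docker_username>/sun-api:latest"
      essential = true
      portMappings = [
        {
          containerPort = 8080
          hostPort      = 8080
        }
      ]
    }
  ])

  cpu                      = 256
  memory                   = 512
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
}
```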
Next, update the task_definition field in our aws_ecs_service block:
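Assuming the task definition sketched above, we can reference its ARN directly:

```hcl
  task_definition = aws_ecs_task_definition.sun_api.arn
```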
This is our first example of using a variable to populate another field, and it’s one of Terraform’s most powerful and appealing features. Instead of having to hardcode that ARN into our config, we can simply say: “that task definition I just created, whatever its ARN is, use it here.” If we ever destroy and recreate that task definition, and it gets a new ARN, this config will still work perfectly.
Running terraform apply should give us our first partial success:
|
|
A couple things actually got created! Now that we’re in this for real, if you need to tear down everything, you can run terraform destroy. Like apply, it’ll give you a plan output that specifies what it intends to destroy, so make sure you inspect that closely. But Terraform will only touch resources it knows about, so it should only affect resources you’ve created here.
We’re still stuck on that task definition and it’s about to get weird, because it’s time to add some permissions. We need to create a role for the task to use while it’s running, but we also have to explicitly allow our ECS task to assume that role. AWS provides a policy we can use for execution, but we’ll have to attach it to a role we create. Add these lines to ecs.tf:
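A sketch of those four blocks, using the AWS-managed AmazonECSTaskExecutionRolePolicy (the role name itself is arbitrary):

```hcl
# Block 1: the role our ECS task will use for execution.
resource "aws_iam_role" "sun_api_task_execution_role" {
  name               = "sun-api-task-execution-role"
  assume_role_policy = data.aws_iam_policy_document.ecs_task_assume_role.json
}

# Block 2: a trust policy allowing ECS tasks to assume that role.
data "aws_iam_policy_document" "ecs_task_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

# Block 3: the AWS-managed policy for ECS task execution.
data "aws_iam_policy" "ecs_task_execution_role" {
  arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Block 4: attach that managed policy to our role.
resource "aws_iam_role_policy_attachment" "ecs_task_execution_role" {
  role       = aws_iam_role.sun_api_task_execution_role.name
  policy_arn = data.aws_iam_policy.ecs_task_execution_role.arn
}
```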
Here, we’re creating a role that AWS will use to run our app. First, we attach a policy that allows the role to be assumed by ECS tasks (blocks 1 and 2). Then we grab the AWS-defined default policy for ECS task execution and attach it (blocks 3 and 4).
Now we can add this line to our aws_ecs_task_definition resource:
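Assuming the role from the sketch above:

```hcl
  execution_role_arn = aws_iam_role.sun_api_task_execution_role.arn
```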
If we run terraform apply now, it seems to try for a long time to create the service before finally failing.
|
|
That output suggests that we need a cluster in which to put our service, so let’s create it:
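Something like this should do it (the cluster name is arbitrary):

```hcl
resource "aws_ecs_cluster" "app" {
  name = "app"
}
```

And then we point the service at the new cluster, inside the aws_ecs_service block:

```hcl
  cluster = aws_ecs_cluster.app.id
```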
After adding those lines, our next terraform plan run should tell us that we’re closing in. We just need to create the cluster and the service. But when we try to apply that, we get the following:
|
|
The good news is we’re still making progress. The bad news is we’re about to talk about networking.
Networking
We set our task definition’s network_mode to be awsvpc because that’s what AWS requires for Fargate tasks. Unfortunately, that comes with some other hidden dependencies. Namely, Fargate tasks need to be in a VPC.
Creating the VPC by itself is fairly simple, but it also requires you to define subnets, route tables, NAT gateways and more. So I’ll save you the pain I went through trying to get this stuff working properly, and just give you the config. Open a new file called network.tf and copy these lines into it.
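Here’s a condensed sketch of what network.tf might contain. The CIDR ranges, availability zones and port numbers are assumptions, and this sketch cuts a corner by using a single NAT gateway; a more resilient setup would put one in each availability zone:

```hcl
resource "aws_vpc" "app_vpc" {
  cidr_block = "10.0.0.0/16"
}

# Two public and two private subnets, spread across two availability zones.
resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.app_vpc.id
  cidr_block              = cidrsubnet(aws_vpc.app_vpc.cidr_block, 8, count.index)
  availability_zone       = element(["us-east-1a", "us-east-1b"], count.index)
  map_public_ip_on_launch = true
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.app_vpc.id
  cidr_block        = cidrsubnet(aws_vpc.app_vpc.cidr_block, 8, count.index + 2)
  availability_zone = element(["us-east-1a", "us-east-1b"], count.index)
}

# Public subnets route to the internet gateway; private subnets route out
# through a NAT gateway, so they can reach the internet but not vice versa.
resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.app_vpc.id
}

resource "aws_eip" "nat" {
  vpc = true
}

resource "aws_nat_gateway" "nat" {
  subnet_id     = aws_subnet.public[0].id
  allocation_id = aws_eip.nat.id
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.app_vpc.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.app_vpc.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat.id
  }
}

resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private.id
}

# The internet can reach the ALB; the ALB can reach the service; the service can reach the internet.
resource "aws_security_group" "lb" {
  vpc_id = aws_vpc.app_vpc.id
  ingress {
    from_port   = 80
    to_port     = 443 # covers both the HTTP and HTTPS listeners
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "service" {
  vpc_id = aws_vpc.app_vpc.id
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.lb.id]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

output "vpc_id" { value = aws_vpc.app_vpc.id }
output "public_subnet_ids" { value = aws_subnet.public[*].id }
output "private_subnet_ids" { value = aws_subnet.private[*].id }
```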
From a high level, here’s what’s going on. First, we create a VPC. The VPC lets us ensure our services are isolated from the rest of AWS and the world, which is definitely a good thing. But VPCs don’t come with any built-in configuration, so we have to do that ourselves. To our VPC, we add two sets of public and private subnets in two availability zones. This is a best practice that happens to be an AWS requirement: even if one of the availability zones goes down, we should still be okay. Next, we define a route table for the public and private subnets and associate them accordingly: our public subnets will be exposed to the Internet via the Internet gateway directly, but we’ll put our private subnets behind a NAT gateway so that they can talk to the Internet but the Internet can’t get in. Finally, we’ll create some security groups so the Internet can reach our ALB, our ALB can reach our service, and our service can reach the Internet.
Note that if we were using the console to do these operations, we’d get a couple security groups by default. Terraform removes these, however, so we have to recreate them explicitly. Also, check out those output blocks, which will tell us the VPC and subnet IDs on the command line when they’re created.
After adding that file, we can run terraform apply to create our VPC and various networking pieces. Everything should create successfully, but we’ll still see this error:
|
|
Back in ecs.tf, add this block to your aws_ecs_service block:
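Assuming the subnets and security groups from the networking sketch:

```hcl
  network_configuration {
    assign_public_ip = false
    security_groups  = [aws_security_group.service.id]
    subnets          = aws_subnet.private[*].id
  }
```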
With any luck, the service should now create successfully when we run terraform apply! We’re not done yet, but this calls for a celebration.
Load balancer
We’re getting really close now. Our service is created and our task is configured; all we need now is a way to let incoming traffic in. We need a load balancer (or ALB, for Application Load Balancer). Let’s add these lines to our ecs.tf file:
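A sketch of the load balancer, its target group, an HTTP listener and that output, reusing names from the earlier sketches (the health check path assumes the app answers on /):

```hcl
resource "aws_lb" "sun_api" {
  name               = "sun-api-lb"
  load_balancer_type = "application"
  subnets            = aws_subnet.public[*].id
  security_groups    = [aws_security_group.lb.id]
}

resource "aws_lb_target_group" "sun_api" {
  name        = "sun-api"
  port        = 8080
  protocol    = "HTTP"
  target_type = "ip" # required for Fargate tasks
  vpc_id      = aws_vpc.app_vpc.id

  health_check {
    path = "/"
  }
}

resource "aws_lb_listener" "sun_api_http" {
  load_balancer_arn = aws_lb.sun_api.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.sun_api.arn
  }
}

output "load_balancer_url" {
  value = aws_lb.sun_api.dns_name
}
```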
This last output block is important because it will tell us what URL we’ll use to reach the service without us having to go into the AWS console to figure it out.
Next, add this block to your aws_ecs_service block:
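Assuming the target group from the sketch above, plus the container name and port from the task definition:

```hcl
  load_balancer {
    target_group_arn = aws_lb_target_group.sun_api.arn
    container_name   = "sun-api"
    container_port   = 8080
  }
```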
One more change. By default, the ECS service we created won’t start any containers. We need to tell it how many containers we want.
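One task is plenty for a demo; add this to the aws_ecs_service block:

```hcl
  desired_count = 1
```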
Finally, run terraform apply one more time. (The ALB may take a bit to spin up.)
|
|
Finally, finally, finally, copy and paste that URL into your browser. If all has gone well, you should see the service respond!
If you’re tired of reading, feel free to skip to the end; the hard part is over. But if you’re in the mood to tackle just a couple more changes, we can really put a bow on this API.
Cleanup
You may have noticed that the load balancer is listening on HTTP, not HTTPS. In most cases, we’ll want APIs to be served over HTTPS, so let’s try and correct that using a certificate issued by AWS. You’ll need a domain (or a subdomain) with DNS that you control.
Add these lines to your ecs.tf file, substituting in your domain name in the first block (fully qualified, but without the https:// prefix):
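A sketch, with a placeholder domain and the HTTPS listener left commented out until the certificate has been validated:

```hcl
resource "aws_acm_certificate" "sun_api" {
  domain_name       = "sun-api.example.com" # substitute your own domain
  validation_method = "DNS"
}

# Uncomment once the certificate has been validated with your DNS provider.
# resource "aws_lb_listener" "sun_api_https" {
#   load_balancer_arn = aws_lb.sun_api.arn
#   port              = 443
#   protocol          = "HTTPS"
#   certificate_arn   = aws_acm_certificate.sun_api.arn
#
#   default_action {
#     type             = "forward"
#     target_group_arn = aws_lb_target_group.sun_api.arn
#   }
# }
```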
Next, find your sun_api_http listener and change the default action to this:
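A redirect action along these lines sends HTTP traffic over to HTTPS:

```hcl
  default_action {
    type = "redirect"

    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
```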
Running terraform apply here should create your certificate and update your existing HTTP listener in place so that it redirects to HTTPS. But before you can turn on the HTTPS listener, you’ll need to validate the domain you chose with your DNS provider. The output of the apply should give you all the information you need:
|
|
That block tells me I should create a CNAME record that looks like this:
|
|
While you’re there, go ahead and create a second CNAME record that points your domain at your load balancer URL. For me, that’d be:
|
|
Your DNS provider should have instructions on how to create CNAME records, like this page from Cloudflare.
Once the validation CNAME record is created, you can uncomment the HTTPS listener block and run terraform apply once more. If this seems to try for a while before timing out, the DNS for the validation record may not have propagated yet, which means AWS hasn’t been able to validate your domain. Give it a few minutes and then try again. (You can also monitor the status of your certificate in ACM in the console.)
Whenever the listener gets created successfully, you should be able to hit the API using https://<your-domain> rather than the load balancer URL.
One last thing. Say we’re ready to start writing our own proprietary code and we want to switch our service to pull from a private ECR repository. This is actually pretty straightforward. Go ahead and add these lines to ecs.tf:
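The repository itself is about as minimal as Terraform resources get:

```hcl
resource "aws_ecr_repository" "sun_api" {
  name = "sun-api"
}
```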
Next, change the image field in your task definition JSON to reference the ECR repo:
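Assuming the container_definitions are built with jsonencode as in the earlier sketch, the image line can reference the repository URL directly:

```hcl
      image = "${aws_ecr_repository.sun_api.repository_url}:latest"
```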
You can run terraform apply to make these changes, but remember that your service won’t work properly until you actually push an image to your new ECR repo. Follow these instructions provided by AWS to authenticate your Docker CLI with your new ECR repo.
Conclusion
That wasn’t so bad, was it?
…
Okay, maybe it was a little rough. But we accomplished quite a bit. Not only did we spin up a Fargate service on HTTPS from scratch, but we did it using Terraform. That means rather than wasting time haphazardly clicking random buttons in the AWS console, we have an exact blueprint for how we spun this service up. And even better, we can instantly clone this Fargate service and create a second one that functions in the same – or similar – way. We might even decide that this is the way we want to create all Fargate services in the future and turn this into a module. That way, all we’ll have to do to spin up a new service is invoke the module with the parameters we define, abstracting away all of the boilerplate AWS stuff we now know we need. But that can wait until next time.
Thanks very much for reading! As promised, here’s a link to the repository with everything we’ve done today. If you are or are aspiring to be a technical person, I hope this was useful. Please let me know how I can improve this post in the comments or by emailing me at feedback@section411.com.
And if you’re not technical and you made it this far, I really appreciate you reading. I’ll be back with some baseball, a movie review or a personal story next time.
Thanks to Sara Sawczuk for reading a draft of this post. When she hits it big as an editor for real writers, I hope she gives me a family discount. Thanks also to Jordan Castillo Chavez for reviewing the more technical parts of this post.
This post and its accompanying repository were updated in September 2021 to use Terraform 1.0.5 and clean up some weird resource names, and then again in December 2021 to improve the networking setup.