Hello, world
The Fargate/Terraform tutorial I wish I had
In my last way-too-long, way-too-technical, seriously-nobody-cares technical post, I wrote about serverless functions. The main benefit of serverless functions, I wrote, is that you can deploy code to production without having to worry about keeping a server online, secure, and up-to-date. But a secondary benefit of serverless functions is also their main trade-off: they’re just functions. Computer scientists might call them pure functions, because the outputs of serverless functions are usually entirely dependent on their inputs and nothing else. You could also call them stateless, because they don’t retain any artifacts or side effects from any one invocation. (The runtimes from AWS and Google fudge this somewhat, but let’s pretend.) This trade-off makes the code simpler to understand and to debug.
In many cases, as it was for Louvre, trading state for simplicity is well worth it. But other times, it’s worth it to have a more stateful system. An API might want to store database connections for reusability, or maintain in-memory caches for speed, or simply maintain a counter for the purpose of rate-limiting. And that’s where AWS Fargate comes in.
Fargate is sort of the best of both worlds. Like its predecessor, the EC2 launch type, it’s a way of launching containers on AWS while maintaining visibility into the container after it launches. But unlike the EC2 launch type, Fargate doesn’t require you to pre-allocate and maintain an instance on which to run your container. With Fargate, you simply define your container and launch it.
Or at least that’s the promise. AWS, however, is complicated, and launching a Fargate service using the console is no mean feat. You have to use at least five different AWS services, in a specific order, and that’s not including any databases or other integrations you might want to use. The console never really tells you where to start. It never tells you where to go next. Sometimes the information the console gives to you is just plain wrong. If you mess up, you might be able to fix it. But if not, you might have to start from scratch.
That’s why infrastructure configuration languages like Terraform are so appealing. You simply define your infrastructure once, in code. Then you run a program which uses that configuration to build your infrastructure. If you mess up, or want to try something new, you can simply blow it all up and rest assured that it’s just as easy to recreate it. Best of all, all infrastructure changes can now be peer-reviewed and committed to version control, a requirement in highly-regulated environments and a plus everywhere else.
But the promise of Terraform is a little too good to be true, and that’s because Terraform has to play by the rules of your cloud provider. Terraform will build whatever infrastructure you tell it to, but you still have to know what you want. With AWS, and newer services like Fargate in particular, this isn’t always clear. So while you’ll see a lot of Terraform in this tutorial, this is really a tutorial on how to set up a Fargate service.
Here’s the goal: we’re going to try to spin up a Fargate service, using Terraform and as minimal a configuration as we can get away with. I’ll show the code step by step below, but at the end of this article I’ll provide a link to a GitHub repository with all of the Terraform necessary to start a Fargate service, with a few minimal changes.
(If you’ve read this far and find yourself wanting to run from the room screaming, thanks for sticking with me for this long! I’ll write about something more interesting next time, I promise.)
The setup
Let’s start with the app. The core building blocks of Fargate services are Docker containers and the whole point of Docker, or containerization in general, is that the host operating system no longer has to care about what sort of app is in the container (and vice versa!). So as a demo, I wrote a quick Go app (natch), but it could easily be a Node app, a Rails app or even just a webserver serving static files (don’t actually do this last one – there are better ways to solve that particular problem).
Here’s our application:
Hopefully, it’s pretty straightforward. We’re creating an HTTP server and exposing two endpoints on it. One of them is just a health check; the other queries an API for the sunrise and sunset times for a particular location on a particular day. The whole app is less than 100 lines of code, but it’s doing two things that we’d want a typical API to do: listen for requests and act as a gateway to make requests to an upstream service.
Next, in order to deploy it on Fargate, we need to define a Docker container for our app – that is, Dockerize it. Here’s the Dockerfile which makes that happen:
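The Dockerfile itself isn’t reproduced here; a plausible two-stage version – the base images are my best-guess assumptions – looks like this:

```dockerfile
FROM golang:1.14 AS build
WORKDIR /src
COPY . .
RUN go mod download
RUN CGO_ENABLED=0 go build -o /sun-api .

FROM gcr.io/distroless/static
COPY --from=build /sun-api /sun-api
EXPOSE 8080
CMD ["/sun-api"]
```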
We’re using two stages in this Dockerfile. The first stage, which starts on line 1, builds the application. The second stage, starting on line 7, copies the built application into a slimmer and less permissive environment.
To make things a little easier, I took the liberty of building the Docker image and pushing it to Docker Hub. This saves us the couple of steps required to create an ECR repository on AWS and push the image there; the service we’re about to create can just use the image from Docker Hub.
I promise this isn’t some half-witted attempt to get you to install malicious code in a Docker container in your AWS account, but if you’re wary of grabbing a random Docker image to put in your AWS environment, you can create the image yourself and upload it to a Docker Hub repository you control:
- Create a repository called `sun-api` on Docker Hub (for now, to keep things simple, make sure it’s public).
- Make sure you’re logged into your Docker account on the CLI by running `docker login`.
- Grab the two files above and put them in a directory together (not your `$GOPATH`). Run `go mod init sun-api` and then `go mod tidy`; this should create a `go.mod` file and a `go.sum` file.
- Run `docker build -t <your_docker_username>/sun-api:latest .`.
- Run `docker push <your_docker_username>/sun-api:latest`.
Keep an eye out for when we use this image name later on in the tutorial and replace my username in the image path with yours.
(By the way, creating a private ECR repo to push Docker images to and making your service pull from that repo isn’t hard. The part that’s a bit of a pain is actually pushing your image from your machine to the ECR repo. So I’m opting to skip it. But I’ll give you the Terraform for creating the ECR repo as well, if you want to do that in the future.)
Now that our app is ready to deploy, let’s start writing some Terraform. Add these two files to your directory:
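The two files aren’t reproduced here, but based on the discussion that follows, they look something like this. The region is an assumption (us-east-1 matches the load balancer URL we’ll see later); pick whichever region you prefer and keep it consistent.

```hcl
# provider.tf
provider "aws" {
  profile = "tfuser"
  region  = "us-east-1"
}
```

```hcl
# backend.tf
terraform {
  backend "s3" {
    bucket  = "terraform"
    key     = "terraform.tfstate"
    region  = "us-east-1"
    profile = "tfuser"
  }
}
```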
Both of these files contain a line that says `profile = "tfuser"`. This tells Terraform how to authenticate with your AWS account. You’ll need to set this up manually: under IAM in the AWS Console, select Users in the left-hand nav, then find the Add User button. The username should be `tfuser`; make sure the checkbox labeled “programmatic access” is checked. On the next screen, make sure to add the `AdministratorAccess` policy.
After creating your user, you should see a screen with an Access Key ID and Secret Access Key. Copy those values into a file named `~/.aws/credentials`, with the following format:
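This is the standard AWS shared-credentials file format, with the profile name matching the one in our provider block:

```ini
[tfuser]
aws_access_key_id = <your_access_key_id>
aws_secret_access_key = <your_secret_access_key>
```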
Terraform reads every file ending in `.tf` in the same directory as part of the same workspace, so we can split our code into meaningful files. `provider.tf` contains a `provider` block, which tells Terraform that we’re going to create resources on AWS and how it should authenticate. `backend.tf` tells Terraform where to put the state file. The state file is important because it’s how Terraform keeps track of the infrastructure it has created or imported, and it’s what Terraform uses to determine what changes are necessary when given a new configuration.

The `backend` block under `terraform` tells Terraform to put the state file in an S3 bucket called `terraform`, in a file named `terraform.tfstate`. You’ll probably need to change the bucket name to something more unique; since S3 bucket names are globally unique, there’s a really good chance someone is already using the name `terraform`. Create a bucket with the name you picked in the S3 console (the default, totally private settings are what you want), then set that as your bucket name in `backend.tf`.
Next, from the command line and in the same directory as your two `.tf` files, run `terraform init`. You should see a success message ending in “Terraform has been successfully initialized!”
If that’s what you see, great! If not, make sure your `tfuser` user has the appropriate AWS permissions, and verify that Terraform is installed correctly on your machine (if you run `terraform version`, you should see something along the lines of `Terraform v0.12.x`).
The service
Our end goal is to create a Fargate ECS service, so let’s start by creating that and see where we get. From the Terraform documentation, it seems like we want to create an `aws_ecs_service`. Let’s add an `aws_ecs_service` resource block, with the required fields filled out as well as we can. (Paste this into a new file called `ecs.tf`.)
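The first iteration isn’t shown here, but per the description that follows, it’s just the two required fields. (The Terraform resource name `sun_api` is my choice; any name works as long as you use it consistently.)

```hcl
resource "aws_ecs_service" "sun_api" {
  name            = "sun-api"
  task_definition = ""
}
```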
It turns out there are only two absolutely required fields, so our first iteration is pretty simple. We’re supplying the `name` of the service, which is arbitrary. We don’t know what the `task_definition` is yet, so we’ll just use an empty string for now. This is obviously not going to be our final solution, but let’s run a plan and see where we are.
Running `terraform plan` is safe: a plan doesn’t actually make any changes to our AWS environment. But it’s always a good idea to inspect the plan output to make sure Terraform is doing what we expect it to do. In this case, the only thing that seems off is the `launch_type`. Terraform says it will be “known after apply,” which means it’ll use whatever AWS defaults to. We want to ensure it’s `FARGATE`, so let’s add that line:
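The same sketch as before, with the launch type pinned:

```hcl
resource "aws_ecs_service" "sun_api" {
  name            = "sun-api"
  task_definition = ""
  launch_type     = "FARGATE"
}
```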
The resulting plan output now shows the launch type we want. This seems too easy, but let’s run an apply anyway, just to see what happens. Sure enough, the apply fails: AWS rejects our empty task definition.
As we suspected, that config wasn’t all we needed. (We haven’t even specified the image yet!) But it’s a good example of the difference between running `terraform plan` and `terraform apply`. `terraform plan` validates your config to make sure the syntax is valid, that any variables being referenced are defined, and that the required fields are populated. But even though a plan might be valid, Terraform doesn’t have much of an idea what AWS will say when it tries to execute the plan.

In this case, the `aws_ecs_service` documentation specifies that `TaskDefinition` should be: “The family and revision (`family:revision`) or full ARN of the task definition that you want to run in your service.” It’s a good reminder that while Terraform helps us define our infrastructure, it doesn’t guarantee that the infrastructure we define will even run, much less meet best practices.
The good news is this: we know what to fix! Now that we’ve gone through one iteration of the code/plan/apply troubleshooting cycle, I’ll move a little faster. Let’s add these blocks to the `ecs.tf` file:
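The original blocks aren’t reproduced here; this is a sketch of a minimal Fargate task definition, where the CPU/memory sizes (the smallest valid Fargate combination) and the container port are my assumptions:

```hcl
resource "aws_ecs_task_definition" "sun_api" {
  family = "sun-api"

  container_definitions = <<EOF
[
  {
    "name": "sun-api",
    "image": "<your_docker_username>/sun-api:latest",
    "portMappings": [
      {
        "containerPort": 8080
      }
    ]
  }
]
EOF

  cpu    = 256
  memory = 512

  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
}
```

This is where you swap in your own Docker Hub username, if you built the image yourself.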
Next, update the `task_definition` field in our `aws_ecs_service` block:
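Something like this, referencing the task definition’s ARN directly:

```hcl
resource "aws_ecs_service" "sun_api" {
  name            = "sun-api"
  task_definition = aws_ecs_task_definition.sun_api.arn
  launch_type     = "FARGATE"
}
```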
This is our first example of using a variable to populate another field, and it’s one of Terraform’s most powerful and appealing features. Instead of having to hardcode that ARN into our config, we can simply say: “that task definition I just created, whatever its ARN is, use it here.” If we ever destroy and recreate that task definition, and it gets a new ARN, this config will still work perfectly.
Running `terraform apply` should give us our first partial success.
A couple things actually got created! Now that we’re in this for real: if you need to tear down everything, you can run `terraform destroy`. Like `apply`, it’ll give you a plan output that specifies what it intends to destroy, so make sure you inspect that closely. But Terraform will only touch resources it knows about, so it should only affect resources you’ve created here.
We’re still stuck on that task definition, and it’s about to get weird, because it’s time to add some permissions. We need to create a role for the task to use while it’s running, but we also have to explicitly allow our ECS task to assume that role. AWS provides a policy we can use for execution, but we’ll have to attach it to a role we create. Add these lines to `ecs.tf`:
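A sketch of the four blocks, in the order the next paragraph describes them (the role and resource names are my assumptions; the managed-policy ARN is AWS’s real `AmazonECSTaskExecutionRolePolicy`):

```hcl
# Block 1: the role our tasks will use for execution.
resource "aws_iam_role" "sun_api_task_execution_role" {
  name               = "sun-api-task-execution-role"
  assume_role_policy = data.aws_iam_policy_document.ecs_task_assume_role.json
}

# Block 2: a policy document allowing ECS tasks to assume the role.
data "aws_iam_policy_document" "ecs_task_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

# Block 3: the AWS-managed default policy for ECS task execution.
data "aws_iam_policy" "ecs_task_execution_role" {
  arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Block 4: attach that policy to our role.
resource "aws_iam_role_policy_attachment" "ecs_task_execution_role" {
  role       = aws_iam_role.sun_api_task_execution_role.name
  policy_arn = data.aws_iam_policy.ecs_task_execution_role.arn
}
```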
Here, we’re creating a role that AWS will use to run our app. First, we attach a policy that allows the role to be assumed by ECS tasks (blocks 1 and 2). Then we grab the AWS-defined default policy for ECS task execution and attach it (blocks 3 and 4).
Now we can add this line to our `aws_ecs_task_definition` resource:
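Assuming the execution role is named `sun_api_task_execution_role`, the line inside the task definition looks like this:

```hcl
execution_role_arn = aws_iam_role.sun_api_task_execution_role.arn
```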
If we run `terraform apply` now, it seems to try for a long time to create the service before finally failing.
That output suggests that we need a cluster in which to put our service, so let’s create it:
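A cluster is little more than a name (mine here, `app`, is arbitrary), and the service needs to reference it:

```hcl
resource "aws_ecs_cluster" "app" {
  name = "app"
}
```

Then, inside the `aws_ecs_service` block:

```hcl
cluster = aws_ecs_cluster.app.id
```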
After adding those lines, our next `terraform plan` run should tell us that we’re closing in: all that’s left to create is the cluster and the service. But when we try to apply that, we get an error telling us the service needs a network configuration.
The good news is we’re still making progress. The bad news is we’re about to talk about networking.
Networking
We set our task definition’s `network_mode` to `awsvpc` because that’s what AWS requires for Fargate tasks. Unfortunately, that comes with some other hidden dependencies: namely, Fargate tasks need to be in a VPC.

Creating the VPC by itself is fairly simple, but it also requires you to define subnets, route tables, NAT gateways and more. So I’ll save you the pain I went through trying to get this stuff working properly, and just give you the config. Open a new file called `network.tf` and copy these lines into it:
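The file isn’t reproduced here; this is a condensed sketch matching the description that follows. The CIDR ranges and all resource names are my assumptions.

```hcl
resource "aws_vpc" "app_vpc" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "public" {
  vpc_id     = aws_vpc.app_vpc.id
  cidr_block = "10.0.0.0/24"
}

resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.app_vpc.id
  cidr_block = "10.0.1.0/24"
}

# The public subnet routes directly to the Internet gateway.
resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.app_vpc.id
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.app_vpc.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
}

resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}

# The private subnet goes out through a NAT gateway, so it can reach
# the Internet but the Internet can't initiate connections in.
resource "aws_eip" "nat" {
  vpc = true
}

resource "aws_nat_gateway" "ngw" {
  subnet_id     = aws_subnet.public.id
  allocation_id = aws_eip.nat.id
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.app_vpc.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.ngw.id
  }
}

resource "aws_route_table_association" "private" {
  subnet_id      = aws_subnet.private.id
  route_table_id = aws_route_table.private.id
}

# One security group that allows all outbound traffic, and one that
# allows inbound traffic on the API's port.
resource "aws_security_group" "egress_all" {
  name   = "egress-all"
  vpc_id = aws_vpc.app_vpc.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "ingress_api" {
  name   = "ingress-api"
  vpc_id = aws_vpc.app_vpc.id

  ingress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

output "vpc_id" {
  value = aws_vpc.app_vpc.id
}

output "public_subnet_id" {
  value = aws_subnet.public.id
}

output "private_subnet_id" {
  value = aws_subnet.private.id
}
```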
At a high level, here’s what’s going on. First, we create a VPC with two subnets: a public one and a private one. Then we define route tables for each subnet and associate them accordingly. Our public subnet is exposed to the Internet directly via the Internet gateway, but we put our private subnet behind a NAT gateway so that it can talk to the Internet but the Internet can’t get in. Finally, we create some security groups that let the right traffic flow freely.
Note that if we were using the console to do these operations, we’d get a couple of security groups by default. Terraform removes these, however, so we have to recreate them explicitly. Also, check out those `output` blocks, which will print the VPC and subnet IDs on the command line when they’re created.

After adding that file, we can run `terraform apply` to create our VPC and various networking pieces. Everything should create successfully, but we’ll still see the same error on the ECS service: it needs a network configuration.
Back in `ecs.tf`, add this block to your `aws_ecs_service` block:
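Assuming the subnet and security-group names sketched in `network.tf` (`private`, `egress_all`, `ingress_api`), the block looks like this:

```hcl
network_configuration {
  assign_public_ip = false

  security_groups = [
    aws_security_group.egress_all.id,
    aws_security_group.ingress_api.id,
  ]

  subnets = [aws_subnet.private.id]
}
```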
With any luck, the security group and the service should now create successfully when we run `terraform apply`! We’re not done yet, but this calls for a celebration.
Load balancer
We’re getting really close now. Our service is created and our task is configured; all we need now is a way to let incoming traffic in. We need a load balancer (or ALB, for Application Load Balancer). Let’s add these lines to our `ecs.tf` file:
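A sketch of the load balancer, its security group, the target group, the HTTP listener, and the URL output. Names and the container port are my assumptions; note also that a real ALB must span subnets in at least two availability zones, so you may need to add a second public subnet to `network.tf`.

```hcl
resource "aws_security_group" "http" {
  name   = "http"
  vpc_id = aws_vpc.app_vpc.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_lb" "sun_api" {
  name               = "sun-api-lb"
  internal           = false
  load_balancer_type = "application"

  subnets = [aws_subnet.public.id]

  security_groups = [
    aws_security_group.http.id,
    aws_security_group.egress_all.id,
  ]
}

resource "aws_lb_target_group" "sun_api" {
  name        = "sun-api"
  port        = 8080
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = aws_vpc.app_vpc.id

  health_check {
    enabled = true
    path    = "/health"
  }
}

resource "aws_lb_listener" "sun_api_http" {
  load_balancer_arn = aws_lb.sun_api.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.sun_api.arn
  }
}

output "sun_api_lb_url" {
  value = "http://${aws_lb.sun_api.dns_name}"
}
```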
This last `output` block is important because it will tell us what URL we’ll use to reach the service, without us having to go into the AWS console to figure it out.

Next, add this block to your `aws_ecs_service` block:
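Something like this; the container name and port have to match the ones in the task definition JSON:

```hcl
load_balancer {
  target_group_arn = aws_lb_target_group.sun_api.arn
  container_name   = "sun-api"
  container_port   = 8080
}
```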
One more change. By default, the ECS service we created won’t start any containers. We need to tell it how many containers we want.
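One more line inside the `aws_ecs_service` block does it:

```hcl
desired_count = 1
```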
Finally, run `terraform apply` one more time. (The ALB may take a bit to spin up.) When it finishes, the output should include the URL of your load balancer.
Finally, finally, finally, copy and paste that URL into your browser. If all has gone well, you should see the service respond!
If you’re tired of reading, feel free to skip to the end; the hard part is over. But if you’re in the mood to tackle just a couple more changes, we can really put a bow on this API.
Cleanup
You may have noticed that the load balancer is listening on HTTP, not HTTPS. In most cases, we’ll want APIs to be served over HTTPS, so let’s try and correct that using a certificate issued by AWS. You’ll need a domain (or a subdomain) with DNS that you control.
Add these lines to your `ecs.tf` file, substituting in your domain name in the first block (fully qualified, but without the `https://` prefix):
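A sketch of the certificate, a validation output, a security group for HTTPS traffic, and the HTTPS listener. The domain is a placeholder, and the listener ships commented out, because the certificate has to be validated before it can be used.

```hcl
resource "aws_acm_certificate" "sun_api" {
  domain_name       = "sun-api.example.com" # your domain here
  validation_method = "DNS"
}

output "domain_validations" {
  value = aws_acm_certificate.sun_api.domain_validation_options
}

resource "aws_security_group" "https" {
  name   = "https"
  vpc_id = aws_vpc.app_vpc.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Uncomment once the certificate has been validated.
# resource "aws_lb_listener" "sun_api_https" {
#   load_balancer_arn = aws_lb.sun_api.arn
#   port              = "443"
#   protocol          = "HTTPS"
#   certificate_arn   = aws_acm_certificate.sun_api.arn
#
#   default_action {
#     type             = "forward"
#     target_group_arn = aws_lb_target_group.sun_api.arn
#   }
# }
```

Remember to add `aws_security_group.https.id` to your load balancer’s `security_groups` list as well, so HTTPS traffic can actually reach it.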
Next, find your `sun-api-http` listener and change the default action to this:
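A redirect action in place of the forward:

```hcl
default_action {
  type = "redirect"

  redirect {
    port        = "443"
    protocol    = "HTTPS"
    status_code = "HTTP_301"
  }
}
```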
Running `terraform apply` here should destroy your existing HTTP listener, then create a new HTTP listener which redirects to HTTPS. It’ll also create your certificate. But before you can turn on the HTTPS listener, you’ll need to validate the domain you chose with your DNS provider. The output of the apply includes the certificate’s domain validation options, which give you all the information you need.
That output tells me I should create a `CNAME` record that looks like this:
_b60a3030189fef2d4239f2c64587866c.sun-api 60 IN CNAME _ee46084a09797925cf49c173dd9fadef.duyqrilejt.acm-validations.aws.
While you’re there, go ahead and create a second CNAME record that points your domain at your load balancer URL. For me, that’d be:
sun-api 60 IN CNAME sun-api-lb-1234567890.us-east-1.elb.amazonaws.com.
Your DNS provider should have instructions on how to create CNAME records; Cloudflare, for example, has a support page on the topic.
Once the validation CNAME record is created, you can uncomment the HTTPS listener block and run `terraform apply` once more. If this seems to try for a while before timing out, the DNS for the validation record may not have propagated yet, which means AWS hasn’t been able to validate your domain. Give it a few minutes and then try again. (You can also monitor the status of your certificate in ACM in the console.)

Whenever the listener gets created successfully, you should be able to hit the API at `https://<your-domain>` rather than the load balancer URL.
One last thing. Say we’re ready to start writing our own proprietary code, and we’d rather not push that image to Docker Hub, even if it’s private. Fortunately, it’s pretty easy to create an ECR repo that we can push images to. Go ahead and add these lines to `ecs.tf`:
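A sketch, with an output so the repo’s URL is easy to find (the output name is my choice):

```hcl
resource "aws_ecr_repository" "sun_api" {
  name = "sun-api"
}

output "ecr_repository_url" {
  value = aws_ecr_repository.sun_api.repository_url
}
```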
Next, change the `image` field in your task definition JSON to reference the ECR repo:
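Terraform interpolates inside the heredoc, so the container definition can reference the repo’s URL directly (a sketch, assuming an ECR repo resource named `sun_api`):

```hcl
container_definitions = <<EOF
[
  {
    "name": "sun-api",
    "image": "${aws_ecr_repository.sun_api.repository_url}:latest",
    "portMappings": [
      {
        "containerPort": 8080
      }
    ]
  }
]
EOF
```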
You can run `terraform apply` to make these changes, but remember that you won’t be able to update the service to use images from your ECR repo until you actually push an image there. Follow the instructions AWS provides to authenticate your Docker CLI with your new ECR repo.
Conclusion
That wasn’t so bad, was it?
…
Okay, maybe it was a little rough. But we accomplished quite a bit. Not only did we spin up a Fargate service on HTTPS from scratch, but we did it using Terraform. That means rather than wasting time haphazardly clicking random buttons in the AWS console, we have an exact blueprint for how we spun this service up. And even better, we can instantly clone this Fargate service and create a second one that functions in the same – or similar – way. We might even decide that this is the way we want to create all Fargate services in the future and turn this into a module. That way, all we’ll have to do to spin up a new service is invoke the module with the parameters we define, abstracting away all of the boilerplate AWS stuff we now know we need. But that can wait until next time.
Thanks very much for reading! As promised, here’s a link to the repository with everything we’ve done today. If you are or are aspiring to be a technical person, I hope this was useful. Please let me know how I can improve this post in the comments or by emailing me at feedback@section411.com.
And if you’re not technical and you made it this far, I really appreciate you reading. I’ll be back with some baseball, a movie review or a personal story next time.
Thanks to Sara Sawczuk for reading a draft of this post. When she hits it big as an editor for real writers, I hope she gives me a family discount. Thanks also to Jordan Castillo Chavez for reviewing the more technical parts of this post.
This post and its accompanying repository was updated in July 2020 to use Terraform 0.12 syntax.