How Section 411 went serverless (mostly)
A little over two years ago, I launched Section 411. In a post that made the launch official, published about a month after the site went live, I formally introduced the site, writing about the new name and some of the technology that powered it. I wrote that unlike its predecessor, Section 411 was built using a static site generator (Hugo) instead of relying on something to render the pages in real time. I also wrote about Louvre, an image manager and processor I built that could dynamically serve images to a CDN, solving what’s typically a pain point for static site generators.
I’m pleased to say that over the last two years, while I’ve certainly made my fair share of design tweaks and updates, I’m still really happy with the overall architecture of Section 411. I’ve even been able to transition Section 411 from an Apache server to a Netlify instance, removing my need to run a traditional HTTP server to serve the site.
But ever since I first launched Section 411, I haven’t been very happy with how Louvre turned out. I knew I’d need a decent image manager to make Section 411 possible, but I also knew I didn’t want to spend a ton of time on it. So after a few starts and stops over that summer before launching, I finally decided to write the first version of Louvre as a Laravel (PHP) application. Engineering is all about trade-offs, I told myself, and writing Louvre in Laravel would help me get it done quicker and thus allow me to focus on the rest of Section 411.
As an image manager, Louvre was and still is just fine. The interface isn’t amazing (and it’s still only half done), but it lets me upload, transform and crop images. My main frustration was with Louvre’s image processor. As a PHP application, it requires a full HTTP server to be running all the time, waiting for traffic, but for the most part, it sits idle. When traffic does finally come in, it tends to come in surges. Section 411’s homepage is covered with images, and all it takes is one user to hit the site with a cold CDN cache for Louvre to be inundated with simultaneous and time-sensitive requests. A powerful server, especially sitting behind a CDN, could handle this with no problem. But I didn’t want to pay for a powerful server to sit idle most of the time. So instead, Louvre lived on a tiny server that was never properly utilized: it either sat idle or was overwhelmed.
Serverless functions are one of the hottest trends in backend engineering today. Amazon Web Services, one of the first companies to enter the serverless market, brands its product as Lambda. Here’s how Amazon describes it:
With Lambda, you can run code for virtually any type of application or backend service - all with zero administration. Just upload your code and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app.
The first two sentences make it easy to see why this concept is intriguing, even if the term “serverless” is probably going a little too far. (If Jerry Seinfeld were ever to go on stage with a bit on “serverless,” the punchline would undoubtedly be “there’s gotta be a server somewhere!”) Writing applications is hard enough; deploying them can be even harder because it often requires a completely different skill set. The fewer obstacles there are between the code being on my laptop and the code being deployed, the better.
But it’s the last sentence that got me thinking about using serverless functions for Louvre. A traditional application can literally do anything, but because of that flexibility, its behavior can be complicated. Modeling what an application might be doing at any given moment would probably involve a fairly complicated flowchart. Lambdas, on the other hand, are essentially just functions: inputs and outputs. Their behavior is well-defined and linear. This simplicity gives Lambdas incredible versatility: they can be plugged into a variety of inputs and chained together to offer the same functionality as a traditional application, but with all of the state management complexity abstracted away. It’s a nod to old-school Unix-style programs that only have one or two primary features but can be so powerful when chained together on the command line.
My idea was to leave the image manager part of Louvre alone and let it live on as a Laravel app for as long as necessary. Instead, I’d focus on the part that wasn’t working well: the image processor. Rewritten as a serverless function, the processor would let me focus on the inputs (an HTTP request) and the outputs (images). All I’d have to do was write a function that parses a request URL to look up an image, runs any transforms needed, and then serves the image. I’d also leave it behind a CDN so things stay speedy.
After a little research, I ended up choosing Google Cloud Functions over AWS Lambda for two reasons. First, Cloud Functions can respond to HTTP requests out of the box, whereas Lambdas require you to set up API Gateway or CloudFront to actually capture the request and forward it to the Lambda. This wasn’t a dealbreaker, but it meant there’d be extra work to get the Lambda deployed properly. Second, while both Lambda and Cloud Functions support Go, Cloud Functions does it more natively. With Cloud Functions, you only need to open a file, write an http.HandlerFunc, copy it into your Cloud Function config, and finally specify it as your “Function to execute”.
Here’s how I wrote the function to serve Louvre images, with some annotations.
Here are a few takeaways, in case that code looks like gibberish or you just skimmed it. First, yes, I know I’m writing to an Amazon S3 bucket from a Google Cloud Function. I’m a monster. Second, the http.Handler interface continues to be one of the greatest things in the Go standard library. Here, it lets us effortlessly shim a production-grade router into the entrypoint for the Cloud Function. And finally, the handler that actually serves the image is kind of long, but I still think it’s pretty simple because it’s completely linear: the input is always an *http.Request that’s been pre-routed, and the output is some sort of HTTP response.
My biggest issue with Google Cloud Functions was related to its handling of Go, and specifically how it lets you write your handler as a normal http.HandlerFunc. Go is a statically compiled language, which means that any code that’s part of your program has to be there when the program is compiled. There’s no way to take Go code and bolt it into a precompiled or running program, unless you do something like compile the new code into a separate program and execute that program from your first program. (This is what Lambda does.)
So when you upload your code with your HandlerFunc, Cloud Functions combines it with some common code that can invoke it and compiles the whole thing to use as your function. This is similar to how go test works, and for the most part, it’s not a problem. It only becomes a problem if you ever need to match line numbers in stack traces (for example, when your handler panics) against your own code. Because of the code Google adds, the line numbers won’t match, so you’ll have to rely on your logging to figure out what’s breaking.
The fact that Google combines your code with code of its own means you might not be able to lay out your repository the way you normally would. I eventually landed on this structure, which probably looks strange if you’ve written Go for a while:
Because each function is self-contained, it needs to resolve its own dependencies, so we have to put the go.sum files in the directory with the function. Putting these files in a non-root directory is definitely not normal, but fortunately Go modules seem to be more flexible than $GOPATH-based dependency managers. What’s stranger is putting a library or shared package (image) underneath the main serve package, but I couldn’t find a way to make it work with image being in a more common directory, apart from splitting off the image package into its own repository and importing it via Go modules.
Overall, I was really happy with how this project turned out. The entire process of moving Louvre to a serverless platform took maybe six or seven hours total, starting with absolutely nothing and ending with the function deployed to production. I didn’t have to do any additional work to get the function to run concurrently; that was a benefit I just got for free. The biggest win for me was that I spent most of my time on my image processing code, solving the problem I was actually trying to solve instead of fighting with the infrastructure to get it deployed.
By now, you might be wondering: what does all this cost? The database that powers Louvre is unchanged, and it’s still the vast majority of the total monthly bill. The CDN (CloudFront) is also unchanged, and at Section 411’s current traffic levels, the CDN bill is pretty cheap. This setup uses S3 a little more, but not by much, and the total amount of data Louvre has in S3 is less than 10 GB, meaning my S3 storage bill is no more than $0.30 a month.
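The storage ceiling is simple multiplication. The sketch below assumes a price of $0.023 per GB-month for S3 Standard storage (the us-east-1 rate for the first 50 TB); that figure is an assumption, prices vary by region, and request and data-transfer charges are deliberately left out.

```go
package main

import "fmt"

// Assumed S3 Standard storage price in USD per GB-month (us-east-1, first
// 50 TB tier). Check the current S3 pricing page; this is illustrative only.
const s3StandardPerGBMonth = 0.023

// monthlyStorageCost returns the storage-only cost for the given data size.
// It ignores request and data-transfer charges.
func monthlyStorageCost(gb float64) float64 {
	return gb * s3StandardPerGBMonth
}

func main() {
	fmt.Printf("$%.2f\n", monthlyStorageCost(10)) // $0.23
}
```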
The only real change to the infrastructure is the usage of a Google Cloud Function. I took an hour or so one night to try to understand the pricing page, and after a lot of math I estimated my costs for Cloud Functions would be somewhere between $5 and $10 a month.
So here’s the short answer on how much all this costs: exactly $0 more than before. I forgot to account for the fact that you’re not charged for Google Cloud Functions until you exceed the free tier. Turns out the Internet is pretty cheap when it’s serverless.
Thanks to Sara Sawczuk for reading a draft of this post and catching a critical bug on line 134 of my code.