At the end of September, AWS
announced a big new feature for its
Step Functions product, and
my tweet noting the announcement got a shocking number of
impressions for something way out at the geeky end of Cloud tech.
In retrospect, a design choice we made back in 2016
turns out to be working very well, and there’s a lesson to be learned here: If you need to integrate an arbitrarily
large and diverse set of software capabilities, URIs are the best integration glue.
Step Functions launched in December 2016, and I did a whole lot of work on it.
In particular, my fingerprints are all over
the Amazon States Language, a JSON DSL that describes all the workflow stuff:
What software to run, branches and loops, error handling, retrying, parallelism, and so on.
In the States Language, each of the
steps in the workflow is represented by a little blob of JSON called a “state”, and each has a Type field saying
what it does. (I wanted to call the product “Amazon States” or
“AWS State Machines” but Andy and Charlie puked at that and we ended up with Step Functions, which isn’t terrible.)
The argument ·
When we first cooked up the product, the only real target we had was Lambda functions, and so the suggestion was that we have
a "Type": "Lambda" state, with another field that would give the name of the Lambda function.
But I said
“Long-term, we want to be able to orchestrate lots of other things, not just Lambdas, right?” Everyone agreed. So I said “OK
then, let’s just have a Task state which identifies the worker with a URI. That way, everything we orchestrate has the same contract,
you send it some JSON and you get some JSON back.”
People looked a bit puzzled and said “But Lambdas don’t have URIs.” I said
“Sure they do, they have
ARNs and ARNs are URIs.” (Well, they
would be if Amazon registered the “arn:” URI scheme, which I should have done while I was there and they should now. But close enough.)
There was a little push-back on making people use the long klunky-looking ARN as opposed to the nice user-friendly function
name, but I was pretty convinced and eventually won the argument.
I was remembering the dawn of the Web, quoting from someone (I think TimBL?) who said “On the Web, a resource is a unit of information or service.” Which I thought was a good fit here.
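To make the trade-off concrete, here is a minimal Python sketch of the two shapes under discussion. The `Type` and `Resource` field names come from the States Language; the `"Lambda"` type and its `FunctionName` field are hypothetical (that design never shipped), and the region, account number, and function name in the ARN are placeholders.

```python
import json

# The rejected design (hypothetical): a state type that can only run Lambdas.
# "FunctionName" is an invented field name, used here for illustration only.
lambda_only_state = {
    "Type": "Lambda",
    "FunctionName": "HelloWorld",
    "End": True,
}


def task_state(resource_uri: str) -> dict:
    """Build a terminal Task state whose worker is identified by a URI."""
    return {"Type": "Task", "Resource": resource_uri, "End": True}


# The design that shipped: the worker is identified by a URI (here, an ARN).
# The region, account number, and function name are placeholders.
hello = task_state("arn:aws:lambda:us-east-1:123456789012:function:HelloWorld")

if __name__ == "__main__":
    print(json.dumps(hello, indent=2))
```

The point of the second shape is that nothing in it says "Lambda" except the URI itself, so the same state works for any worker that accepts JSON and returns JSON.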
Flashing forward five years ·
Let’s just have a look at what Step Functions has been integrated with. Start
here, and scroll down (there are duplicate
anchors, grr) to the “Service Integrations” header, and look at the table.
As I write this, there are 17 “Optimized”
integrations, and then 200+ SDK-based integrations.
And they all use the same Task state and address the target
worker by URI (which at the moment is always an ARN).
The ARN for an “Optimized integration” looks like (taking EMR for an example):
“Optimized” means it’s smart about the way it calls the
service and can operate in either fire-and-forget or wait-for-completion mode. Also, it can autogenerate IAM policies to make
your life easier.
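As a rough illustration of the two calling modes, the sketch below assumes the resource format shown in the AWS documentation for optimized integrations, where a `.sync` suffix on the ARN asks Step Functions to wait for the job to finish. The `createCluster` ARNs are examples of that format, not necessarily the one this post originally showed, and `calling_mode` is a hypothetical helper, not part of any AWS SDK.

```python
def calling_mode(resource_arn: str) -> str:
    """Classify a Task resource ARN by its integration-pattern suffix.

    Assumption: a trailing ".sync" selects wait-for-completion. Everything
    else is treated as fire-and-forget; other suffixes (such as callback
    patterns) are deliberately ignored in this sketch.
    """
    if resource_arn.endswith(".sync"):
        return "wait-for-completion"
    return "fire-and-forget"


EMR_FIRE_AND_FORGET = "arn:aws:states:::elasticmapreduce:createCluster"
EMR_WAIT = "arn:aws:states:::elasticmapreduce:createCluster.sync"

for arn in (EMR_FIRE_AND_FORGET, EMR_WAIT):
    print(f"{arn} -> {calling_mode(arn)}")
```

Either way the mode is carried by the URI, so the Task state needs no extra field to express it.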
The recent announcement that kicked this discussion off made it possible to call more or less any API in the AWS SDK,
addressing it with an ARN like:
There’s an excellent blog that walks you through the process.
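A minimal sketch of how such an ARN is put together, assuming the `arn:aws:states:::aws-sdk:serviceName:apiAction` shape given in the Step Functions documentation. The helper name `sdk_task` and the choice of S3's `listBuckets` and DynamoDB's `getItem` are illustrative assumptions, not taken from the announcement.

```python
import json
from typing import Optional


def sdk_task(service: str, action: str, parameters: Optional[dict] = None) -> dict:
    """Build a terminal Task state that calls one AWS SDK action."""
    state = {
        "Type": "Task",
        # Assumed format: arn:aws:states:::aws-sdk:<service>:<action>
        "Resource": f"arn:aws:states:::aws-sdk:{service}:{action}",
        "End": True,
    }
    if parameters:
        # "Parameters" carries the JSON input handed to the API call.
        state["Parameters"] = parameters
    return state


list_buckets = sdk_task("s3", "listBuckets")
print(json.dumps(list_buckets, indent=2))
```

Swapping in another service is just a different pair of strings, for example `sdk_task("dynamodb", "getItem", {"TableName": "Orders"})`, where the table name is made up.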
I’m happy ·
I’m feeling just the tiniest bit smug that they were able to add all these integrations, and in particular this latest huge
one, without needing to make any major changes to the States Language.
But to be honest, all of that comes more or less for free
once you decide that everything you might want to integrate is a resource and thus should be identified by a Uniform Resource Identifier.
I recommend this design pattern.
The future ·
I’ve always thought that once you agree to address things by URI, well that includes HTTP URLs, so why shouldn’t a Step
Functions Task state be able to include an arbitrary external Web endpoint?
SNS can already do this.
Now… it’s kind of scary
making an AWS service take a runtime dependency on an uncontrolled external anything, so this would be tricky to implement.
But it’s another thing you could do with no language changes, just because you decided to do things the Web-native way.
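A hypothetical sketch of why no language change would be needed: if a worker is just a URI, an executor can pick a transport from the URI scheme and leave the Task state itself untouched. Nothing here reflects how Step Functions is actually implemented; `pick_transport`, the transport names, and the `https://example.com/orders` endpoint are all invented for illustration.

```python
def pick_transport(resource_uri: str) -> str:
    """Choose how to reach a worker from the scheme of its URI."""
    scheme, sep, _rest = resource_uri.partition(":")
    if not sep:
        raise ValueError(f"not a URI: {resource_uri!r}")
    # Hypothetical transport names, keyed by URI scheme.
    transports = {"arn": "aws-internal", "https": "web-request"}
    try:
        return transports[scheme.lower()]
    except KeyError:
        raise ValueError(f"unsupported URI scheme: {scheme!r}") from None


# Same state shape either way; only the Resource value differs.
states = [
    {
        "Type": "Task",
        "Resource": "arn:aws:lambda:us-east-1:123456789012:function:HelloWorld",
        "End": True,
    },
    {"Type": "Task", "Resource": "https://example.com/orders", "End": True},
]

for state in states:
    print(state["Resource"], "->", pick_transport(state["Resource"]))
```

The hard part the text mentions (depending at runtime on an endpoint you don't control) lives entirely behind the transport; the workflow definition doesn't change.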
From: Gabe (Oct 27 2021, at 15:25)
Thank you Tim! We owe ya. Wish you were here :-)

From: Nik P (Oct 27 2021, at 16:38)
> Now… it’s kind of scary making an AWS service take a runtime dependency on an uncontrolled external anything, so this would be tricky to implement.
It is! But we did it :) https://aws.amazon.com/blogs/compute/using-api-destinations-with-amazon-eventbridge/
Just as you can build reliable network protocols on top of unreliable ones, you can make uncontrolled external dependencies reliable with ApiDestinations and THEIR URI ;) arn:aws:events:us-west-2:123456789012:api-destination/fooBarBaz

From: Justin (Oct 28 2021, at 18:51)
Can also accomplish this using the Optimized Service Integration with API-Gateway, but you'd need to configure API-G to map to the external target.
https://docs.aws.amazon.com/step-functions/latest/dg/connect-api-gateway.html
Agree on the design pattern here. It's great to see when these "1-way-doors" turn out right! I miss our robust discussions on such topic. I think that's the funnest part of the job.

From: Tim (but not THE Tim) (Oct 29 2021, at 19:11)
Reminds me of when there was a big debate between RESTful and non-RESTful stuff. It seems we've gone past that but maybe not in a good way