What to bake into an AWS AMI and what to provision using cloud-init?

I'm using AWS CloudFormation to set up numerous elements of network infrastructure (VPCs, Security Groups, Subnets, Auto Scaling groups, etc.) for my web application. I want the whole process to be automated: I want to click a button and fire up the whole thing.

I have successfully created a CloudFormation template that sets up all this network infrastructure. However, the EC2 instances are currently launched without any of the software they need. Now I'm trying to figure out how best to get that software onto them.

To do this, I'm creating AMIs using Packer.io. But some people have instead urged me to use Cloud-Init. What heuristic should I use to decide what to bake into the AMIs and/or what to configure via Cloud-Init?

For example, I want to preconfigure an EC2 instance so that I (saqib) can log in without a password from my own laptop. Thus the EC2 instance must have a user. That user must have a home directory. And in that home directory must live a file .ssh/authorized_keys containing my public key. Should I bake these directories into the AMI? Or should I use cloud-init to set them up? And how should I decide in this and other similar cases?

Answers 1

  • I like to separate out machine provisioning from environment provisioning.

    In general, I use the following as a guide:

    Build Phase

    • Build a Base Machine Image with something like Packer, including all software required to run your application. Create an AMI out of this.
    • Install the application(s) onto the Base Machine Image, creating an Application Image. Tag and version this artifact. Do not embed environment-specific settings here (database connections, etc.), as this prevents you from easily reusing the AMI across different environments.
    • Ensure all services are stopped before the image is captured, so that nothing starts before it has been configured.
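
As a rough illustration of the build phase above, the sketch below generates a Packer template in Packer's JSON format using the amazon-ebs builder and a shell provisioner. The application name `myapp`, the `myapp.service` unit, the region, instance type, SSH user, and apt package are all hypothetical placeholders, not something from the setup described in the question. Note that nothing environment-specific appears in it.

```python
import json


def build_packer_template(app_version: str, source_ami: str) -> dict:
    """Return a Packer JSON template (as a dict) for an application image.

    All names here (myapp, myapp.service, region, instance type) are
    hypothetical placeholders for illustration only.
    """
    return {
        "builders": [{
            "type": "amazon-ebs",
            "region": "us-east-1",
            "source_ami": source_ami,
            "instance_type": "t3.micro",
            "ssh_username": "ubuntu",
            # {{timestamp}} is expanded by Packer at build time.
            "ami_name": "myapp-" + app_version + "-{{timestamp}}",
            # Tag and version the artifact.
            "tags": {"Application": "myapp", "Version": app_version},
        }],
        "provisioners": [{
            "type": "shell",
            "inline": [
                "sudo apt-get update",
                "sudo apt-get install -y myapp=" + app_version,
                # Leave the service stopped and disabled in the image.
                "sudo systemctl disable --now myapp.service",
            ],
        }],
    }


if __name__ == "__main__":
    template = build_packer_template("1.4.2", "ami-0123456789abcdef0")
    print(json.dumps(template, indent=2))
```

Saving the printed JSON to a file and running `packer build` on it would be the usual next step, assuming valid AWS credentials and a real source AMI ID.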

    Release Phase

    • Spin up an environment consisting of the images and infrastructure required, using something like CloudFormation (CFN).
    • Use Cloud-Init user-data to configure the application environment (database connections, log forwarders, etc.) and then start the applications/services.
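
The release-phase step can be sketched as a small function that renders cloud-init user-data per environment. This is only a minimal sketch: the file path `/etc/myapp/env`, the variable names, and the `myapp.service` unit are hypothetical. It relies on the fact that cloud-config is YAML and that JSON is generally accepted as YAML, so the standard `json` module is enough to emit it.

```python
import json


def render_user_data(db_url: str, log_host: str) -> str:
    """Render a #cloud-config document for one environment.

    The path, variable names, and service name are hypothetical
    placeholders; only the environment-specific values change
    between environments, the AMI stays the same.
    """
    config = {
        # Write environment-specific settings at first boot.
        "write_files": [{
            "path": "/etc/myapp/env",
            "permissions": "0600",
            "content": (
                "DATABASE_URL=" + db_url + "\n"
                "LOG_FORWARD_HOST=" + log_host + "\n"
            ),
        }],
        # Start the service only after it has been configured.
        "runcmd": [["systemctl", "enable", "--now", "myapp.service"]],
    }
    # The first line must be the #cloud-config header.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    print(render_user_data("postgres://db.staging.internal/app",
                           "logs.staging.internal"))
```

The resulting string would typically be supplied as the instance's user-data, for example through the `UserData` property in a CloudFormation template (base64-encoded via `Fn::Base64`).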

    This approach gives the greatest flexibility and cleanly separates out the various concerns of a continuous delivery pipeline.
