A Better Way to SSH in AWS (With RDS tunneling and security automation)
When I first started using AWS environments, the bastion architecture was the prevalent way to set up SSH access. A dedicated "bastion" server is provisioned with its SSH port exposed to an internal network, or in some cases the internet, so that other servers do not have to expose their own SSH ports. Sometimes the bastion host is used to tunnel to databases or other more sensitive ports as well, though I generally prefer to chain SSH -> bastion -> application server -> DB/etc.
While this method reduces the attack surface and gives a single point of control, it also increases maintenance cost and concentrates risk in one heavily exposed server.
In 2019, AWS announced tunneling support for SSH and SCP with Systems Manager, meaning that bastion hosts can be replaced for most use cases. We also pick up a couple of extra security goodies when moving to Systems Manager:
- Automated server patching
- Enforced security standards, such as OS-level hardening or required agent installs
- Full SSH session logging that is simple to enable (I actually recommend leaving this disabled unless you really need it, to avoid storing sensitive information in the logs)
In this article, we'll be walking through an initial SSM setup, testing SSH to an EC2 instance along with a tunnel to RDS, and then configuring automated patching and security checks for that instance.
The templates shown in this article don't depend on other templates in my Advanced AWS security architecture series, but you might be interested in reading the first article before taking on this one.
The full CloudFormation template for deploying a Systems Manager-enabled instance with a sample automation document can be found on my GitHub.
Initial SSM setup
In order to leverage SSM, we need a few things:
- An instance profile we can attach to EC2 instances
- A role, assumable by EC2, that grants permissions for SSM tasks
- An SSM agent installed and running on the EC2 instance.
Here is a CloudFormation snippet to create an instance profile and a role that allows the EC2 instance to leverage SSM:
SSMProfile:
  DependsOn: SSMRole
  Type: AWS::IAM::InstanceProfile
  Properties:
    InstanceProfileName: SSMInstanceProfile
    Roles:
      - !Ref SSMRole
SSMRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: 2012-10-17
      Statement:
        - Effect: Allow
          Principal:
            Service:
              - ec2.amazonaws.com
          Action:
            - 'sts:AssumeRole'
    Description: Basic SSM permissions for EC2
    ManagedPolicyArns:
      - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
    RoleName: SSMInstanceProfile
I am using a managed policy for simplicity, but I recommend creating your own policy with limited permissions instead, as most managed policies are too permissive.
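As a rough sketch of that tighter alternative (not part of the article's template), the AWS documentation lists a minimal set of instance permissions for Session Manager; you could attach something like this inline policy to the role instead. Note that Run Command and maintenance window tasks need additional ssm and ec2messages actions on top of it.

# A minimal sketch, assuming the role name from the template above.
# These are the documented Session Manager instance permissions; patching and
# Run Command automations require additional ssm:* and ec2messages:* actions.
cat > ssm-session-minimal.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:UpdateInstanceInformation",
        "ssmmessages:CreateControlChannel",
        "ssmmessages:CreateDataChannel",
        "ssmmessages:OpenControlChannel",
        "ssmmessages:OpenDataChannel"
      ],
      "Resource": "*"
    }
  ]
}
EOF
aws iam put-role-policy \
  --role-name SSMInstanceProfile \
  --policy-name SessionManagerMinimal \
  --policy-document file://ssm-session-minimal.json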
Automating Security Tasks with SSM
Now that we have SSM available for instances, let's create a sample script that we would like to run on a regular basis across all our EC2 instances. To keep the example simple, we will write an Echo automation, which accepts a parameter and echoes it into a local text file on the server.
Because the document just runs a shell script, you could modify this template to do almost anything on the server. In the past I have used this method to install or verify installation of security agents, set up logging, audit software, and more.
First we will create a Document, which defines a single parameter and our shell script (a couple of echo commands).
EchoDocument:
  Type: "AWS::SSM::Document"
  Properties:
    Name: "SecurityEchoDocument"
    DocumentType: Command
    Content:
      schemaVersion: "1.2"
      description: "Just echoes into a file - to show how SSM works. A real document might check security agents, set up logging, or verify hardening attributes."
      parameters:
        valueToEcho:
          type: "String"
          description: "Just a sample parameter"
          default: "Hello world!"
      runtimeConfig:
        aws:runShellScript:
          properties:
            - runCommand:
                - echo "{{ valueToEcho }}" >> ssm.txt
                - echo "Done with SSM run" >> ssm.txt
Next we will create a maintenance window that defines how frequently and when this should be run. For testing purposes, I want this to be run every 5 minutes:
EchoWindow:
  Type: AWS::SSM::MaintenanceWindow
  Properties:
    AllowUnassociatedTargets: true
    Cutoff: 1
    Description: Run Echo documents - our sample automation
    Duration: 4
    Name: PatchWindow
    Schedule: cron(*/5 * * * ? *) # Every 5 minutes for this test. Probably not what you would really want!
Then, I create a grouping to execute the script on target instances. Here, it is based on a tag created specifically for this task. We don't have to use tags, but I find them a simple way to group servers into sets that automation documents can target, based on risk level, OS, or something else.
EchoTargets:
  Type: AWS::SSM::MaintenanceWindowTarget
  Properties:
    Description: Add our server into the maintenance window
    Name: EchoTargets
    ResourceType: INSTANCE
    Targets:
      - Key: tag:ShouldEcho
        Values:
          - True
    WindowId: !Ref EchoWindow
Finally, we will tie this all together with a task. The task will link the targets to a window, and execute the document within the window schedule.
EchoTask:
  Type: AWS::SSM::MaintenanceWindowTask
  Properties:
    Description: Echo data on the machine
    MaxConcurrency: 3
    MaxErrors: 1
    Name: EchoTask
    Priority: 5
    Targets:
      - Key: WindowTargetIds
        Values:
          - !Ref EchoTargets
    TaskArn: !Ref EchoDocument
    TaskType: RUN_COMMAND
    TaskInvocationParameters:
      MaintenanceWindowRunCommandParameters:
        Parameters:
          valueToEcho:
            - "Hello World from the maintenance window!"
    WindowId: !Ref EchoWindow
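Once this is deployed, you can confirm from the CLI that the window is actually firing; a quick sketch (the mw-... ID is a placeholder, yours is shown in the SSM console or the stack resources):

# List recent executions of the maintenance window
aws ssm describe-maintenance-window-executions --window-id mw-0123456789abcdef0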
We're done with the SSM setup and automation creation now! Any server tagged with ShouldEcho == True will have our Echo script run on it every 5 minutes. But we haven't actually created that server yet, so let's do that next.
Create an EC2 Server With SSM
Let's build that EC2 instance and leverage SSM on it. You should also build a tightly scoped IAM role for accessing this instance. In an enterprise environment, you may have broader groups and scopes that dictate access, so be cautious: it is easy to over-provision access with this method.
If you grant a role ssm:StartSession or ssm:ResumeSession on Resource: *, then that role will be able to log in to every SSM-enabled server as ssm-user, which has passwordless sudo (effectively root)! (This is a good time to note that when I am peer reviewing IAM templates, any usage of Resource: * gets flagged for close scrutiny: it is rarely what you really want when paired with Allow statements.)
Instead, grant a role with a tightly scoped resource. I tend to use namespacing, where assets are prefixed with an application ID and environment (Dev/Prod/etc.), and then scope with Resource: <arn-prefix>-<application-ID>-<environment>-*. However, in this example I don't create any such role and am using an admin user for simplicity.
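For illustration only, here is roughly what a scoped session policy can look like. AWS documents restricting ssm:StartSession by instance tag; the tag key/value and role name below are hypothetical, and a real policy would also let the caller resume and terminate their own sessions.

cat > start-session-scoped.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ssm:StartSession",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "StringEquals": { "ssm:resourceTag/Environment": "Dev" }
      }
    }
  ]
}
EOF
# Attach to the operator role that should only reach Dev instances (hypothetical name)
aws iam put-role-policy \
  --role-name dev-operators \
  --policy-name StartSessionDevOnly \
  --policy-document file://start-session-scoped.json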
In the examples, I am using Amazon Linux 2, which comes with the SSM agent installed by default. If you are using an AMI that does not include the agent, you will need to add a provisioning step to install it. More can be found in the AWS documentation on the SSM agent.
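If you do need such a step, the documented manual install on Ubuntu, for example, is a snap package; a minimal sketch you would adapt to your distribution:

#!/bin/bash
# Hypothetical user-data/provisioning step for an AMI without the SSM agent (Ubuntu shown)
sudo snap install amazon-ssm-agent --classic
sudo snap start amazon-ssm-agent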
I also include a security group that does not open port 22 for SSH access. Instead, the server only allows inbound traffic on port 443 (encrypted HTTP); all outbound traffic is allowed via the security group's default egress rule.
SimpleServer:
  Type: AWS::EC2::Instance
  DependsOn: SSMProfile
  Properties:
    InstanceType: t3.micro
    SecurityGroupIds:
      - Ref: WebSecurityGroup
    IamInstanceProfile: !Ref SSMProfile
    ImageId: !Ref AMIID
    Tags:
      - Key: ShouldEcho
        Value: True
    SsmAssociations:
      - AssociationParameters:
          - Key: valueToEcho
            Value:
              - "Hello World from CloudFormation initialization!"
        DocumentName: !Ref EchoDocument
WebSecurityGroup:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupDescription: Enable encrypted HTTP traffic only (in/out)
    VpcId: !Ref VpcId
    SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: '443'
        ToPort: '443'
        CidrIp: 0.0.0.0/0
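The RDS instance (in the GitHub template, not shown here) gets the same treatment: the database only accepts connections from the web server's security group on the PostgreSQL port, so only the EC2 instance can reach it. Expressed with the CLI, and with placeholder group IDs, that rule is roughly:

# Hypothetical sketch: let only the web server's security group reach Postgres
aws ec2 authorize-security-group-ingress \
  --group-id <db-security-group-id> \
  --protocol tcp --port 5432 \
  --source-group <web-security-group-id>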
Putting It All Together
Our template is now complete. Not shown here, but present on GitHub, is an RDS instance as well. Deploy the stack in an AWS environment using the AWS CLI:
aws cloudformation create-stack --stack-name ssm --template-body file://ssm.yaml --capabilities CAPABILITY_NAMED_IAM
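If you'd rather stay in the terminal, you can also wait for the stack and read its outputs from the CLI:

# Block until creation finishes, then dump the stack outputs
aws cloudformation wait stack-create-complete --stack-name ssm
aws cloudformation describe-stacks --stack-name ssm --query 'Stacks[0].Outputs'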
You can now use the console to get a few more pieces of information about what was provisioned:
- The instance name, AZ, and other details from the EC2 panel
- The DB master password from the Secrets Manager panel
- The RDS PostgreSQL instance URL from the RDS panel
- The Systems Manager panel, to view automation documents and start a session from your browser
Finally, we can use the console to add the group to Patch Manager. Now, every time the window comes up, AWS will also attempt to apply the latest security patches to the instances.
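That console step can also be scripted: the same maintenance window can run the AWS-RunPatchBaseline document as another task. A hedged sketch, with placeholder window and target IDs:

# Hypothetical: register a patching task on the existing window and target group
aws ssm register-task-with-maintenance-window \
  --window-id mw-0123456789abcdef0 \
  --targets Key=WindowTargetIds,Values=<window-target-id> \
  --task-arn AWS-RunPatchBaseline \
  --task-type RUN_COMMAND \
  --max-concurrency 3 --max-errors 1 --priority 2 \
  --task-invocation-parameters '{"RunCommand":{"Parameters":{"Operation":["Install"]}}}'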
A Better Way to SSH on AWS (and Tunnel to RDS)
Now that everything is provisioned and you have gathered your information, let's SSH into our simple server, even though we never opened port 22. In the console, navigate to Systems Manager, then Session Manager, select the instance, and click Start session. You'll get a browser shell as ssm-user, which has passwordless sudo by default, so effectively root access!
Alternatively, and my preference, you can use the command line by installing the Session Manager plugin for the AWS CLI and following the AWS guide on SSH. Here, I SSH into the instance and verify that the Echo automation document we created is being executed and passed the parameters we defined.
aws ssm start-session --target i-01cbc9a20ce113029 --document-name AWS-StartSSHSession
...
sh-4.2$ ls -al ssm.txt
-rw-r--r-- 1 root root 132 Feb 19 17:16 ssm.txt
sh-4.2$ id
uid=1001(ssm-user) gid=1001(ssm-user) groups=1001(ssm-user)
sh-4.2$ cat ssm.txt
Hello World from CloudFormation initialization!
Done with SSM run
Hello World from CloudFormation initialization!
Done with SSM run
.... wait ~5 minutes...
sh-4.2$ cat ssm.txt
Hello World from CloudFormation initialization!
Done with SSM run
Hello World from CloudFormation initialization!
Done with SSM run
Hello World from the maintenance window!
Done with SSM run
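The AWS SSH guide referenced above boils down to a ProxyCommand rule in your SSH config, so that plain ssh and scp against instance IDs are transparently proxied through SSM:

# Append the host rule from the AWS guide to your SSH config
cat >> ~/.ssh/config <<'EOF'
host i-* mi-*
    ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
EOF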
We can also leverage SSM to port forward from our local machine to an RDS instance that is only reachable from the EC2 instance. Unfortunately, SSM requires a bit of extra work to get the tunnel working.
To complete this example, you will need the AWS CLI and the Session Manager plugin, a local PostgreSQL client (psql), and an SSH client. You can get the DB password by logging into the AWS console and retrieving the secret from the Secrets Manager service.
Below is a script that does a few things to set up our tunnel to the RDS instance:
- Creates a temporary key pair in the current directory and pushes the public key onto the EC2 instance (where it is only accepted for 60 seconds)
- Connects to the instance using the private key and opens the tunnel, with the connection controlled through a socket file (temp-ssh.sock)
- Waits for you to press a key, then closes the connection and cleans up
# Generate a throwaway key pair (temp / temp.pub) in the current directory
ssh-keygen -t rsa -f temp -N ''
# Push the public key to the instance; EC2 Instance Connect accepts it for 60 seconds
aws ec2-instance-connect send-ssh-public-key --instance-id i-07cec3c515bcb2e61 --availability-zone us-east-1b --instance-os-user ssm-user --ssh-public-key file://temp.pub
# Open the tunnel (local 3306 -> RDS 5432) in the background, proxied through SSM and controlled via temp-ssh.sock
ssh -i temp -N -f -M -S temp-ssh.sock -L 3306:echodb-dev.cju92986bx4i.us-east-1.rds.amazonaws.com:5432 ssm-user@i-07cec3c515bcb2e61 -o "UserKnownHostsFile=/dev/null" -o "StrictHostKeyChecking=no" -o ProxyCommand="aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters portNumber=%p"
read -rsn1 -p "Press any key to close session."; echo
# Tear down the master connection and remove the temporary key files
ssh -O exit -S temp-ssh.sock *
rm temp*
Of course, this script leaves a lot to be desired: it hard-codes the instance ID, key file names, availability zone, and RDS endpoint. Treat it as a starting point for a more robust script. Once the "Press any key" message appears, we can connect to the database from a separate terminal with psql:
$ psql -h localhost -p 3306 -U master postgres
Password for user master:
psql (12.2 (Ubuntu 12.2-1.pgdg19.10+1), server 10.6)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
postgres=> \q
Hitting any key on the original window will close the session and remove the socket file.
Conclusions
We have built a complete management solution for a typical EC2/RDS architecture without exposing any SSH ports. We restricted the database to only allow connections from the servers that actually interact with it (no bastion!). We also set up an automation document that can be expanded to handle all sorts of automated security tasks across our server fleet.
This gives us improved SSH security at lower cost, with an overall simpler architecture and security group layout.
I'd love to hear from the community on what great automation you have done with SSM!