Reading time: 10 minutes
This blog post is based on my talk at the AWS Community Day DACH in Munich. In my daily work, I often run into a frustrating problem: when I need to debug or understand how an application works, I naturally go to CloudWatch. However, the different parts of a project are often segregated into different AWS accounts, so I go back to the SSO portal and open this other account.
But very frequently, I need to go back to the first account, which is still open in a browser tab, and, oh no, my session has expired… Annoying, isn’t it?
Now that we’ve discussed the problem, let’s see how we can access all our log data without being disconnected.
The Solution: CloudWatch OAM
While third-party solutions like Datadog and Dynatrace exist, they come with costs and require leaving the AWS environment. I will introduce you to a relatively unknown and underutilized AWS feature: CloudWatch Observability Access Manager (OAM).
Before we begin, I’d like to review the basics of observability, the three pillars of which are:
- Metrics: These numerical data points tell you how your systems and applications perform. Examples include CPU utilisation, memory usage, and request latency.
- Logs: These are records of events in your systems and applications. They provide detailed information about what happened, when it happened, and where it happened.
- Traces: These track the flow of requests through your applications, helping you understand how different components interact and identify performance bottlenecks.
These three pillars are valid for all applications and infrastructures, not just on AWS. However, in addition to these three pillars, we find two other types of data that CloudWatch OAM can manage:
- Application Insights: This provides in-depth monitoring for your applications, collecting data like request rates, error rates, and response times.
- Internet Monitor: This monitors the availability and performance of your applications from the perspective of your end users, giving you insights into how they experience your services.
Seeing this article as an excellent opportunity to introduce another deployment method at the same time, I will also present HCP Terraform, the version of Terraform managed by HashiCorp, with lots of little extras that don’t exist in the community edition.
Why HCP Terraform?
HCP Terraform is a managed Terraform offering by HashiCorp. Think of it as the AWS of the Terraform world. It provides several advantages over the community version, such as:
- Managed infrastructure: No need to manage your own Terraform infrastructure. HCP takes care of that for you.
- Enhanced collaboration: Improved collaboration features make it easier for teams to work together.
- Integration with other HashiCorp tools: Seamless integration with Vault, Consul, and Nomad, expanding your infrastructure management capabilities.
Understanding the Implementation
Before we start the implementation, I’ll explain the complete setup to help you understand why we’re doing it this way.
We’re going to create a Terraform stack that will be used to deploy our AWS organisation. We could do this with AWS Account Factory for Terraform (AFT) or Control Tower, but those methods each have their own advantages and disadvantages. So, we will manually create our first workspace for our management account, connected to a stack hosted in a GitHub repo. This Terraform stack will create our HCP Terraform workspaces and our AWS accounts, each account being linked to its own workspace. We could manage everything from a single workspace, but in a real enterprise organisation that can contain 100 or even 1,000 AWS accounts, the Terraform stack would quickly become unmanageable. With one workspace per account, we still keep a single Terraform codebase, hosted in one repo. Here’s a diagram:
As you can see, the organisation stack will also create the observability account, but it won’t have its own workspace. The account will be managed directly by the organisation stack.
Before we continue, let’s look at how Terraform Cloud can access our AWS account to create and modify resources. To do this, we’ll need an AWS role, which Terraform will assume. Since we’ll need this role on every account, we need to automate this, and the simplest solution is to create a StackSet. Here’s a summary:
Once we’ve created this role, each Terraform stack will be able to create the CloudWatch links on the child accounts. Remember that CloudWatch is regional, so we need to create one link per region.
Implementation
Now, let’s move on to the implementation. We need two GitHub repositories:
- aws-org: This will contain our Terraform code for our AWS organisation.
- aws-org-childs-accounts: This will contain our Terraform code for the landing zone of each child account.
Enabling Terraform to Deploy AWS Resources with an Identity Provider
We need to manually create the identity provider in our main account to allow Terraform to assume our role:
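In the management account we do this by hand (console or CLI), since Terraform cannot authenticate before the provider exists. For reference only, here is a sketch of the equivalent Terraform resource; the URL and thumbprint match what we pin in the StackSet template later:

# Reference sketch only: in the management account this provider is
# created manually, because Terraform can't yet assume a role there.
resource "aws_iam_openid_connect_provider" "hcp_terraform" {
  url             = "https://app.terraform.io"
  client_id_list  = ["aws.workload.identity"]
  thumbprint_list = ["9e99a48a9960b14926bb7f3b02e22da2b0ab7280"]
}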
All the details are in the HCP Terraform documentation on dynamic provider credentials.
After that, we move on to creating the role. Its policy is deliberately very broad:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}
Finally, the Trust Relationship. Again, it is deliberately quite broad, so feel free to reduce the scope according to your needs.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/app.terraform.io"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "app.terraform.io:aud": "aws.workload.identity"
        },
        "StringLike": {
          "app.terraform.io:sub": "organization:YOUR_ORG:project:*:workspace:*:run_phase:*"
        }
      }
    }
  ]
}
Now, we can move on to the HCP Terraform part. To do this, you will need an account, an organisation, a project, and a workspace. I’ll let you create all of that; their UI is very well-guided. Once all that is created, go into the project settings to tell HCP Terraform about the role we just created. We do this by creating two environment variables: TFC_AWS_PROVIDER_AUTH, set to "true", and TFC_AWS_RUN_ROLE_ARN, set to the ARN of your role.
Getting Started with Our Organisation Terraform Stack
Now, let’s create our Terraform stack and the code. If you selected CLI-driven runs when you created your workspace, you can start by adding the backend. This is similar to what we usually do, but instead of specifying an S3 bucket, we point at our HCP Terraform organisation and workspace (after authenticating with terraform login):
terraform {
  cloud {
    organization = "filol-tf-org"

    workspaces {
      name = "aws-organisation"
    }
  }
}
Once that’s done, let’s simply create some AWS accounts:
resource "aws_organizations_account" "fradex_tgtg_dev" {
name = "fradex-tgtg-dev"
email = "francois@d2si.io"
close_on_deletion = true
parent_id = aws_organizations_organizational_unit.projects.id
}
resource "aws_organizations_account" "fradex_babar_dev" {
name = "fradex-babar-dev"
email = "francois@revolve.team"
close_on_deletion = true
parent_id = aws_organizations_organizational_unit.projects.id
}
resource "aws_organizations_account" "observability" {
name = "observability"
email = "francois@devoteam.com"
close_on_deletion = true
parent_id = data.aws_organizations_organization.this.roots[0].id
}
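These resources reference an organisational unit and an organisation data source that aren’t shown above; a minimal sketch of what they could look like (the OU name is an assumption):

data "aws_organizations_organization" "this" {}

# Assumed OU holding the project accounts; only the name is a guess,
# the reference paths match the account resources above.
resource "aws_organizations_organizational_unit" "projects" {
  name      = "projects"
  parent_id = data.aws_organizations_organization.this.roots[0].id
}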
Automating IAM Role Creation with CloudFormation
As we mentioned earlier, we need to automate the creation of a role in each account so that Terraform can create resources on AWS. For this, we chose a CloudFormation StackSet. Since IAM is global, deploying the stack instances in a single region is enough.
resource "aws_cloudformation_stack_set_instance" "main" {
deployment_targets {
organizational_unit_ids = [
data.aws_organizations_organization.this.roots[0].id
]
}
region = "eu-west-1"
stack_set_name = aws_cloudformation_stack_set.main.name
}
resource "aws_cloudformation_stack_set" "main" {
permission_model = "SERVICE_MANAGED"
name = "main"
capabilities = ["CAPABILITY_NAMED_IAM", "CAPABILITY_IAM"]
auto_deployment {
enabled = true
}
template_body = file(
"cf-iam/template.yaml"
)
}
And here’s the template:
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  IAMOIDCProvider00oidcproviderappterraformio00JEAcz:
    Type: 'AWS::IAM::OIDCProvider'
    Properties:
      ClientIdList:
        - 'aws.workload.identity'
      ThumbprintList:
        - '9e99a48a9960b14926bb7f3b02e22da2b0ab7280'
      Url: 'https://app.terraform.io'
  MyIAMRole:
    Type: 'AWS::IAM::Role'
    Properties:
      RoleName: hcp-terraform
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Federated: !Sub 'arn:aws:iam::${AWS::AccountId}:oidc-provider/app.terraform.io'
            Action: 'sts:AssumeRoleWithWebIdentity'
            Condition:
              StringEquals:
                'app.terraform.io:aud': 'aws.workload.identity'
              StringLike:
                'app.terraform.io:sub': 'organization:filol-tf-org:project:*:workspace:*:run_phase:*'
      Policies:
        - PolicyName: AdministratorAccessPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action: '*'
                Resource: '*'
Outputs:
  IAMRoleArn:
    Description: 'The ARN of the created IAM Role'
    Value: !GetAtt MyIAMRole.Arn
For this project, and to show you the diversity of HCP Terraform’s capabilities, I’m not using the same deployment mode for the child accounts. We’re going to set up a GitHub trigger to HCP Terraform. This means that as soon as we have a change on GitHub, the HCP Terraform stack will be automatically triggered to make the changes. This is similar to what we could have done with a CI/CD pipeline, but there’s almost nothing to do here.
resource "tfe_oauth_client" "filol-tf-org" {
name = "my-github-oauth-client"
organization = data.tfe_organization.filol-tf-org.name
api_url = "
http_url = "
oauth_token = var.github_oauth_token
service_provider = "github"
}
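The block above leans on an organisation data source and a sensitive variable holding the GitHub OAuth token; minimal declarations could look like this (assumed, since the original stack doesn’t show them):

data "tfe_organization" "filol-tf-org" {
  name = "filol-tf-org"
}

# A GitHub OAuth/personal access token, supplied at run time.
variable "github_oauth_token" {
  type      = string
  sensitive = true
}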
Automating Terraform Workspace Management with Terraform
Now that all that’s done, let’s create all our Terraform workspaces in a few lines:
resource "tfe_workspace" "accounts" {
for_each = local.foreach_childs_accounts
organization = "filol-tf-org"
project_id = "prj-fakeidwarning"
name = "${each.value.id}-${each.value.name}"
auto_apply = true
auto_apply_run_trigger = true
vcs_repo {
identifier = "filol/aws-org-childs-accounts"
ingress_submodules = true
oauth_token_id = tfe_oauth_client.filol-tf-org.oauth_token_id
}
}
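The local map driving for_each isn’t shown in the original stack; here’s a plausible shape, built from the account resources we created earlier (keys and structure are assumptions based on how each.value.id and each.value.name are used):

# Hypothetical map feeding the for_each loops above and below.
locals {
  foreach_childs_accounts = {
    fradex_tgtg_dev = {
      id   = aws_organizations_account.fradex_tgtg_dev.id
      name = aws_organizations_account.fradex_tgtg_dev.name
    }
    fradex_babar_dev = {
      id   = aws_organizations_account.fradex_babar_dev.id
      name = aws_organizations_account.fradex_babar_dev.name
    }
  }
}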
When I modify my organisation stack, I also want to retrigger all the child account stacks, because I will later create dependencies between the organisation stack and the child stacks.
resource "tfe_run_trigger" "accounts" {
for_each = local.foreach_childs_accounts
workspace_id = tfe_workspace.accounts[each.key].id
sourceable_id = data.tfe_workspace.this.id
}
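The sourceable_id points back at the organisation workspace itself; a data source like the following provides it (assuming the workspace name from our backend block):

# Assumed lookup of the organisation workspace used as the trigger source.
data "tfe_workspace" "this" {
  name         = "aws-organisation"
  organization = "filol-tf-org"
}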
Of course, we must also remember to create our variables in each workspace, pointing at the roles that our StackSet will create. Since these drive dynamic credentials, they must be environment variables (category "env"):
resource "tfe_variable" "TFC_AWS_PROVIDER_AUTH" {
for_each = local.foreach_childs_accounts
key = "TFC_AWS_PROVIDER_AUTH"
value = "true"
category = "terraform"
workspace_id = tfe_workspace.accounts[each.key].id
}
resource "tfe_variable" "TFC_AWS_RUN_ROLE_ARN" {
for_each = local.foreach_childs_accounts
key = "TFC_AWS_RUN_ROLE_ARN"
value = "arn:aws:iam::${each.value.id}:role/hcp-terraform"
category = "terraform"
workspace_id = tfe_workspace.accounts[each.key].id
}
I’ll let you run Terraform to create our accounts and different projects.
Once that’s done, we can do a quick test and check that everything works. I make a change to my org stack:
And as soon as it’s finished, I can see that all the other stacks are being updated:
Once that’s done, let’s move on to configuring our AWS account dedicated to observability. To do this, we need to configure two additional AWS providers, which assume the OrganizationAccountAccessRole that AWS Organizations creates automatically in every member account:
provider "aws" {
region = "us-east-1"
assume_role {
role_arn = "arn:aws:iam::${aws_organizations_account.observability.id}:role/OrganizationAccountAccessRole"
}
alias = "observability_us-east-1"
}
provider "aws" {
region = "eu-west-1"
assume_role {
role_arn = "arn:aws:iam::${aws_organizations_account.observability.id}:role/OrganizationAccountAccessRole"
}
alias = "observability_eu-west-1"
}
Implementing Our CloudWatch Data Receiver: Sink
Let’s configure our CloudWatch service on this account to allow it to receive data from our entire AWS organisation:
resource "aws_oam_sink" "central_logging_sink" {
provider = aws.observability_us-east-1
name = "central-logging-sink-org"
}
resource "aws_oam_sink" "central_logging_sink_eu-west-1" {
provider = aws.observability_eu-west-1
name = "central-logging-sink-org"
}
resource "aws_oam_sink_policy" "central_logging_sink_policy" {
provider = aws.observability_us-east-1
sink_identifier = aws_oam_sink.central_logging_sink.id
policy = local.sink_policy
}
resource "aws_oam_sink_policy" "central_logging_sink_policy_eu-west-1" {
provider = aws.observability_eu-west-1
sink_identifier = aws_oam_sink.central_logging_sink_eu-west-1.id
policy = local.sink_policy
}
You’ll find the policy used below. It allows any account in our organisation (matched through aws:PrincipalOrgID) to create or update a link towards the sink, restricted to the four resource types we want to share:
locals {
  sink_policy = <<-EOT
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": "*",
        "Action": ["oam:CreateLink", "oam:UpdateLink"],
        "Resource": "*",
        "Condition": {
          "ForAllValues:StringEquals": {
            "oam:ResourceTypes": ["AWS::Logs::LogGroup", "AWS::CloudWatch::Metric", "AWS::XRay::Trace", "AWS::ApplicationInsights::Application"]
          },
          "ForAnyValue:StringEquals": {
            "aws:PrincipalOrgID": "${data.aws_organizations_organization.this.id}"
          }
        }
      }
    ]
  }
  EOT
}
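To make the sink ARNs available to the child workspaces, the organisation stack can publish them as Terraform variables, reusing the tfe_variable pattern from earlier. This is a sketch only; the variable name matches the declaration shown in the next section:

# Hypothetical bridge: expose the us-east-1 sink ARN to every child
# workspace so their aws_oam_link resources can consume it (repeat for
# the eu-west-1 sink).
resource "tfe_variable" "central_logging_sink" {
  for_each = local.foreach_childs_accounts

  key          = "central_logging_sink"
  value        = aws_oam_sink.central_logging_sink.arn
  category     = "terraform"
  workspace_id = tfe_workspace.accounts[each.key].id
}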
Implementing Our CloudWatch Data Sender: Link
Once this is done, we can go to our Terraform stack that manages all the child accounts and create the resource that will send the data to our observability account:
resource "aws_oam_link" "oam_source_link" {
sink_identifier = var.central_logging_sink
label_template = var.account_name
resource_types = ["AWS::Logs::LogGroup", "AWS::CloudWatch::Metric", "AWS::XRay::Trace", "AWS::ApplicationInsights::Application"]
}
resource "aws_oam_link" "oam_source_link_eu-west-1" {
provider = aws.aws_eu-west-1
sink_identifier = var.central_logging_sink_eu_west_1
label_template = var.account_name
resource_types = ["AWS::Logs::LogGroup", "AWS::CloudWatch::Metric", "AWS::XRay::Trace", "AWS::ApplicationInsights::Application"]
}
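These links depend on a few input variables in the child stack; minimal declarations could look like this (assumed, with the values injected from the organisation stack as sketched earlier):

# Assumed variable declarations for the child-accounts stack.
variable "central_logging_sink" {
  type        = string
  description = "ARN of the us-east-1 OAM sink in the observability account"
}

variable "central_logging_sink_eu_west_1" {
  type        = string
  description = "ARN of the eu-west-1 OAM sink in the observability account"
}

variable "account_name" {
  type        = string
  description = "Label shown next to this account's data in the monitoring account"
}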
And here’s the result:
We notice a banner in the top right indicating that this is a monitoring account, and we can see data coming from different accounts, even when they use the same log group name.
Cost of this solution
Don’t worry about the cost of this solution: cross-account observability itself is free. You can find this information in the AWS documentation:
“Cross-account observability comes with no extra cost for logs and metrics. CloudWatch delivers the first trace copy stored in the first monitoring account with no extra cost. Any trace copies sent to additional monitoring accounts are billed to the source accounts for the traces recorded based on AWS X-Ray pricing. Standard CloudWatch rates apply for the features used in monitoring accounts, such as CloudWatch Dashboards, Alarms, or Logs Insights queries.”
Source: AWS documentation, October 2024
Conclusion
To conclude, we have seen a free way to centralise our observability data across all of our accounts without any external tool, thanks to CloudWatch OAM. With that, we can quickly stand up enterprise-grade monitoring and build automations on top of it.
We just need to create a sink (the receiver) in the account that becomes the global observability account. As with any shared AWS resource, we attach a policy to allow external accounts to send data; a single condition key lets us allow the whole organisation. Finally, we create a link in each account we want to send data from. As the service is regional, we need to do that for each region.