Breaking Circular Dependencies: The Hidden Cost of Terraform Security Group Refactoring
Sometimes the best solution to a problem creates a new problem you didn’t expect. This is a story about fixing one Terraform error, only to discover that the fix itself introduces a whole new class of deployment challenges.
The Setup
We had a straightforward architecture: an Application Load Balancer (ALB) forwarding traffic to an ECS service running our API. The security groups were configured to allow traffic flow between them. Nothing fancy, just standard AWS infrastructure.
Then came the Terraform validation error:
Error: Cycle: aws_security_group.alb_sg, aws_security_group.api_service_sg
A circular dependency. The ALB security group referenced the ECS security group, and vice versa. Terraform couldn’t determine which one to create first.
The Problem: Circular Dependencies
Here’s what the original code looked like:
# ALB Security Group
resource "aws_security_group" "alb_sg" {
vpc_id = var.VPC_ID
name = "${var.project}_alb_sg_${local.namespace}"
# Egress to ECS service
egress {
description = "Forward traffic to ECS service on port 3000"
from_port = 3000
to_port = 3000
protocol = "tcp"
security_groups = [aws_security_group.api_service_sg.id] # ← References ECS SG
}
}
# ECS Service Security Group
resource "aws_security_group" "api_service_sg" {
vpc_id = var.VPC_ID
name = "${var.project}_api_service_sg_${local.namespace}"
# Ingress from ALB
ingress {
description = "Allow traffic from ALB SG on port 3000"
from_port = 3000
to_port = 3000
protocol = "tcp"
security_groups = [aws_security_group.alb_sg.id] # ← References ALB SG
}
}
The cycle is clear:
- ALB security group needs the ECS security group ID for its egress rule
- ECS security group needs the ALB security group ID for its ingress rule
- Terraform: “I can’t create either one first!” 🤯
Visualizing the Circular Dependency
The Standard Solution: Separate Security Group Rules
This is a well-documented pattern in the Terraform community. Instead of defining rules inline within the security group resource, you extract them into separate aws_security_group_rule resources:
# ALB Security Group (no inline rules)
resource "aws_security_group" "alb_sg" {
vpc_id = var.VPC_ID
name = "${var.project}_alb_sg_${local.namespace}"
# No egress rules defined inline
}
# ECS Service Security Group (no inline rules)
resource "aws_security_group" "api_service_sg" {
vpc_id = var.VPC_ID
name = "${var.project}_api_service_sg_${local.namespace}"
# No ingress rules defined inline
}
# Separate rule: ALB → ECS egress
resource "aws_security_group_rule" "alb_egress_to_ecs" {
type = "egress"
description = "Forward traffic to ECS service on port 3000"
from_port = 3000
to_port = 3000
protocol = "tcp"
security_group_id = aws_security_group.alb_sg.id
source_security_group_id = aws_security_group.api_service_sg.id
}
# Separate rule: ECS ← ALB ingress
resource "aws_security_group_rule" "ecs_ingress_from_alb" {
type = "ingress"
description = "Allow traffic from ALB SG on port 3000"
from_port = 3000
to_port = 3000
protocol = "tcp"
security_group_id = aws_security_group.api_service_sg.id
source_security_group_id = aws_security_group.alb_sg.id
}
Why this works:
- Both security groups are created first (with no rules)
- Then the separate rule resources are created
- The rules can reference both security groups because they already exist
- No circular dependency!
The Fixed Architecture
Perfect! We committed the fix, merged to develop, and triggered the deployment pipeline.
Then came the error that prompted this entire investigation.
The New Problem: Duplicate Rules
Error: [WARN] A duplicate Security Group rule was found on (sg-0123456789abcdef0).
Error: operation error EC2: AuthorizeSecurityGroupIngress,
https response error StatusCode: 400, RequestID: 34a71c7a-d5ee-464c-aa7a-cd9c70bcd8f6,
api error InvalidPermission.Duplicate: the specified rule
"peer: sg-0fedcba9876543210, TCP, from port: 3000, to port: 3000, ALLOW"
already exists
with aws_security_group_rule.ecs_ingress_from_alb,
on service.tf line 79, in resource "aws_security_group_rule" "ecs_ingress_from_alb":
79: resource "aws_security_group_rule" "ecs_ingress_from_alb" {
Wait, what? The rule already exists? But we just defined it as a new resource!
What Actually Happened
Here’s the thing about inline security group rules versus separate aws_security_group_rule resources: they both create the same thing in AWS.
When you define a rule inline:
resource "aws_security_group" "example" {
ingress {
from_port = 3000
to_port = 3000
protocol = "tcp"
security_groups = [aws_security_group.other.id]
}
}
AWS creates a security group rule. Terraform manages it as part of the security group resource.
When you define a rule separately:
resource "aws_security_group_rule" "example" {
type = "ingress" # required: "ingress" or "egress"
security_group_id = aws_security_group.example.id
from_port = 3000
to_port = 3000
protocol = "tcp"
source_security_group_id = aws_security_group.other.id
}
AWS creates… the exact same security group rule. Terraform manages it as a separate resource.
The problem: When we refactored from inline to separate rules, the actual rules already existed in AWS (created by the inline configuration). Our new code tried to create them again as separate resources, and AWS said “nope, those rules already exist!”
The State Management Issue
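This is fundamentally a Terraform state migration problem, not an AWS problem. Let’s trace what happened:
- The original apply created the rules in AWS, and the state recorded them as part of the two aws_security_group resources
- The refactor removed the inline blocks and added aws_security_group_rule resources, which have no entries in state yet
- On the next apply, the existing rules were left in place (the duplicate error shows they were never revoked), and Terraform tried to create the “new” rules on top of them
The state file still tracks the rules as part of the security group resources (inline), but the new code defines them as separate resources. Terraform doesn’t realize they’re the same thing.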
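One way to confirm this mismatch before applying is to check which resource addresses the state actually tracks. The sketch below is a hypothetical helper (count_standalone_rules is not part of Terraform) that reads the output of terraform state list and counts standalone aws_security_group_rule entries. It assumes the resource types used in this post.

```shell
#!/bin/sh
# Hypothetical helper: reads `terraform state list` output on stdin and prints
# how many standalone aws_security_group_rule resources the state tracks.
# The pattern also matches addresses nested in modules (module.x.aws_...).
count_standalone_rules() {
  grep -Ec '(^|\.)aws_security_group_rule\.' || true
}

# Example (requires an initialized Terraform working directory):
#   terraform state list | count_standalone_rules
```

If this prints 0 while the rules are visible in the AWS console, the rules are still tracked inline and the separate resources have nothing to map to yet.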
Attempted Solution #1: Import Blocks
My first instinct was to use Terraform’s import blocks (available in Terraform 1.5+). The idea was to tell Terraform: “Hey, these separate rule resources you’re trying to create? They already exist. Just import them into state.”
import {
to = aws_security_group_rule.ecs_ingress_from_alb
id = "${aws_security_group.api_service_sg.id}_ingress_tcp_3000_3000_${aws_security_group.alb_sg.id}"
}
resource "aws_security_group_rule" "ecs_ingress_from_alb" {
# ... configuration ...
}
Elegant! Declarative! Should work perfectly, right?
Why Import Blocks Failed
Problem #1: Circular Dependency (Again!)
The import block ID references both security groups:
- File A’s import block references aws_security_group.alb_sg.id (from File B)
- File B’s import block references aws_security_group.api_service_sg.id (from File A)
We’re back to a circular dependency! The very problem we were trying to fix.
Attempted Fix: Use Data Sources
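# Look up both existing security groups by name instead of referencing the resources
data "aws_security_group" "existing_alb_sg" {
name = "${var.project}_alb_sg_${local.namespace}"
}
data "aws_security_group" "existing_ecs_sg" {
name = "${var.project}_api_service_sg_${local.namespace}"
}
import {
to = aws_security_group_rule.ecs_ingress_from_alb
id = "${data.aws_security_group.existing_ecs_sg.id}_ingress_tcp_3000_3000_${data.aws_security_group.existing_alb_sg.id}"
}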
This broke the circular dependency by using independent data source lookups instead of resource references.
Problem #2: Import Blocks Don’t Support Computed Values
Error: cannot use computed values in import block ID
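Terraform’s import blocks require an ID that is known at plan time. In Terraform 1.5 the id had to be a literal string; from 1.6 it accepts expressions, but only when the result is already known during planning. In practice that rules out:
- Data source attributes that are not resolved until apply
- Resource attributes that are not known until apply
- Any interpolation that depends on such values
The import ID that works everywhere is a hardcoded string (or, from 1.6, a variable whose value is known at plan time). No apply-time lookups allowed.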
The Cursor Bot’s Helpful Comment
When I opened a PR with the import block solution, Cursor’s bot immediately flagged it:
Bug: Cyclic Imports Break Terraform Plan
The import block creates a circular dependency with the import block in load_balancer.tf. This import references aws_security_group.alb_sg.id from the load balancer file, while that file’s import references aws_security_group.api_service_sg.id from this file. Terraform will fail with a cycle error when evaluating these interdependent import block IDs during the plan phase.
And after trying the data source approach:
The import block uses data source attributes in the id field, but Terraform import blocks cannot use computed values - they require literal strings or values known at plan time. This will cause a “cannot use computed values” error during terraform plan.
Props to the bot for catching these issues before they hit the actual deployment! 🤖
The Real Solution: Manual State Migration
After all the attempts to automate this with import blocks, the reality is simpler (and somewhat anticlimactic): just handle the one-time migration manually.
You have two options:
Option 1: Manual Deletion (Simplest)
This is what I did in the dev environment, and it worked perfectly:
- Open AWS Console → EC2 → Security Groups
- Find the ECS service security group
- Delete the ingress rule from ALB on port 3000
- Find the ALB security group
- Delete the egress rule to ECS on port 3000
- Run terraform apply; it creates them as separate resources
Time: ~2 minutes
Risk: Low (the rules are recreated as soon as the apply finishes)
Downtime: None observed (established connections persist through the brief gap between deletion and apply)
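If you would rather script the two deletions than click through the console, the same step can be sketched with the AWS CLI. The helper below (print_revoke_commands is a made-up name) only prints the commands so they can be reviewed before anything is run. It assumes TCP rules on a single port and that you pass the real security group IDs.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not execute) the AWS CLI calls that remove
# the ALB -> ECS rule pair.
#   $1 = ECS service security group ID
#   $2 = ALB security group ID
#   $3 = port (defaults to 3000, the port used in this post)
print_revoke_commands() {
  ecs_sg="$1"
  alb_sg="$2"
  port="${3:-3000}"
  # Ingress rule on the ECS security group, with the ALB group as source
  printf 'aws ec2 revoke-security-group-ingress --group-id %s --protocol tcp --port %s --source-group %s\n' \
    "$ecs_sg" "$port" "$alb_sg"
  # Egress rule on the ALB security group, with the ECS group as destination
  printf "aws ec2 revoke-security-group-egress --group-id %s --ip-permissions 'IpProtocol=tcp,FromPort=%s,ToPort=%s,UserIdGroupPairs=[{GroupId=%s}]'\n" \
    "$alb_sg" "$port" "$port" "$ecs_sg"
}

# Example: review the output, then run the printed commands and terraform apply.
#   print_revoke_commands sg-0123456789abcdef0 sg-0fedcba9876543210
```

Printing rather than executing is a deliberate choice here: deleting rules is the one step in this migration that touches live traffic, so a human should read the commands first.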
Option 2: Manual Import Command
If you prefer the Terraform way:
# Look up the security group IDs
terraform state show 'aws_security_group.api_service_sg'
terraform state show 'aws_security_group.alb_sg'
# Import the rules (using actual IDs)
terraform import \
'aws_security_group_rule.ecs_ingress_from_alb' \
'sg-0123456789abcdef0_ingress_tcp_3000_3000_sg-0fedcba9876543210'
terraform import \
'aws_security_group_rule.alb_egress_to_ecs' \
'sg-0fedcba9876543210_egress_tcp_3000_3000_sg-0123456789abcdef0'
# Then apply normally
terraform apply
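The import IDs are easy to mistype, since the owning group and the peer group swap places between the ingress and egress rules. A small helper can assemble them. This is only a sketch: sg_rule_import_id, ECS_SG and ALB_SG are illustrative names, and the layout simply mirrors the IDs used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: builds the import ID for an aws_security_group_rule that
# references a peer security group, in the same underscore-separated layout as
# the IDs used in the import commands in this post.
#   $1 = security group that owns the rule
#   $2 = ingress | egress
#   $3 = protocol, $4 = from port, $5 = to port
#   $6 = peer (source or destination) security group
sg_rule_import_id() {
  printf '%s_%s_%s_%s_%s_%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example with placeholder variables (set ECS_SG and ALB_SG to your real IDs):
#   terraform import 'aws_security_group_rule.ecs_ingress_from_alb' \
#     "$(sg_rule_import_id "$ECS_SG" ingress tcp 3000 3000 "$ALB_SG")"
#   terraform import 'aws_security_group_rule.alb_egress_to_ecs' \
#     "$(sg_rule_import_id "$ALB_SG" egress tcp 3000 3000 "$ECS_SG")"
```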
Why “Just Delete Them” is Actually Fine
I initially hesitated to recommend manual deletion because it felt like working around infrastructure-as-code principles. But here’s why it’s actually the right approach:
1. It’s a One-Time Migration
This isn’t an ongoing operational task. You refactor from inline to separate rules once per security group. After that, everything works normally.
2. Low Risk
The worst case scenario:
- You delete the rules in AWS
- Terraform apply fails for some reason
- The rules are missing for a few minutes until you debug and reapply
But in reality:
- The apply happens immediately after deletion
- The rules are recreated in seconds
- Established connections are not disrupted, because security groups are stateful and keep tracking existing connections; only new connections opened during the gap could be refused
3. It’s Actually Faster
- Manual deletion: 2 minutes
- Setting up import with all variables: 15+ minutes
- Debugging import errors: 30+ minutes
- Writing automation scripts: Hours
4. No Downtime Even If You Don’t Delete
Here’s something important I discovered: if you don’t delete the rules and just try to apply, nothing breaks.
The Terraform apply fails with the duplicate rule error, but:
- ✅ The existing rules stay in place
- ✅ Traffic continues flowing normally
- ✅ No service disruption
- ❌ Just a Terraform error you need to fix
So the “failure” is really just Terraform being unable to complete the apply. Your infrastructure keeps working fine.
This means you can safely:
- Try the apply in production
- See the duplicate error
- Manually delete the rules
- Re-run the apply
No emergency, no incident, no pressure.
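After the re-run, it is worth confirming that state and configuration now agree. terraform plan has a -detailed-exitcode flag that returns 0 for no changes, 1 for an error, and 2 for pending changes. The sketch below wraps that convention in a hypothetical helper (describe_plan_exit is an invented name, and the messages are my own wording).

```shell
#!/bin/sh
# Hypothetical helper: translates the exit code of
# `terraform plan -detailed-exitcode` into a short message.
describe_plan_exit() {
  case "$1" in
    0) echo "no changes: state matches configuration" ;;
    1) echo "plan failed: inspect the error" ;;
    2) echo "changes pending: review before applying" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Example (requires an initialized Terraform working directory):
#   terraform plan -detailed-exitcode >/dev/null && rc=0 || rc=$?
#   describe_plan_exit "$rc"
```

A result of 0 in each environment is a reasonable signal that the migration is finished there.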
Key Takeaways
Circular dependencies in security groups are common - the separate rule pattern is well-established for a reason
Refactoring inline rules to separate resources is a state migration, not just a code change
Import blocks have strict limitations:
- Can’t use computed values
- Can’t use data source attributes
- Can’t reference resource attributes
- Require literal string IDs
Sometimes the manual approach is correct - not everything needs to be automated, especially one-time migrations
Terraform apply failures aren’t always production incidents - in this case, the failure is safe and expected
The “duplicate rule” error has zero impact on running services - your infrastructure keeps working while you fix Terraform’s state
What About Future Refactorings?
The lesson here isn’t “never refactor security groups.” It’s understanding the migration path when you do:
Planning to refactor inline → separate rules?
Document the manual deletion step as part of the deployment plan.
Using separate rules from the start?
No migration needed! You avoid this entire problem.
Already have inline rules?
Consider whether the circular dependency is actually causing you problems. If not, maybe leave it alone.
Wrapping Up
This investigation taught me that not every infrastructure problem has—or needs—an automation solution. Sometimes the best answer is:
- Understand the root cause
- Document the manual steps
- Execute them once per environment
- Move on with your life
The security group rules now work correctly across all environments. The circular dependency is fixed. And I learned some valuable lessons about Terraform’s import block limitations.
Have you hit similar Terraform state migration issues? I’d love to hear how you handled them. Find me on LinkedIn.