Security Level: CRJMC
Must Have:
- Design, provision, and manage AWS infrastructure including VPCs, subnets, security groups, IAM policies, EC2, ECS, EKS, RDS, S3, Route 53, and CloudFront.
- Architect multi-account AWS environments following AWS Well-Architected Framework principles.
- Manage AWS cost optimization strategies including Reserved Instances, Savings Plans, and rightsizing.
- Develop, maintain, and refactor Terraform modules and configurations for all cloud infrastructure.
- Author and maintain Ansible playbooks, roles, and collections for server configuration, application deployment, and compliance enforcement.
- Operate and administer Red Hat OpenShift Service on AWS (ROSA) clusters, including cluster upgrades, node scaling, and add-on management.
- Design and maintain CI/CD pipelines (GitLab CI, Azure DevOps Services) for infrastructure and application delivery.
Experience and Skill Set Requirements
1. Cloud Infrastructure & AWS
· Design, provision, and manage AWS infrastructure including VPCs, subnets, security groups, IAM policies, EC2, ECS, EKS, RDS, S3, Route 53, and CloudFront.
· Architect multi-account AWS environments following AWS Well-Architected Framework principles.
· Manage AWS cost optimization strategies including Reserved Instances, Savings Plans, and rightsizing.
· Implement and maintain AWS CloudTrail, AWS Config, Amazon GuardDuty, AWS Security Hub, and AWS Organizations SCPs.
2. Infrastructure as Code — Terraform/Terraform Cloud
· Develop, maintain, and refactor Terraform modules and configurations for all cloud infrastructure.
· Manage Terraform Cloud workspaces, remote state backends, variable sets, and team access policies.
· Enforce IaC standards including module versioning, input/output conventions, and documentation.
· Implement drift detection and remediation workflows using Terraform Cloud run tasks and policy-as-code (Sentinel or OPA).
· Lead Terraform code review processes and mentor junior team members on best practices.
3. Configuration Management — Ansible
· Author and maintain Ansible playbooks, roles, and collections for server configuration, application deployment, and compliance enforcement.
· Manage Ansible inventories across dynamic cloud environments using AWS dynamic inventory plugins.
· Integrate Ansible automation with CI/CD pipelines for repeatable and auditable deployments.
· Use Ansible Vault for secrets management and ensure credentials are handled securely.
· Develop idempotent, well-tested automation that reduces manual toil and configuration drift.
4. Container Platform — OpenShift ROSA
· Operate and administer Red Hat OpenShift Service on AWS (ROSA) clusters, including cluster upgrades, node scaling, and add-on management.
· Define and enforce OpenShift RBAC, NetworkPolicies, and SecurityContextConstraints (SCCs).
· Manage Operators, Helm charts, and Kustomize overlays for workload deployment on ROSA.
· Ensure cluster hardening against CIS benchmarks and organizational security policies.
5. CI/CD Pipelines
· Design and maintain CI/CD pipelines (GitLab CI, Azure DevOps Services) for infrastructure and application delivery.
· Implement GitOps workflows using Argo CD for declarative, auditable deployments to OpenShift ROSA.
· Integrate security scanning tooling (SAST, container scanning, dependency auditing) into pipeline gates.
· Champion shift-left testing principles, ensuring infrastructure changes are validated before promotion to production.
· Maintain pipeline-as-code standards with versioned, peer-reviewed pipeline definitions.
6. Security & Compliance
· Serve as a key contributor to the team's security posture, embedding security controls throughout the infrastructure and CI/CD lifecycle.
· Implement secrets management solutions (AWS Secrets Manager) and enforce least-privilege access.
· Support vulnerability management processes by triaging findings from infrastructure and container scanning tools.
· Participate in incident response and post-mortem processes, ensuring remediation actions are tracked and resolved.
7. Observability & Reliability
· Build and maintain end-to-end observability solutions using Amazon CloudWatch.
· Define and track SLOs and SLIs for critical platform services and workloads.
· Lead on-call incident response for platform-level issues, conducting RCAs and driving permanent fixes.
· Produce and maintain runbooks and architectural decision records (ADRs).