Senior Director, Site Reliability Engineering (0019-0010)

Job Description: Lead a team of experienced SREs (Site Reliability Engineers) responsible for deploying, monitoring, and troubleshooting Insulet Cloud solutions, and for developing tooling for them. Staff and manage the Incident Management Team, which is responsible for incident response, analysis, containment, recovery, and post-incident root cause analysis. Combine strategic engineering and design with hands-on technical work. Configure, tune, and troubleshoot multi-tiered systems to achieve optimal application performance, stability, and availability. Work closely with systems engineers, development and operations teams, and information security teams. Responsible for building a team that provides 24x7 monitoring and observability. The position is based at the Acton, MA office; however, telecommuting from a home office location may also be allowed. Travel: 15%, including domestic travel for business and technology conferences and planning meetings, as well as international travel for business meetings and planning purposes.

Education and Experience Requirements

Requires a Bachelor’s degree (or foreign equivalent) in Applied Computer Science, Information Systems, or a directly related field, plus seven (7) years of software development, DevOps, or SRE experience. Five (5) years of experience in the following (experience may be gained concurrently):
- DevOps/SRE, systems engineering, and build/release/deployment automation, including designing, implementing, and maintaining CI/CD pipelines for automated build, test, and deployment across multiple environments.
- Experience developing software tooling to deliver programmable infrastructure and environments, and building CI/CD pipelines with tools including Terraform and CloudFormation.
- Design and implement secure, highly available, and scalable architectures using AWS services such as VPC, EC2, ECS, Lambda, ELB, and Route 53.
- Experience with the software development life cycle (SDLC) and agile/iterative methodologies, integrating reliability practices into development.
- Participate in all phases of the SDLC: requirements, design, development, testing, deployment, and maintenance.
- Experience with monitoring solutions, including Datadog and PagerDuty.
- Design, deploy, and manage cloud-based compute resources (VMs, containers, and serverless functions) for scalable and resilient applications.
- Design and maintain highly available, scalable cloud architectures across compute, storage, and database services.
- Implement automated provisioning, configuration management, and infrastructure as code (IaC) using tools such as Terraform, CloudFormation, or Ansible.
- Optimize resource utilization and cost through autoscaling, performance tuning, and capacity planning.
- Implement secure communication protocols (HTTPS, SSL/TLS) and manage certificates for production environments.
- Configure and troubleshoot VPCs, subnets, routing tables, and load balancers to ensure reliable service connectivity.

Telecommuting/Remote Work Is Permitted

Telecommuting from a home office location may also be allowed.

Please copy and paste your resume into the body of the email (do not send attachments; we cannot open them) and email it to candidates at placementservicesusa.com with reference #0019-0010 in the subject line. Thank you.