[Remote] Senior Site Reliability Engineer

Remote, full-time, open position

Note: This is a remote position open to candidates in the USA. CertifyOS is building the data infrastructure that powers modern healthcare. The company is seeking a Senior Site Reliability Engineer who will design for reliability, manage the operational lifecycle, and influence platform architecture and deployment workflows across its systems.

Responsibilities

Design for reliability, ship the automation, and stand behind it in production
Own the operational lifecycle end-to-end and influence platform architecture, reliability standards, and deployment workflows
Own the full lifecycle of what you support, from infrastructure design and deployment automation through observability, incident response, and postmortems
Improve autoscaling behavior, resource utilization, and workload efficiency across cloud-native distributed systems
Own incident response processes, root cause analysis, escalation workflows, and runbooks
Build and maintain Infrastructure as Code, CI/CD pipelines, and operational tooling that reduce manual work and improve engineering productivity
Instrument data freshness and infrastructure health, not just service uptime

Skills

5+ years in SRE, DevOps, Platform Engineering, or Infrastructure Engineering — operating production systems at scale where your infrastructure is someone else's dependency and failures have real downstream consequences
Track record of improving reliability end-to-end: you've debugged hard production problems, made them not happen again, and built the alerting to prove it
Strong Linux systems administration, incident response, and root cause analysis skills
Comfort influencing operational standards and mentoring teams on reliability practices
Deep hands-on experience with GCP — GKE, Cloud Run, and containerized workloads at scale
Experience building and maintaining Infrastructure as Code with Terraform and/or Pulumi
Fluency across deployment patterns and the judgment to know when each fits: rolling deployments, blue/green, canary — and the rollback story for each
Experience with autoscaling, resource optimization, and infrastructure efficiency for distributed systems
Experience managing infrastructure security, secrets, and access controls in regulated or security-conscious environments
Strong understanding of Golden Signals monitoring — latency, traffic, errors, saturation — and how to make them actionable rather than noisy
Experience designing SLIs, SLOs, error budgets, alerting strategies, dashboards, and escalation workflows
Hands-on experience with observability platforms: Google Cloud Monitoring, Datadog, Grafana, Prometheus, or similar
Strong sense of data platform health: lineage, freshness, and correctness matter as much to you as throughput
Experience building and maintaining CI/CD pipelines using GitHub Actions or similar
Scripting or programming fluency in Python, Bash, Go, or similar — you reduce toil through code, not process
Experience working with Git workflows and modern software delivery practices
Strong written and verbal communication — you can explain an operational risk to an engineer and a product manager in the same conversation
Experience operating systems handling sensitive data or PII in regulated or compliance-adjacent environments
Experience operating large-scale distributed systems or microservices architectures
Familiarity with healthcare, credentialing, or health-tech environments
Experience leveraging AI-assisted observability or incident response tooling
Familiarity with Node.js, TypeScript, Java, or React application stacks

Benefits

We provide 100% coverage of health, dental, and vision insurance premiums for employees.
Our US-based team benefits from unlimited PTO, with at least two weeks off each year to recharge.
In India, employees are supported with health insurance, statutory leave benefits, and additional wellness (menstrual) leave for women.

Company Overview

CertifyOS is building the future of provider data infrastructure. About five years ago, the company set out with a bold aspiration: One API. One provider ID. Founded in 2021 and headquartered in New York, New York, USA, it has a workforce of 201-500 employees. Its website is https://www.certifyos.com.

Company H1B Sponsorship

CertifyOS has a track record of offering H1B sponsorships, with 2 in 2022 and 1 in 2021. Please note that this does not guarantee sponsorship for this specific role.
