[Remote] Senior Cloud Engineer, Observability

Remote role Full-time Open position

Note: This is a remote position open to candidates in the USA. Bayer is a company committed to solving the world’s toughest challenges in health and agriculture. It is seeking a Senior Cloud Engineer specializing in Observability to strengthen its digital farming technology by improving observability practices, collaborating with teams, and driving reliability outcomes on its AWS platform.

Responsibilities

  • Be the hands-on subject-matter expert (SME) for our observability toolchain (e.g., Datadog, CloudWatch, OpenSearch), including log pipelines, tracing/telemetry standards, and platform templates
  • Run office hours, produce exemplars, and pair with teams to implement “known-good” instrumentation and alerting
  • Triage and resolve observability-related platform requests (new service onboarding, log/metric gaps, noisy alerts, dashboard standards) with clear ownership and measurable outcomes
  • Establish and operationalize SLIs/SLOs for key platform components and enable teams to define service SLOs without reinventing the wheel
  • Maintain opinionated “golden paths” for:
      • Logging (standard fields/tags, retention, routing, searchability)
      • Metrics (naming conventions, cardinality guardrails, standard RED/USE views)
      • Tracing (service maps, critical spans, propagation standards)
      • Dashboards (starter dashboards by service type, plus curated views for platform reliability)
  • Provide reusable templates for alerting patterns (latency, error-rate, saturation, dependency failures), tuned for actionable paging vs. noise
  • Reduce MTTR by improving detection, triage paths, runbooks, and “what changed” visibility
  • Drive reliability reviews focused on observability gaps: missing signals, unclear ownership, bad alerts, and uninstrumented failure modes
  • Partner with delivery teams to turn recurring incidents into durable fixes (instrumentation + alerting + automation + documentation)
  • Embed observability checks into CI/CD and platform workflows (e.g., telemetry guardrails, dashboard/monitor templates, logging standards checks)
  • Partner with Security/Compliance to ensure telemetry supports auditability and incident investigation without ad-hoc effort
  • Define and report platform observability KPIs: alert noise rate, % actionable alerts, MTTA/MTTR trends, onboarding time to “fully observable,” runbook coverage, incident recurrence
  • Run lightweight experiments to improve signal quality (threshold tuning, monitor redesign, dashboard UX), and ship improvements like a product owner
  • Create cost-aware telemetry standards (log volume controls, metric cardinality guidance, sampling strategies, retention tiers)
  • Help teams optimize spend while improving reliability outcomes (“cheaper + better” logging/metrics patterns)
  • Serve as a trusted partner to delivery units, Security, and Data, turning pain points into paved-road improvements
  • Mentor engineers and uplift organizational practices for incident response, reliability signals, and operational excellence

Skills

  • Bachelor's in computer science/engineering or equivalent experience
  • 5+ years hands-on AWS experience operating production workloads
  • Deep practical experience with observability in production, including:
      • Datadog and/or CloudWatch (dashboards, monitors/alerts, log search, correlation)
      • Designing actionable alerts (noise reduction, ownership, runbook-first alerts)
      • Defining/using SLIs/SLOs and reliability metrics to drive behavior
  • Strong proficiency with Infrastructure as Code (Terraform; CloudFormation a plus)
  • Strong programming for automation/tooling (Python, Go, or similar)
  • Solid grasp of cloud architecture, networking, and security fundamentals
  • Experience productizing observability enablement (templates, golden paths, standards, onboarding workflows)
  • CI/CD at scale (GitLab pipelines), including integrating reliability/telemetry guardrails into delivery workflows
  • Logging/telemetry platforms beyond CloudWatch/Datadog (e.g., ELK/OpenSearch) and experience managing scale concerns (volume, retention, cardinality)
  • Container platforms (ECS/EKS) and common AWS data services (RDS/Aurora, S3/lake patterns, MSK/Kinesis)
  • FinOps experience related to observability (tagging, allocation, optimizing telemetry cost)
  • Relevant AWS certifications and excellent communication skills

Benefits

  • Additional compensation may include a bonus or incentive program (if relevant).
  • Health care
  • Vision
  • Dental
  • Retirement
  • PTO
  • Sick leave

Company Overview

  • Bayer is a life science company specializing in health care and agriculture. It was founded in 1863, is headquartered in Leverkusen, Nordrhein-Westfalen, Germany, and has a workforce of more than 10,000 employees. Its website is https://www.bayer.com.