[Remote] Senior Systems Engineer, Storage - DGX Cloud

Remote role Full-time Open position

Note: This is a remote position open to candidates in the USA. NVIDIA is a leading technology company known for its innovative GPU cloud services. The Senior Systems Engineer will design, deploy, and operate solutions on Kubernetes for large-scale storage and data platforms, ensuring reliability and performance through automation and observability.

Responsibilities

  • Design, deploy, and operate solutions on Kubernetes for large-scale storage and data platforms, including the manifests, Helm charts, and operators that run them
  • Build tools, services, and automation that improve the lifecycle of storage and data systems – from provisioning and configuration through deployment, scaling, and day-2 operations
  • Develop and operate telemetry and observability for production systems – metrics, logging, tracing, dashboards, and alerting – so that system health, availability, and latency are measurable and actionable
  • Apply strong analytical troubleshooting skills to diagnose and resolve complex issues across distributed, containerized infrastructure
  • Work closely with peers and partner teams to improve the lifecycle of services, from inception and design through deployment, operation, and refinement
  • Scale systems sustainably through automation, infrastructure-as-code, and CI/CD, and evolve systems by pushing for changes that improve reliability and velocity
  • Support services before they go live through activities such as deployment automation, capacity planning, and launch and readiness reviews
  • Practice sustainable incident response and postmortems, and participate in an on-call rotation to support production systems

Skills

  • BS degree (or equivalent experience) in Computer Science or related technical field involving coding
  • 12+ years of practical experience
  • Hands-on experience with Kubernetes – deploying, configuring, and operating workloads and solutions on Kubernetes in production
  • Experience building tools and services for storage, data, or platform infrastructure, with solid software design fundamentals (algorithms, data structures, complexity analysis) on large-scale Linux-based systems
  • Experience building and operating telemetry and observability using tools such as Prometheus, InfluxDB, Grafana, and the Elastic stack
  • Strong analytical troubleshooting skills with a systematic, root-cause-driven approach to identifying and resolving complex problems
  • Proficiency in one or more of the following: Python, Go, or Java
  • Good knowledge of infrastructure configuration management and infrastructure-as-code tools such as Ansible, Chef, Puppet, ArgoCD, Git Pipelines, and Terraform
  • Customer-first mindset with a focus on customer satisfaction and a passion for ensuring customer success
  • Experience with Git, code review, pipelines, and CI/CD
  • Experience using or running large private and public cloud systems based on Kubernetes, OpenStack, and Docker
  • Interest in crafting, analyzing, and fixing large-scale distributed systems, with strong debugging skills and a systematic problem-solving approach
  • Experience designing storage- or data-focused tooling and automating its operations at scale
  • Ability to thrive in collaborative environments, work well with a variety of teams, and adapt to different working styles

Benefits

  • You will be eligible for equity and [benefits](https://www.nvidia.com/en-us/benefits/).

Company Overview

  • NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. It was founded in 1993 and is headquartered in Santa Clara, California, USA, with a workforce of more than 10,000 employees. Its website is https://www.nvidia.com.

Company H1B Sponsorship

  • NVIDIA has a track record of offering H1B sponsorships, with 448 in 2026, 1872 in 2025, 1354 in 2024, 976 in 2023, 835 in 2022, 601 in 2021, and 529 in 2020. Please note that this does not guarantee sponsorship for this specific role.