
[Remote] Principal Engineer, Compute Platform

Remote role | Full-time | Open position

Note: This is a remote job open to candidates in the USA. Pinterest is a platform that inspires creativity and innovation, and it is seeking a Principal Engineer to lead the consolidation and modernization of its compute infrastructure. This role involves designing and building a shared compute platform to support large-scale workloads, enhancing operational efficiency, and collaborating with various teams to meet unique customer needs.

Responsibilities

  • Solving the challenges of replacing isolated pools of dedicated compute resources with a very large-scale shared compute platform, shifting from machine-based designs to container-based designs
  • Working with leads across various platforms, especially stateful and data platforms, to build the right features and migration paths that work for them
  • Owning and driving up utilization on the shared compute platform by designing and implementing workload stacking, optimization and bin packing, safe oversubscription, etc.
  • Working with multiple customers with unique requirements to make sure the platform addresses their needs and is not only a viable but also a desirable solution for running their workloads
  • Leading a group of engineers around design topics, execution, trade-offs, migration paths, observability, performance, and operability for the platform
  • Evolving the platform towards a multi-cloud abstraction layer to enable running workloads across multiple cloud providers
  • Being a role model for setting a high bar for production quality and engineering excellence in delivering a foundational technology that empowers the entire company
  • Working closely with partners around capacity planning, cost visibility, fungibility of virtual machine instance types, and efficiency
  • Putting special focus on the delivery of GPU resources through the platform, to enable and expedite AI workloads
  • Leveraging AI tools to increase the velocity and ease of migrations, and creating self-service solutions for the customers of the platform as needed
  • Helping the team apply AI to the operational aspects of running the cluster: discovering issues, investigating them, and identifying their root causes
  • Expediting feature development using AI coding tools and being a thought leader on striking the right balance between speed and safety by designing safeguards and layers of defense

Skills

  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience
  • 12+ years of relevant industry experience with large-scale, production distributed systems
  • 5+ years of experience with Kubernetes in production
  • Experience working across SWE and SRE or Production Engineering teams to deliver robust production systems
  • Ability to work with cross-functional partners across multiple organizations
  • Passion for automation, reducing toil, and building proper tooling for getting the job done
  • Experience with running distributed data systems and migrating them to Kubernetes is highly preferred

Benefits

  • The position is also eligible for equity.
  • Information regarding the culture at Pinterest and benefits available for this position can be found here.
  • In-Office Requirement Statement: This role will need to be in the office for in-person collaboration 1-2 times/quarter and therefore can be situated anywhere in the country.

Company Overview

  • Pinterest is a visual bookmarking tool for saving and discovering creative ideas. It was founded in 2010, and is headquartered in San Francisco, California, USA, with a workforce of 1001-5000 employees. Its website is https://www.pinterest.com/.