[Remote] Principal Engineer, Compute Platform

Remote role, Full-time, Open position

Note: This is a remote role open to candidates in the USA. Pinterest is a platform that inspires creativity and innovation, and it is seeking a Principal Engineer to lead the consolidation and modernization of its compute infrastructure. The role involves designing and building a shared compute platform to support large-scale workloads, improving operational efficiency, and collaborating with various teams to meet their unique needs.

Responsibilities

Solving the challenges of replacing isolated pools of dedicated compute resources with a very large-scale shared compute platform, shifting from machine-based designs to container-based designs
Working with leads across various platforms, especially stateful and data platforms, to build the right features and migration paths that work for them
Owning and driving up utilization on the shared compute platform by designing and implementing workload stacking, optimization and bin packing, safe oversubscription, and similar techniques
Working with multiple customers with unique requirements to make sure the platform addresses their needs and is not only a viable but also a desirable solution for running their workloads
Leading a group of engineers around design topics, execution, trade-offs, migration paths, observability, performance, and operability for the platform
Evolving the platform towards a multi-cloud abstraction layer to enable running workloads across multiple cloud providers
Being a role model for setting a high bar for production quality and engineering excellence in delivering a foundational technology which empowers the entire company
Working closely with partners around capacity planning, cost visibility, fungibility of virtual machine instance types, and efficiency
Putting special focus on the delivery of GPU resources through the platform, to enable and expedite AI workloads
Leveraging AI tools to increase the velocity and ease of migrations, and creating self-service solutions for the customers of the platform as needed
Helping the team apply AI to the operational aspects of running the cluster, including discovering, investigating, and root-causing issues
Expediting feature development using AI coding tools and being a thought leader on striking the right balance between speed and safety by designing safeguards and layers of defense

Skills

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience
12+ years of relevant industry experience with large-scale production distributed systems
5+ years of experience with Kubernetes in production
Experience working across SWE and SRE or Production Engineering teams to deliver robust production systems
Ability to work with cross-functional partners across multiple organizations
Passion for automation, reducing toil, and building proper tooling for getting the job done
Experience with running distributed data systems and migrating them to Kubernetes is highly preferred

Benefits

The position is eligible for equity.
Information regarding the culture at Pinterest and benefits available for this position can be found here.
In-Office Requirement Statement: This role requires being in the office for in-person collaboration 1-2 times per quarter and can therefore be based anywhere in the country.

Company Overview

Pinterest is a visual bookmarking tool for saving and discovering creative ideas. It was founded in 2010, and is headquartered in San Francisco, California, USA, with a workforce of 1001-5000 employees. Its website is https://www.pinterest.com/.

