[Remote] Principal Software Engineer, AI Networking
Note: This is a remote position open to candidates in the USA. NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. As a Principal Software Engineer, you will lead the transformation of AI networking systems and manage complex customer engagements to influence NVIDIA's networking technologies.
Responsibilities
- Lead the technical strategy for AI Factory networking deployments at strategic customers, including conducting architecture reviews, risk assessments, and crafting multi-phase execution plans
- Serve as the principal-level technical authority for embedded networking products like BlueField and ConnectX, and for the surrounding technology ecosystem, including DOCA, RDMA, RoCE, and InfiniBand
- Lead deep technical engagements with hyperscalers and AI Factory customers, involving design-in, coding, bring-up, performance tuning, failure analysis, and production hardening
- Partner with internal engineering, product, and architecture teams to transform customer needs into product features, reference architectures, tooling, and guidelines
- Drive performance, reliability, and debuggability improvements across customer stacks and translate findings into actionable product, firmware, and software roadmap items
Skills
- BS/MS/PhD in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience
- 15+ years of relevant industry experience, including technical leadership across complex systems
- Deep knowledge of networking protocols and distributed systems, with a strong understanding of RoCE/InfiniBand, L1–L4 fundamentals, and performance/latency tradeoffs
- Proven low-level software expertise with proficiency in C/C++ and comfort debugging across firmware, driver, and user space
- Demonstrated experience in high-performance networking and system-level debugging, including packet drops, retransmissions, congestion, QoS, ordering, and buffer management
- Excellent interpersonal skills, with the ability to clearly explain complex topics to engineers, PMs, and customer collaborators, and align cross-organizational teams toward a decision
- Prior experience in customer-facing technical leadership at hyperscalers/CSPs/AI factories (or similarly complex production environments)
- Hands-on expertise with DPDK, DOCA, RDMA verbs, NCCL, CUDA-aware networking, congestion control, and performance tuning at scale
- Experience building internal tools, telemetry, and automation that improve triage speed and operational excellence
- Demonstrated innovation: patents, publications, hackathons, rapid prototyping, or shipping new architecture/features end-to-end
- Experience leading multi-team initiatives across geographies and time zones, with clear examples of influence without authority
- Eagerness to proactively apply AI-powered tools to accelerate debugging, documentation, and day-to-day engineering efficiency while maintaining strong engineering judgment
Benefits
- Equity
- Benefits
Company Overview
Company H1B Sponsorship