[Remote] Senior Software Engineer – AI Reliability
Note: This is a remote position open to candidates in the USA. SATIGO is seeking a Senior Software Engineer who is passionate about solving complex reliability, scalability, and performance challenges in AI systems. The role focuses on building and operating production systems that ensure AI works reliably at scale, and involves collaborating with engineers and researchers to enhance distributed services and platform infrastructure.
Responsibilities
- Owning the reliability and operational health of production AI systems
- Improving performance, scalability and resilience across distributed services
- Troubleshooting complex production issues across application, database and infrastructure layers
- Building monitoring, alerting and observability capabilities
- Partnering with engineering and research teams to productionise AI systems
- Driving engineering best practices around testing, deployment and incident response
- Contributing to architectural decisions that improve long-term scalability
Skills
- 7+ years of software engineering experience
- Strong Python skills
- Experience with Java, Scala or Kotlin
- Proven experience building and operating distributed systems in production
- Strong Kubernetes experience
- Deep understanding of system performance, scalability and reliability
- Experience with relational databases and performance optimisation
- Strong troubleshooting and incident response capabilities
- Experience with monitoring, logging, metrics and tracing
Benefits
- Work on cutting-edge AI technology solving real-world cybersecurity challenges
- High ownership and technical autonomy
- Complex engineering problems at scale
- Remote-first culture
- Competitive compensation and benefits
- Opportunity to influence the reliability and future direction of a rapidly growing AI platform
Company Overview