[Remote] Network Engineer - Network Resiliency and High Availability

Remote role | Full-time | Open position

Note: This is a remote position open to candidates in the USA. Dice is seeking a Senior Network Engineer specializing in Network Resiliency and High Availability to ensure its global network infrastructure remains fault-tolerant and capable of seamless disaster recovery. The role involves designing, validating, and optimizing redundant paths and high-availability clusters while ensuring zero packet loss for business-critical applications during unforeseen failures.

Responsibilities

  • Design, implement, and maintain high-availability network topologies using physical and logical redundancy patterns (e.g., Multi-Chassis EtherChannel/MC-LAG, vPC, and VSS)
  • Architect redundant Wide Area Network (WAN) transport paths utilizing dual-homed ISP connections, SD-WAN dynamic path selection, and automated failover technologies
  • Conduct controlled Network Chaos Engineering exercises (e.g., simulating fiber cuts, device power failures, and split-brain scenarios) to validate failover timers and resilience assumptions
  • Optimize enterprise routing protocols (BGP, OSPF, EIGRP) for ultra-fast convergence, tuning features like Bidirectional Forwarding Detection (BFD), Fast Reroute (FRR), and Graceful Restart
  • Implement First Hop Redundancy Protocols (HSRP, VRRP, GLBP) to guarantee default gateway redundancy for end-user and server segments
  • Manage complex traffic engineering strategies (e.g., BGP local preference, AS-path prepending) to ensure predictable asymmetric/symmetric routing during failure states
  • Lead the network engineering track for Corporate Disaster Recovery planning, including active-active and active-passive data center strategies
  • Design, configure, and maintain automated DNS-based failover (GSLB) and Anycast routing strategies to reroute user traffic away from degraded data centers or cloud regions
  • Keep comprehensive, up-to-date documentation on failover runbooks and infrastructure dependency maps
  • Deploy advanced monitoring tools to track metrics like Mean Time to Detect (MTTD) and Mean Time to Repair (MTTR)
  • Set up telemetry-based alerting (SNMP, gRPC/Streaming Telemetry) to identify gray failures (e.g., high interface error rates causing intermittent drops) before they cause total outages

Skills

  • 5+ years in a dedicated network engineering or operations role, with a proven track record of designing environments with 99.99% or 99.999% (four to five nines) uptime
  • Bachelor's degree in Computer Science, Computer Engineering, or equivalent practical experience
  • Cisco Certified Internetwork Expert (CCIE - Enterprise Infrastructure or Data Center) or strong CCNP with equivalent experience
  • Juniper Networks Certified Internet Specialist/Expert (JNCIS/JNCIE)
  • Certified Business Continuity Professional (CBCP) or equivalent familiarity with DR frameworks is a plus

Company Overview

  • Dice is the go-to career marketplace for tech professionals. It was founded in 2010 and is headquartered in Drachten, Friesland, NLD, with a workforce of 201-500 employees. Its website is https://www.or-quest.nl/.

Company H1B Sponsorship

  • Dice has a track record of offering H1B sponsorships: 2 in 2022, 4 in 2021, and 5 in 2020. Please note that this does not guarantee sponsorship for this specific role.