POSTING ACTIVE · REQ-AFBDC · FY26.Q2

Staff ML Performance Engineer (Inference Optimisation)

Wayve
[ COMPENSATION RANGE · ANNUAL · BASE ]
Not Disclosed
§ 01 THE ROLE

As a Staff ML Performance Engineer, you’ll play a key role in high-impact projects, optimising ML inference for edge accelerators and GPUs. The focus of this team is to run large transformer-based models efficiently on low-cost, low-power edge devices to enable Wayve’s first driving product.

You’ll help set the technical direction for turning these models into production systems that run reliably on in-vehicle compute. This is a hands-on role working across ML systems, compilers, runtimes, kernels, and embedded deployment, contributing to several early-stage, high-impact projects at Wayve.

Key responsibilities:

  • Profile and pinpoint bottlenecks across the full inference stack (model graph, compiler/runtime, kernel execution, memory movement) and deliver measurable improvements.

  • Implement and validate optimisations in compilers, runtimes, and/or kernels (e.g. operator fusion, scheduling, quantisation-aware performance, custom kernels).

  • Build robust benchmarking and regression testing to ensure performance improvements hold across models, devices, and software releases.

  • Optimise for multiple hardware targets (e.g. NVIDIA Orin/Thor, Qualcomm) and work with other teams to support them in a maintainable way.

  • Collaborate with model developers to influence architecture and training/deployment decisions that affect on-device performance.

  • Contribute to technical roadmaps and tooling, and help raise the standard of performance engineering across the team.

§ 02 ABOUT YOU

Essential

  • Proven experience improving performance in production systems with tight constraints (latency, memory, bandwidth, power/thermal, or cost).

  • Strong proficiency with at least one relevant stack/toolchain (e.g. TensorRT, CUDA, Qualcomm QNN, Triton, OpenCL) and confidence learning adjacent frameworks quickly.

  • Comfort operating at multiple levels of abstraction — from high-level model behaviour down to low-level kernel/runtime execution.

  • Strong software engineering fundamentals (debugging, profiling, testing, and maintainable code).

  • Clear communicator and collaborative teammate; able to align multiple stakeholders on performance trade-offs and priorities.

Desirable

  • Exposure to embedded or edge deployment of ML models, including benchmarking on real devices and handling system-level constraints.

  • Experience with NVIDIA and/or Qualcomm SoCs and performance tooling.

  • Python and C++ proficiency.

  • Experience mentoring others and/or driving technical direction in a small, fast-moving team.

This is a full-time role based in our London office. At Wayve we want the best of all worlds, so we operate a hybrid working policy that combines time together in our offices and workshops, which fuels innovation, culture, relationships and learning, with time spent working from home.

[ APPLICATION ROUTE ] ASHBY · External ATS