POSTING ACTIVE · REQ-AFBDC · FY26.Q2

Staff ML Performance Engineer (Inference Optimisation)

Wayve

[ COMPANY ]

Wayve

[ LOCATION ]

London, United Kingdom

[ POSTED ]

2026.05.14

[ REQ ID ]

REQ-AFBDC

[ COMPENSATION RANGE · ANNUAL · BASE ]

Not Disclosed

TECHNICAL STACK · 3 TAGS

[CUDA][TENSORRT][PYTHON]

§ 01 THE ROLE

As a Staff ML Performance Engineer, you’ll play a key role in high-impact projects, optimising ML inference for edge accelerators and GPUs. The focus of this team is to run large transformer-based models efficiently on low-cost, low-power edge devices to enable Wayve’s first driving product.

You’ll help set the technical direction for turning these models into production systems that run reliably on in-vehicle compute. This is a hands-on role working across ML systems, compilers, runtimes, kernels, and embedded deployment, contributing to several early-stage, high-impact projects at Wayve.

Key responsibilities:

Profile and pinpoint bottlenecks across the full inference stack (model graph, compiler/runtime, kernel execution, memory movement) and deliver measurable improvements.
Implement and validate optimisations in compilers, runtimes, and/or kernels (e.g. operator fusion, scheduling, quantisation-aware performance, custom kernels).
Build robust benchmarking and regression testing to ensure performance improvements hold across models, devices, and software releases.
Optimise for multiple targets (e.g. NVIDIA Orin/Thor, Qualcomm) and work with other teams to support them in a maintainable way.
Collaborate with model developers to influence architecture and training/deployment decisions that affect on-device performance.
Contribute to technical roadmaps and tooling, and help raise the standard of performance engineering across the team.

§ 02 ABOUT YOU

Essential

Proven experience improving performance in production systems with tight constraints (latency, memory, bandwidth, power/thermal, or cost).
Strong proficiency with at least one relevant stack/toolchain (e.g. TensorRT, CUDA, Qualcomm QNN, Triton, OpenCL) and confidence learning adjacent frameworks quickly.
Comfort operating at multiple levels of abstraction — from high-level model behaviour down to low-level kernel/runtime execution.
Strong software engineering fundamentals (debugging, profiling, testing, and maintainable code).
Clear communicator and collaborative teammate; able to align multiple stakeholders on performance trade-offs and priorities.

Desirable

Exposure to embedded or edge deployment of ML models, including benchmarking on real devices and handling system-level constraints.
Experience with NVIDIA and/or Qualcomm SoCs and performance tooling.
Python and C++ proficiency.
Experience mentoring others and/or driving technical direction in a small, fast-moving team.

This is a full-time role based in our London office. At Wayve we want the best of all worlds, so we operate a hybrid working policy: time together in our offices and workshops to fuel innovation, culture, relationships and learning, combined with time spent working from home.

[ APPLICATION ROUTE ]

ASHBY · External ATS

Founded	2017
HQ	London, UK
Total Funding	$1.3B
Last Round	Series C · $1.1B
Round Date	May 2024
Open Roles	118
Status	Hiring