
Backend Engineer – Inference Optimization at Vercept

Vercept Seattle, WA

Job Description

About Us

We're a high-energy, impact-driven team with a long track record of academic excellence. Our team includes researchers whose work has shaped the field, earning best paper awards at top AI conferences and even ranking among the most cited scientists in the history of science. We've built fundamental, transformative research that has redefined the community, and now we're here to change the world, one breakthrough at a time.

What We're Looking For & Why Join Us

We're looking for a Backend Engineer – Inference Optimization who thrives on solving some of the hardest systems problems in AI. You'll focus on pushing the limits of foundation model inference performance, working at the intersection of cutting-edge ML and high-performance systems engineering. This is your opportunity to set new benchmarks for latency, throughput, and efficiency at scale.

What is this role?

As a Backend Engineer, you'll own the design and optimization of inference pipelines for large-scale models. You'll work closely with researchers and infrastructure engineers to identify bottlenecks, implement advanced techniques like quantization and KV caching, and deploy high-performance serving systems in production. Your work will directly determine how fast and cost-effectively users can access next-generation AI.

What do we expect?

Must have:
- Deep experience optimizing model inference pipelines, including model quantization and KV caching
- Proficiency in backend systems and high-performance programming (Python, C++, or Rust)
- Familiarity with distributed serving, GPU acceleration, and large-scale systems
- Ability to debug complex performance issues across model, runtime, and hardware layers
- Comfort working in fast-moving environments with ambitious technical goals

Nice to have:
- Hands-on experience with vLLM or similar inference frameworks
- Background in GPU kernel optimization (CUDA, Triton, ROCm)
- Experience scaling inference across multi-node or heterogeneous clusters
- Prior work in model compilation (e.g., TensorRT, TVM, ONNX Runtime)
- Hands-on experience with model quantization

Compensation & Benefits

$150K – $250K + Equity

We offer health benefits, a 401(k) plan, and meaningful equity, because we believe top talent should be supported, secure, and fully invested in the future we're building together.

Location: We work in-office at our Seattle HQ.

