Member of Technical Staff, ML Infra, AGI at Amazon
Amazon
San Francisco, CA
Information Technology
Posted 0 days ago
Job Description
Are you interested in a unique opportunity to advance the accuracy and efficiency of Artificial General Intelligence (AGI) systems? If so, you're at the right place! We are the AGI Autonomy organization, and we are looking for a driven and talented Member of Technical Staff to join us to build state-of-the-art agents.

Our lab is a small, talent-dense team with the resources and scale of Amazon. Each team in the lab has the autonomy to move fast and the long-term commitment to pursue high-risk, high-payoff research. We're entering an exciting new era where agents can redefine what AI makes possible. We'd love for you to join our lab and build it from the ground up!

Key job responsibilities
* Design, build, and maintain the compute platform that powers all AI research at the SF AI Lab, managing large-scale GPU pools and ensuring optimal resource utilization
* Partner directly with research scientists to understand experimental requirements and develop infrastructure solutions that accelerate research velocity
* Implement and maintain robust security controls and hardening measures while enabling researcher productivity and flexibility
* Modernize and scale existing infrastructure by converting manual deployments into reproducible Infrastructure as Code using AWS CDK
* Optimize system performance across multiple GPU architectures, becoming an expert in extracting maximum computational efficiency
* Design and implement monitoring, orchestration, and automation solutions for GPU workloads at scale
* Ensure infrastructure is compliant with Amazon security standards while creatively solving for research-specific requirements
* Collaborate with AWS teams to leverage and influence cloud services that support AI workloads
* Build distributed systems infrastructure, including Kubernetes-based orchestration, to support multi-tenant research environments
* Serve as the bridge between traditional systems engineering and ML infrastructure, bringing enterprise-grade reliability to research computing

About the team
This role is part of the foundational infrastructure team at the SF AI Lab, responsible for the platform that enables all research across the organization. Our team serves as the critical link between Amazon's enterprise infrastructure and the Lab's research needs. We are experts in performance optimization, systems architecture, and creative problem-solving, finding ways to push the boundaries of what's possible while maintaining security and reliability standards.

We work closely with research scientists, understanding their experimental needs and translating them into robust, scalable infrastructure solutions. Our team has deep expertise in ML framework internals and GPU optimization, but we're also pragmatic systems engineers who build traditional infrastructure with enterprise-grade quality. We value engineers who can balance research velocity with operational excellence and who bring curiosity about ML while maintaining strong fundamentals in systems engineering.

This is a small, high-impact team where your work directly enables breakthrough AI research.
You'll have the opportunity to work with some of the most advanced AI infrastructure in the world while building the skills that define the future of ML systems engineering.

* 5 years of professional experience in systems development, DevOps, or infrastructure engineering
* Hands-on experience with AWS services and cloud infrastructure (EC2, VPC, S3, IAM, CloudFormation/CDK, etc.)
* Programming skills in Python, Go, or similar languages for infrastructure automation
* Experience building and maintaining production systems at scale
* Demonstrated ability to troubleshoot complex distributed systems issues
* Knowledge of security best practices and experience implementing security controls
* Experience with Infrastructure as Code (IaC) principles and tools
* Knowledge of AWS CDK and CloudFormation for infrastructure automation
* Networking experience (VPC design, network security, performance optimization)
* Security hardening experience in cloud environments, including compliance frameworks
* Experience with Kubernetes and container orchestration at scale
* Familiarity with GPU computing, CUDA, and ML framework internals (PyTorch, TensorFlow, Ray)

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position.
These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company's reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $150,000/year in our lowest geographic market up to $325,000/year in our highest geographic market. Pay is based on a number of factors, including market location, and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit
This position will remain posted until filled. Applicants should apply via our internal or external career site.

Key Skills: ICT, ASP.NET, Gas, Field
Employment Type: Full-Time
Experience: years
Vacancy: 1
Yearly Salary: 150000 - 325000