Arm
Hybrid (Cambridge, UK)
High-performance ML workloads on Arm CPUs require the co-development of algorithms and highly optimized CPU kernels. In CT-ML (Central Technology, Machine Learning), rapid kernel prototyping is crucial for exploring algorithms and assessing trade-offs between model accuracy and performance. Successful prototypes are essential for driving future CPU architecture development and also serve as deliverables to Central Engineering for final production.
Responsibilities:
This position is part of a dedicated team within the CT-ML group that focuses on analyzing ML workloads and rapidly prototyping highly optimized CPU kernels to drive model performance and accuracy.
Required Skills and Experience:
Strong interest and passion for implementing high-performance kernel code in a dynamic environment.
4+ years of experience implementing high-performance CPU kernels with vector and matrix extensions.
Experience measuring and understanding performance.
Experience in creating an...