- AI Kernel Performance Intern, Front-Runner Program (领跑者计划)
- [Part-time]
- 10.0-12.0K/month | Bachelor's degree or above | Openings: unlimited | Any major
Source: BOSS直聘
This position has been taken offline.
Job Details
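The Front-Runner Program (领跑者计划) is an internship program that Espressif (乐鑫) runs for students in China and overseas who graduate in 2027. It is designed as a pipeline to full-time offers, and the work location is Shanghai, China.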
The Opportunity
You will be the reason our chip is fast. You will write the hand-tuned kernels that power Large Language Models (LLMs) on our custom RISC-V hardware. You will work directly with hardware architects to exploit our proprietary Matrix (RVM) and Vector (RVV) extensions, squeezing every last FLOP out of the silicon.
Key Responsibilities
· Kernel Implementation: Write kernels for GEMM and common epilogues (bias/activation/quant); implement Softmax/RMSNorm; evolve toward attention kernels as the project matures.
· Micro-Optimization: Analyze assembly output. Did the compiler unroll the loop? Did we stall on a memory load? You fix it.
· Tiling & Layout: Calculate the optimal way to chop a large tensor into "tiles" that fit in our L1 cache/TCM.
· Benchmarking: Build the "speedometer" for the chip. Prove your kernel is faster than the baseline.
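To give a rough feel for the kernel and tiling work listed above, here is a minimal sketch in plain, portable C. It is only an illustration under stated assumptions: row-major float matrices, a hypothetical tile edge `TILE`, and illustrative function names that do not come from this posting. It does not use the RVM/RVV intrinsics, which are proprietary and not described here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge. A real value would be derived from the
 * L1 cache/TCM capacity of the target chip, which this sketch does not know. */
#define TILE 4

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Baseline: C = relu(A * B + bias).
 * A is M x K, B is K x N, C is M x N, all row-major; bias has N entries. */
void gemm_bias_relu_ref(const float *A, const float *B, const float *bias,
                        float *C, size_t M, size_t N, size_t K)
{
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            acc += bias[j];
            C[i * N + j] = acc > 0.0f ? acc : 0.0f;
        }
    }
}

/* Tiled variant: each TILE x TILE block of C is accumulated in a small
 * local buffer while walking K in TILE-sized steps, then the bias + ReLU
 * epilogue is applied once as the block is written back. */
void gemm_bias_relu_tiled(const float *A, const float *B, const float *bias,
                          float *C, size_t M, size_t N, size_t K)
{
    assert(A && B && bias && C);
    for (size_t i0 = 0; i0 < M; i0 += TILE) {
        for (size_t j0 = 0; j0 < N; j0 += TILE) {
            size_t i1 = min_sz(i0 + TILE, M);
            size_t j1 = min_sz(j0 + TILE, N);
            float acc[TILE][TILE] = {{0.0f}};

            for (size_t k0 = 0; k0 < K; k0 += TILE) {
                size_t k1 = min_sz(k0 + TILE, K);
                for (size_t i = i0; i < i1; ++i) {
                    for (size_t k = k0; k < k1; ++k) {
                        float a = A[i * K + k];
                        for (size_t j = j0; j < j1; ++j)
                            acc[i - i0][j - j0] += a * B[k * N + j];
                    }
                }
            }

            /* Fused epilogue: add bias, apply ReLU, store the tile. */
            for (size_t i = i0; i < i1; ++i) {
                for (size_t j = j0; j < j1; ++j) {
                    float v = acc[i - i0][j - j0] + bias[j];
                    C[i * N + j] = v > 0.0f ? v : 0.0f;
                }
            }
        }
    }
}

/* Largest absolute element-wise difference, for checking a kernel
 * against the baseline. */
float max_abs_diff(const float *x, const float *y, size_t n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = x[i] - y[i];
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

In practice one would call both functions on the same inputs, confirm with `max_abs_diff` that the results agree, and only then time them. On real hardware the inner loops would presumably be replaced by vector or matrix intrinsics and the tile shape tuned to the memory hierarchy.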
What We Will Teach You
· Our proprietary RVM (Matrix) and RVV (Vector) intrinsic APIs.
· How to use our cycle-accurate profilers and hardware counters.
· The specific memory hierarchy constraints of our AI SoC.
Must-Have Qualifications
· Strong C/C++ skills, specifically with a math/logic focus.
· Understanding of Computer Architecture basics: Registers, Cache Hierarchy (L1/L2), SIMD (Single Instruction Multiple Data).
· Comfortable reading/writing technical documentation (Instruction Set Architecture specs).
· Availability: minimum 3 months, at least 4 days per week.
Nice-to-Have
· Experience with CUDA, OpenMP, or AVX/Neon intrinsics.
· Coursework in Linear Algebra or Numerical Methods.