Lynx Li

AI Research--A Year of Ramblings

The Beginning of Everything Today marks exactly one year since I became an amateur AI researcher. I still remember how disheartened I was with my circumstances back then. I was originally supposed to
2025-05-12
Life Moments
#Emotional #AI researcher #Independent Researcher

Why model.enable_input_require_grads()?

What happens when using LoRA? It starts with the error RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn when you have part of the parameter
2025-05-12
LLM > Troubleshooting
#AI #Deep Learning

Differentiable Permutation Layer

Introduction Early in 2024, when I was still working on computer vision (CV), Mamba had just been introduced, and I saw many attempts to apply it to images. However, models like Mamba or RNN/SSM inhe
2025-05-08
Research Blogs
#Sequence models #Mamba #RNN

Rethinking R1-like Rule-based RL

Introduction
2025-05-08
Research Blogs
#LLM #Reasoning

01-EM Models

1 Simplifying Maxwell Equations The macroscopic Maxwell equations describe all EM waves, so we start from them to build our model of a mixed dielectric medium such as a photonic crystal.
2025-04-30
Optics > Nanophotonics
#Optics #Photonics

RLHF -- GRPO

RLHF – GRPO ongoing
2025-02-16
LLM > RLHF
#Intelligent Systems #Deep Learning #AIGC

RLHF -- DPO

RLHF – DPO ongoing
2025-02-16
LLM > RLHF
#Intelligent Systems #Deep Learning #AIGC

RLHF -- From Zero to PPO (Code)

RLHF: From Zero to PPO (Code) 1 A simple reinforcement learning example (ongoing) 2 The PPO implementation as seen in OpenRLHF (ongoing)
2025-02-16
LLM > RLHF
#Intelligent Systems #Deep Learning #AIGC

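RLHF -- From Zero to PPO (Theory)

RLHF: From Zero to PPO (Theory) 1 Reinforcement Learning 101 1.1 Building the basic framework Suppose we have an agent situated in some environment. The agent always occupies a state in that environment (a position in space, a moment in time), and it takes an action (for example, moving through space) that updates the state. The way the agent acts is modeled by a policy. The role of the policy is to use probability to model the agent taking a given action in a given state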
2025-02-13
LLM > RLHF
#Intelligent Systems #Deep Learning #AIGC
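
To make the agent / state / action / policy vocabulary in the teaser above concrete, here is a minimal sketch. Everything in it is an assumption for illustration and not taken from the post: the three-position world, the `POLICY` table and its probabilities, and the helper names `sample_action`, `step`, and `rollout` are all hypothetical.

```python
import random

# Hypothetical 1D world: states are positions 0..2, actions move left or right.
# The policy maps each state to a probability distribution over actions.
POLICY = {
    0: {"left": 0.1, "right": 0.9},
    1: {"left": 0.5, "right": 0.5},
    2: {"left": 0.9, "right": 0.1},
}

def sample_action(state, rng):
    """Draw an action according to the policy's probabilities for this state."""
    actions = list(POLICY[state])
    weights = [POLICY[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

def step(state, action):
    """Apply the action and clamp the new position to the valid range 0..2."""
    delta = -1 if action == "left" else 1
    return min(max(state + delta, 0), 2)

def rollout(start, num_steps, seed=0):
    """Let the agent act for num_steps and record (state, action, next_state)."""
    rng = random.Random(seed)
    state, trajectory = start, []
    for _ in range(num_steps):
        action = sample_action(state, rng)
        next_state = step(state, action)
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory

trajectory = rollout(start=1, num_steps=5)
```

Each tuple in `trajectory` is one state update caused by one sampled action, which is the loop the teaser describes before any reward or learning enters the picture.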

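The Original sin/cos Encoding

Position Encoding – The Original sin/cos Encoding 1 sin/cos encoding for 1D sequences 1.1 Introduction As is well known, the Transformer model itself has no inductive bias about position, so position information has to be injected separately. In the original "Attention is All You Need" paper [1], the authors proposed the first position encoding that is still in use today: the sin/cos position encoding. Suppose the model's input embedding is x ∈ R^{B×T×d}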
2025-02-06
LLM > Position Encoding
#Intelligent Systems #Deep Learning #AIGC
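
The teaser above is cut off before it reaches the formula, so the following is only a sketch of the standard formulation from "Attention is All You Need", where even dimensions use sin(pos / 10000^(2i/d)) and odd dimensions use the matching cos. The function name `sincos_position_encoding` and its parameters are hypothetical, and the post's own notation may differ.

```python
import math

def sincos_position_encoding(num_positions, dim, base=10000.0):
    """Return a num_positions x dim table; even columns are sin, odd columns are cos."""
    if dim % 2 != 0:
        raise ValueError("dim must be even so sin/cos columns pair up")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim // 2):
            # Each sin/cos pair shares one frequency, decreasing as i grows.
            angle = pos / (base ** (2 * i / dim))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table

# One row per position; in a model this T x d table would be broadcast
# over the batch dimension and added to the input embedding.
pe = sincos_position_encoding(num_positions=4, dim=8)
```

Position 0 always encodes to alternating 0 and 1, because sin(0) = 0 and cos(0) = 1 at every frequency.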