Research & Projects

Research refers to novel academic inquiry, while Projects cover the implementation or analysis of existing approaches.

Research

Ongoing: Joint Optimization of Neural Retrieval System

This research aims to develop a unified joint-optimization framework for neural information retrieval, building on state-of-the-art open-source models (e.g., Qwen2.5-7B and BGE-Reranker-v2). By co-training the dense retriever, query expander, and reranker end to end, the system propagates information across stages and aims to exceed the performance ceiling of modular components that are trained in isolation.

Neural IR, End-to-end Optimization, Dense Retrieval, Query Expansion, Reranking

-- Sub task: Reward-Guided Alignment of LLM-Based Query Expansion

Innovative Architecture: We designed and implemented an end-to-end EM optimization framework that alternates between two steps. The E-step uses an LLM to explore diverse expansion paths and a reranker to score their rewards; the M-step performs LoRA fine-tuning on the high-reward samples. Iterating the two steps aligns the generative model with the preferences of the retriever.

Reward-Guided Learning: A Reward Margin mechanism and the DDLP (Delta-Driven Laziness-Penalty) loss function address the "negative return" problem in query expansion, improving retrieval performance and generalization.

Results: Integrating the fine-tuned query expander into a state-of-the-art retrieve-and-rerank pipeline achieved a 4% absolute gain in NDCG@10 on the TREC DL 2019 benchmark.

LLM-based QE, EM Algorithm, Reward-Guided Alignment, Cross-Encoder Reranker
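The sample-selection part of the E-step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `select_high_reward`, `toy_reward`, and the `margin` value are hypothetical, `score_fn` merely stands in for a reranker-derived reward, and the DDLP loss is not shown because its definition is not given here.

```python
from typing import Callable, List, Tuple


def select_high_reward(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str], float],
    margin: float = 0.05,
) -> List[Tuple[str, float]]:
    """Keep expansions whose reward beats the unexpanded query by `margin`.

    `score_fn` is a placeholder for a reranker-derived reward (assumption).
    The returned (expansion, gain) pairs would feed the M-step fine-tuning.
    """
    baseline = score_fn(query)
    kept = []
    for cand in candidates:
        gain = score_fn(cand) - baseline
        # Expansions that do not clearly help are dropped, which is one
        # simple way to avoid training on "negative return" samples.
        if gain > margin:
            kept.append((cand, gain))
    # Largest reward gain first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


def toy_reward(text: str) -> float:
    """Toy reward for demonstration only: fraction-like count of key terms."""
    relevant = {"neural", "retrieval", "dense"}
    return sum(1 for tok in text.lower().split() if tok in relevant) / 10.0


if __name__ == "__main__":
    query = "neural search"
    expansions = [
        "neural search",
        "neural dense retrieval search",
        "search engine history",
    ]
    for text, gain in select_high_reward(query, expansions, toy_reward):
        print(f"{gain:+.2f}  {text}")
```

In this toy run only the second expansion survives the margin test; the unchanged query and the off-topic expansion are filtered out.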

Projects

Emotion Classification in Creative Text

[Report] [Code]

Investigated domain adaptation and fine-tuning strategies for the RoBERTa-base model to classify emotions in figurative texts such as poetry and song lyrics. We implemented a two-stage training pipeline that transferred emotional priors from large-scale social media data to the complex, metaphor-rich domain of creative writing. A systematic comparison of four parameter-update strategies, including LoRA and full fine-tuning, showed that full fine-tuning performs best (Poem F1≈0.60) at capturing deep semantic relationships and figurative language.

NLP, RoBERTa, LoRA
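To make the LoRA versus full fine-tuning comparison concrete, the arithmetic below counts trainable parameters for a single weight matrix. It is only a sketch: the 768x768 shape matches the hidden size of RoBERTa-base, while the rank of 8 is an illustrative assumption and not necessarily the setting used in the project.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA update W + B @ A.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d = 768    # hidden size of RoBERTa-base
    rank = 8   # illustrative rank (assumption)
    full = full_params(d, d)
    lora = lora_params(d, d, rank)
    print(f"full fine-tuning: {full} parameters")
    print(f"LoRA (rank {rank}): {lora} parameters")
    print(f"LoRA / full = {lora / full:.4f}")
```

The low-rank update trains a small fraction of the weights per matrix, which is the capacity trade-off the comparison of update strategies is probing.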

ML-IRL for Learning Dense Reward Functions in Chess

[Report] [Code]

Developed an Inverse Reinforcement Learning (IRL) framework that infers dense, intermediate reward functions from expert chess demonstrations, addressing the temporal credit assignment problem caused by sparse terminal rewards. We built a custom Gymnasium environment and a ResNet-based reward model that maps temporally stacked board states to scalar evaluative signals. Saliency map analysis suggested that CNN-based IRL is particularly effective at capturing "visual" strategic patterns and human-like positional intuition, compared with pure engine calculation.

IRL, Gymnasium, CNN
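The credit assignment issue mentioned above can be illustrated with a small return-to-go calculation. The reward values are made up for demonstration and `returns_to_go` is a hypothetical helper; the dense rewards here merely stand in for what a learned reward model might output.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Discounted return from each time step to the end of the episode."""
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the next step.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


if __name__ == "__main__":
    # Sparse terminal reward: only the final move carries a signal (a win).
    sparse = [0.0, 0.0, 0.0, 1.0]
    # Dense per-move rewards (made-up numbers standing in for model output).
    dense = [0.2, -0.1, 0.4, 1.0]
    print("sparse:", returns_to_go(sparse, gamma=0.5))
    print("dense: ", returns_to_go(dense, gamma=0.5))
```

With the sparse signal every earlier move receives only a discounted copy of the final outcome, so a weak move and a strong move look alike; per-move rewards let the returns distinguish them.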

Brain-to-Text: Decoding Intracortical Speech

[Report] [Code]

Developed a machine learning pipeline for the Brain-to-Text '25 competition to decode neural activity into natural language for patients with ALS. Implemented a two-stage hybrid encoder-decoder framework that couples a BERT-based encoder with a pre-trained GPT-2 decoder. Applied Masked Phoneme Modeling (MPM) and LoRA fine-tuning, achieving a Word Error Rate (WER) of 12.4%, and engineered a Day-Specific Adaptation Layer to handle non-stationary neural signals.

Machine Learning, BERT/GPT-2, LoRA
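For readers unfamiliar with the metric, Word Error Rate is the word-level edit distance between the reference and the decoded sentence, divided by the number of reference words. The sketch below is a generic implementation with a hypothetical function name; the competition's official scoring script may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "i want some water please"
    hyp = "i want water please"
    print(f"WER = {word_error_rate(ref, hyp):.2f}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.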

Cache(d) Compression: Increasing Effective Storage within RISC-V

[Report]

Explored transparent "ZipCache-style" compression on RISC-V architectures (Rocket Chip, BOOM, and VexRiscv) to increase effective DRAM cache capacity without altering the memory hierarchy. Using Chipyard-based simulations, we modeled 4 KB compressed pages with a 2x effective capacity gain and an estimated 15-cycle decompression latency, implemented via RoCC-style paths or software NOP stalls. The results indicated that, under heavy capacity pressure, the compressed cache roughly halved miss rates and doubled throughput by avoiding high-latency SSD-backed misses (about 500 cycles). This project highlights the architectural trade-off between decompression overhead and cache hit rates in memory-constrained environments.

RISC-V Architecture, Cache Compression, Chipyard
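The trade-off between decompression overhead and miss rate can be sketched with a simple average-access-time model. The 15-cycle decompression latency and the 500-cycle miss penalty come from the description above; the 50-cycle hit latency and the 0.4 and 0.2 miss rates are illustrative assumptions, and this simplified model is not intended to reproduce the simulated results.

```python
def avg_access_cycles(
    miss_rate: float,
    hit_cycles: float,
    miss_penalty: float,
    decompress_cycles: float = 0.0,
) -> float:
    """Average cycles per access in a simplified two-outcome model.

    A hit pays the hit latency plus any decompression latency; a miss pays
    a flat penalty (the SSD-backed access). Queuing and overlap are ignored.
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    hit_cost = hit_cycles + decompress_cycles
    return (1.0 - miss_rate) * hit_cost + miss_rate * miss_penalty


if __name__ == "__main__":
    HIT = 50.0          # assumed hit latency in cycles (hypothetical)
    MISS = 500.0        # SSD-backed miss penalty from the description
    DECOMPRESS = 15.0   # decompression latency from the description

    baseline = avg_access_cycles(0.4, HIT, MISS)
    compressed = avg_access_cycles(0.2, HIT, MISS, DECOMPRESS)
    print(f"uncompressed: {baseline:.1f} cycles per access")
    print(f"compressed:   {compressed:.1f} cycles per access")
    print(f"speedup:      {baseline / compressed:.2f}x")
```

Under these assumed numbers the extra 15 cycles on each hit are outweighed by the misses avoided; the benefit shrinks as the baseline miss rate falls, since there are fewer expensive misses left to remove.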