Posts

Language Models

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning January 22, 2025
DeepSeek-V3 Technical Report December 27, 2024
ModernBERT - Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference December 18, 2024
Tulu 3: Pushing Frontiers in Open Language Model Post-Training November 22, 2024
Gemma 2: Improving Open Language Models at a Practical Size July 31, 2024
The Llama 3 Herd of Models July 31, 2024
Apple Intelligence Foundation Language Models July 29, 2024
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model May 7, 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models February 5, 2024
DeepSeek-Coder: When the Large Language Model Meets Programming - The Rise of Code Intelligence January 25, 2024
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models January 11, 2024
Mixtral of Experts January 8, 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism January 5, 2024
Mistral 7B October 10, 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
LLaMA: Open and Efficient Foundation Language Models February 27, 2023

Multimodal Learning

Gemma 3 Technical Report March 25, 2025
Pixtral 12B October 9, 2024
Gemini: A Family of Highly Capable Multimodal Models December 19, 2023

Retrieval Augmented Generation

From Local to Global: A Graph RAG Approach to Query-Focused Summarization April 24, 2024
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval January 31, 2024