Tags
Agentic Intelligence Framework
- Kimi K2: Open Agentic Intelligence July 28, 2025
Alternating Local Global Attention
- Gemma 3 Technical Report March 25, 2025
Asynchronous Reinforcement Learning Infrastructure
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training November 22, 2024
Auxiliary Loss Free Load Balancing
- DeepSeek-V3 Technical Report December 27, 2024
Auxiliary Loss for Load Balance
Code Training Benefits Mathematical Reasoning
Collapsed Tree Retrieval
CommonCrawl Quality Filtering
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Community Based Summarization
Community Detection For RAG
Computation Communication Overlap
- DeepSeek-V3 Technical Report December 27, 2024
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Computational Efficiency in Large Language Models
- Kimi K2: Open Agentic Intelligence July 28, 2025
Compute Optimal Scaling
Contextual Flexibility Retrieval
Cross File Code Completion
Cross File Dependency Analysis
Decoupled Rotary Position Embedding
DeepSeekMath Corpus
DeepSeekMoE
DeepSeekMoE Architecture
- DeepSeek-V3 Technical Report December 27, 2024
Dependency Aware Training
Dependency Aware Tree Traversal
Depth Over Width Scaling
Device Limited Routing
Direct Preference Optimization
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training November 22, 2024
- The Llama 3 Herd of Models July 31, 2024
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models February 5, 2024
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism January 5, 2024
Distillation of Reasoning Capability
Dynamic Expert Routing
- Mixtral of Experts January 8, 2024
Economical Training
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models January 11, 2024
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Efficient Cross Node All to All Communication
- DeepSeek-V3 Technical Report December 27, 2024
Efficient Inference
Efficient Inference with Reduced Active Parameters
- Mixtral of Experts January 8, 2024
Efficient Long Context Attention Mechanism
- Gemma 3 Technical Report March 25, 2025
- The Llama 3 Herd of Models July 31, 2024
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model May 7, 2024
Efficient Mixture of Experts Architecture
- Kimi K2: Open Agentic Intelligence July 28, 2025
Efficient Model Training
Efficient Transformer Architecture
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models January 11, 2024
- Mistral 7B October 10, 2023
Entity Knowledge Graph Extraction
Expert Parallelism
Expert Selection Locality Analysis
- Mixtral of Experts January 8, 2024
FP8 Mixed Precision Training
- DeepSeek-V3 Technical Report December 27, 2024
Fill in the Middle Code Completion
Fine Grained Expert Segmentation
Gaussian Mixture Model Clustering
Ghost Attention
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Global Context Understanding
Graph RAG
Group Relative Policy Optimization
Grouped Query Attention
- Gemma 3 Technical Report March 25, 2025
- Gemma 2: Improving Open Language Models at a Practical Size July 31, 2024
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism January 5, 2024
- Mistral 7B October 10, 2023
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Hierarchical Summarization
Instruction Fine Tuning with Direct Preference Optimization
- Gemma 2: Improving Open Language Models at a Practical Size July 31, 2024
- Mixtral of Experts January 8, 2024
Interleaving Local Global Attention
Iterative Reinforcement Learning
Knowledge Distillation for Small Language Models
Language Model Scaling Laws
Large Scale Agentic Data Synthesis
- Kimi K2: Open Agentic Intelligence July 28, 2025
Large Scale Reinforcement Learning on Base Model
LeetCode Competition Benchmark
Leiden Algorithm For Text
Length Normalized Preference Optimization
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training November 22, 2024
Logit Soft Capping
Long Context Adaptation
- Gemma 3 Technical Report March 25, 2025
- DeepSeek-Coder: When the Large Language Model Meets Programming - The Rise of Code Intelligence January 25, 2024
Long Context Retrieval Optimization
- Mixtral of Experts January 8, 2024
Low Rank Key Value Joint Compression
Map Reduce Summarization
Memory Efficient Attention
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Mixture of Experts Architecture
Mixture of Experts Sparsity Scaling Law
- Kimi K2: Open Agentic Intelligence July 28, 2025
Model Merging through Weight Averaging
Modularity Based Retrieval
Multi Dimensional Scaling Laws
- Mistral 7B October 10, 2023
Multi Head Latent Attention
- DeepSeek-V3 Technical Report December 27, 2024
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model May 7, 2024
Multi Level Abstraction Retrieval
Multi Level Community Indexing
Multi Level Load Balancing
Multi Stage Post Training Recipe
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training November 22, 2024
- Gemma 2: Improving Open Language Models at a Practical Size July 31, 2024
- The Llama 3 Herd of Models July 31, 2024
Multi Stage Reinforcement Learning with Self Critique
- Kimi K2: Open Agentic Intelligence July 28, 2025
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning January 22, 2025
Multi Step Learning Rate Scheduler
Multi Token Prediction
- DeepSeek-V3 Technical Report December 27, 2024
Multilingual Multimodal Understanding
- The Llama 3 Herd of Models July 31, 2024
Multilingual Performance Scaling
- Mixtral of Experts January 8, 2024
Multimodal Knowledge Distillation
- Gemma 3 Technical Report March 25, 2025
- The Llama 3 Herd of Models July 31, 2024
MuonClip Optimizer
- Kimi K2: Open Agentic Intelligence July 28, 2025
Node Limited Routing
- DeepSeek-V3 Technical Report December 27, 2024
Non Embedding FLOPs per Token
Open Foundation and Fine Tuned Chat Models
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Open Language Model Evaluation System
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training November 22, 2024
Open and Efficient Foundation Language Models
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Optimal Model Data Scaling Allocation
Pan & Scan Image Processing
- Gemma 3 Technical Report March 25, 2025
Parameter Efficient Language Model Scaling
Performance Training Inference Tradeoff
- Mistral 7B October 10, 2023
Persona Driven Data Synthesis
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training November 22, 2024
Pre fill and Chunking
- Mistral 7B October 10, 2023
Project Level Code Understanding
Prompt Decontamination
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training November 22, 2024
Proximal Policy Optimization
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Public Dataset Only Training
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
QK Clip Attention Stabilization
- Kimi K2: Open Agentic Intelligence July 28, 2025
Quantization Aware Training
- Gemma 3 Technical Report March 25, 2025
Query Focused Summarization
RMSNorm Pre Normalization
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
RMSNorm Stabilization
Reasoning Oriented Reinforcement Learning
Recursive Abstractive Processing
Recursive Summarization
Red Team Safety Testing
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Reinforcement Learning with Cold Start
Reinforcement Learning with Human Feedback
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Reinforcement Learning with Verifiable Rewards
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training November 22, 2024
Rejection Sampling Fine Tuning
- The Llama 3 Herd of Models July 31, 2024
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models February 5, 2024
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Rejection Sampling and Supervised Fine Tuning
Repository Level Data Construction
Repository Level Deduplication
Responsible Open Model Development
RoPE Positional Embedding Extension
- Gemma 3 Technical Report March 25, 2025
Rolling Buffer Cache
- Mistral 7B October 10, 2023
Rotary Positional Embeddings
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Routing Network Token Selection
- Mixtral of Experts January 8, 2024
Safety Alignment
- The Llama 3 Herd of Models July 31, 2024
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Safety Context Distillation
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Scaling Laws for Large Language Models
- The Llama 3 Herd of Models July 31, 2024
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Scaling Open Source Language Models with Longtermism
Self Reflection Content Moderation
- Mistral 7B October 10, 2023
Semantic Similarity Clustering
Shared Expert Isolation
Skill Specific Synthetic Data Generation
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training November 22, 2024
Skill Targeted Model Training
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training November 22, 2024
Sliding Window Attention
- Mistral 7B October 10, 2023
Sliding Window Attention Optimization
- Gemma 3 Technical Report March 25, 2025
- Gemma 2: Improving Open Language Models at a Practical Size July 31, 2024
Sparse Mixture of Experts
Speculative Decoding
- DeepSeek-V3 Technical Report December 27, 2024
Supervised Fine Tuning
SwiGLU Activation Function
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Synthetic Data Generation for Mathematics
- The Llama 3 Herd of Models July 31, 2024
Synthetic Data Rephrasing for Token Efficiency
- Kimi K2: Open Agentic Intelligence July 28, 2025
System Message for Multi Turn Consistency
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
System Prompt Guardrails
- Mistral 7B October 10, 2023
Tile Wise Fine Grained Quantization
- DeepSeek-V3 Technical Report December 27, 2024
Token Dropping Strategy
Token Efficient Knowledge Compression
- Mistral 7B October 10, 2023
Tool Use Emergence
- The Llama 3 Herd of Models July 31, 2024
Topological Sorting For Code Learning
Tree Organized Retrieval
Two Expert Token Processing
- Mixtral of Experts January 8, 2024
Ultimate Expert Specialization
Unified Paradigm for Reinforcement Learning
Verifiable Rewards Reinforcement Learning
- Kimi K2: Open Agentic Intelligence July 28, 2025
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning January 22, 2025
Vision Encoder Token Condensation
- Gemma 3 Technical Report March 25, 2025