Tags
Accuracy Recovery Adapters
- Apple Intelligence Foundation Language Models July 29, 2024
Alternating Local Global Attention
Auxiliary Loss Free Load Balancing
- DeepSeek-V3 Technical Report December 27, 2024
Auxiliary Loss for Load Balance
Block Diagonal Attention Masking
- Pixtral 12B October 9, 2024
Code Training Benefits Mathematical Reasoning
Collapsed Tree Retrieval
CommonCrawl Quality Filtering
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Community Based Summarization
Community Detection For RAG
Computation Communication Overlap
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Compute Optimal Scaling
Context Length Scaling
Contextual Flexibility Retrieval
Cross File Code Completion
Cross File Dependency Analysis
Cross Modal Reasoning Capabilities
- Gemini: A Family of Highly Capable Multimodal Models December 19, 2023
Decoupled Rotary Position Embedding
DeepSeekMath Corpus
DeepSeekMoE
Dependency Aware Training
Dependency Aware Tree Traversal
Depth Over Width Scaling
Device Limited Routing
Direct Preference Optimization
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training November 22, 2024
- The Llama 3 Herd of Models July 31, 2024
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models February 5, 2024
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism January 5, 2024
Distillation of Reasoning Capability
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning January 22, 2025
- DeepSeek-V3 Technical Report December 27, 2024
Dynamic Expert Routing
- Mixtral of Experts January 8, 2024
Economical Training
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models January 11, 2024
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Efficient Cross Node All to All Communication
- DeepSeek-V3 Technical Report December 27, 2024
Efficient Encoder Pretraining
Efficient Inference
Efficient Inference with Reduced Active Parameters
- Mixtral of Experts January 8, 2024
Efficient Long Context Attention Mechanism
Efficient Long Context Encoder
Efficient Model Training
Efficient Transformer Architecture
- The Llama 3 Herd of Models July 31, 2024
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models January 11, 2024
- Mistral 7B October 10, 2023
Enhanced Safety Alignment
Entity Knowledge Graph Extraction
Evaluation Framework
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training November 22, 2024
Expert Parallelism
Expert Selection Locality Analysis
- Mixtral of Experts January 8, 2024
Explicit Prompt Engineering
- Pixtral 12B October 9, 2024
FP8 Training
- DeepSeek-V3 Technical Report December 27, 2024
Fill in the Middle Code Completion
Fine Grained Expert Segmentation
Flash Attention Integration
Flexible Image Processing
- Gemini: A Family of Highly Capable Multimodal Models December 19, 2023
Flexible Vision Encoder Architecture
- Gemma 3 Technical Report March 25, 2025
- Pixtral 12B October 9, 2024
Gaussian Mixture Model Clustering
GeGLU Activation Improvement
Ghost Attention
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Global Context Understanding
Graph RAG
Group Relative Policy Optimization
Grouped Query Attention
- Gemma 3 Technical Report March 25, 2025
- Gemma 2: Improving Open Language Models at a Practical Size July 31, 2024
- The Llama 3 Herd of Models July 31, 2024
- Apple Intelligence Foundation Language Models July 29, 2024
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism January 5, 2024
- Mistral 7B October 10, 2023
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Hardware Aware Model Design
Hierarchical Summarization
Instruction Fine Tuning with Direct Preference Optimization
- Mixtral of Experts January 8, 2024
Interleaved Sequence Processing
- Gemini: A Family of Highly Capable Multimodal Models December 19, 2023
Interleaving Local Global Attention
Iterative Reinforcement Learning
Iterative Teaching Committee
- Apple Intelligence Foundation Language Models July 29, 2024
Joint Multimodal Pre Training
- Gemini: A Family of Highly Capable Multimodal Models December 19, 2023
Knowledge Distillation for Small Language Models
Language Model Scaling Laws
Large Scale Reinforcement Learning on Base Model
LeetCode Competition Benchmark
Leiden Algorithm For Text
Logit Soft Capping
Long Context Adaptation
- Gemma 3 Technical Report March 25, 2025
- DeepSeek-Coder: When the Large Language Model Meets Programming - The Rise of Code Intelligence January 25, 2024
Long Context Retrieval Optimization
- Mixtral of Experts January 8, 2024
Low Rank Key Value Joint Compression
MM MT Bench Benchmark
- Pixtral 12B October 9, 2024
Map Reduce Summarization
Memory Efficient Attention
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Mirror Descent with Leave One Out Estimation
- Apple Intelligence Foundation Language Models July 29, 2024
Mixed Precision Quantization
- Apple Intelligence Foundation Language Models July 29, 2024
Mixture of Experts Architecture
Model Merging through Weight Averaging
Modularity Based Retrieval
Multi Dimensional Scaling Laws
- Mistral 7B October 10, 2023
Multi Head Latent Attention
Multi Image Instruction Following
- Pixtral 12B October 9, 2024
Multi Level Abstraction Retrieval
Multi Level Community Indexing
Multi Level Load Balancing
Multi Step Learning Rate Scheduler
Multi Token Prediction
- DeepSeek-V3 Technical Report December 27, 2024
Multi Turn Instruction Tuning
Multilingual Performance Scaling
- Mixtral of Experts January 8, 2024
Multimodal Instruction Tuning
- Pixtral 12B October 9, 2024
Multimodal Knowledge Distillation
- Gemma 3 Technical Report March 25, 2025
- Gemini: A Family of Highly Capable Multimodal Models December 19, 2023
Multimodal Safety Evaluation Framework
- Gemma 3 Technical Report March 25, 2025
- Gemini: A Family of Highly Capable Multimodal Models December 19, 2023
Natively Multimodal Transformer Architecture
- Gemini: A Family of Highly Capable Multimodal Models December 19, 2023
Non Embedding FLOPs per Token
Open Foundation Language Models
- The Llama 3 Herd of Models July 31, 2024
Open Foundation and Fine Tuned Chat Models
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Open and Efficient Foundation Language Models
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Optimal Model Data Scaling Allocation
Optimal Model/Data Scaling Up Allocation
- The Llama 3 Herd of Models July 31, 2024
Pan & Scan Image Processing
- Gemma 3 Technical Report March 25, 2025
Parameter Efficient Language Model Scaling
Performance Training Inference Tradeoff
- Mistral 7B October 10, 2023
Post Training Multimodal Alignment
- Gemini: A Family of Highly Capable Multimodal Models December 19, 2023
Pre fill and Chunking
- Mistral 7B October 10, 2023
Project Level Code Understanding
Prompt Decontamination
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training November 22, 2024
- The Llama 3 Herd of Models July 31, 2024
Proximal Policy Optimization
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Public Dataset Only Training
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Quantization Aware Training
- Gemma 3 Technical Report March 25, 2025
Query Focused Summarization
RMSNorm Pre Normalization
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
RMSNorm Stabilization
Reasoning Oriented Reinforcement Learning
Recursive Abstractive Processing
Recursive Summarization
Red Team Safety Testing
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Reinforcement Learning with Cold Start
Reinforcement Learning with Human Feedback
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Reinforcement Learning with Verifiable Rewards
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training November 22, 2024
Rejection Sampling Fine Tuning
Rejection Sampling and Supervised Fine Tuning
Repository Level Data Construction
Repository Level Deduplication
Responsible AI Evaluation
- Apple Intelligence Foundation Language Models July 29, 2024
Responsible AI Principles
- Apple Intelligence Foundation Language Models July 29, 2024
Responsible Multimodal Model Development
- Gemini: A Family of Highly Capable Multimodal Models December 19, 2023
Responsible Open Model Development
Retrieval Augmented Generation
- The Llama 3 Herd of Models July 31, 2024
RoPE 2D Positional Encoding
- Pixtral 12B October 9, 2024
RoPE Positional Embedding Extension
Rolling Buffer Cache
- Mistral 7B October 10, 2023
Rotary Positional Embeddings
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Routing Network Token Selection
- Mixtral of Experts January 8, 2024
Runtime Swappable Model Adapters
- Apple Intelligence Foundation Language Models July 29, 2024
Safety Alignment
- The Llama 3 Herd of Models July 31, 2024
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Safety Context Distillation
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
Scaling Laws for Large Language Models
- The Llama 3 Herd of Models July 31, 2024
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Scaling Open Source Language Models with Longtermism
Self Reflection Content Moderation
- Mistral 7B October 10, 2023
Semantic Similarity Clustering
Sequence Packing Optimization
Shared Expert Isolation
Sliding Window Attention
- Mistral 7B October 10, 2023
Sliding Window Attention Optimization
Soft Label Reward Modeling
- Apple Intelligence Foundation Language Models July 29, 2024
Sparse Computation Mechanism
Sparse Mixture of Experts
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models January 11, 2024
- Mixtral of Experts January 8, 2024
Standardized Multimodal Evaluation
- Pixtral 12B October 9, 2024
Supervised Fine Tuning
Supervised Finetuning
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training November 22, 2024
- The Llama 3 Herd of Models July 31, 2024
SwiGLU Activation Function
- LLaMA: Open and Efficient Foundation Language Models February 27, 2023
Synthetic Data Generation for Mathematics
- Apple Intelligence Foundation Language Models July 29, 2024
System Message for Multi Turn Consistency
- Llama 2: Open Foundation and Fine-Tuned Chat Models July 18, 2023
System Prompt Guardrails
- Mistral 7B October 10, 2023
Token Dropping Strategy
Token Efficient Knowledge Compression
- Mistral 7B October 10, 2023
Topological Sorting For Code Learning
Tree Organized Retrieval
Two Expert Token Processing
- Mixtral of Experts January 8, 2024
Ultimate Expert Specialization
Uncertainty Routed Multimodal Reasoning
- Gemini: A Family of Highly Capable Multimodal Models December 19, 2023
Unpadding Transformer Architecture
Variable Image Resolution Processing
- Pixtral 12B October 9, 2024
Vision Encoder Token Condensation
- Gemma 3 Technical Report March 25, 2025
Vision Encoder with Break Tokens
- Pixtral 12B October 9, 2024