Atlas News

Neptune AI
03 December    18:25
We are joining OpenAI
Piotr Niedźwiedź    Piotr Niedźwiedź, CEO, CTO, and founder of neptune.ai: I’m excited to share that we’ve entered into a definitive agreement to be acquired by OpenAI, subject to closing conditions. We are thrilled to join the OpenAI team and help their AI researchers build better models faster. We started in 2017,...
12 November    16:00
Synthetic Data for LLM Training
Klea Ziu    Training foundation models at scale is constrained by data. Whether working with text, code, images, or multimodal inputs, the public datasets are saturated, and private datasets are restricted. Collecting or curating new data is slow and expensive, while the demand for larger, more diverse corpora...
06 November    10:32
What Are LLM Embeddings: All You Need to Know
Cristian Catalin Tatu    Embeddings are numerical representations of text. They are fundamental to the transformer architecture and, thus, to all Large Language Models (LLMs). In a nutshell, the embedding layer in an LLM converts the input tokens into high-dimensional vector representations. Then, positional encoding is...
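The lookup-then-encode step the excerpt describes can be sketched in a few lines of plain Python. The vocabulary, table values, and model dimension below are toy assumptions for illustration, not taken from the article; the positional encoding follows the standard sinusoidal scheme:

```python
import math

def embed(token_ids, table, d_model):
    """Look up each token's learned vector, then add sinusoidal positional encoding."""
    out = []
    for pos, tok in enumerate(token_ids):
        vec = list(table[tok])  # copy the embedding row for this token
        for i in range(d_model):
            # Standard sinusoidal encoding: sin on even dims, cos on odd dims
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            vec[i] += math.sin(angle) if i % 2 == 0 else math.cos(angle)
        out.append(vec)
    return out

# Toy 3-token vocabulary with d_model = 4 (values are arbitrary)
table = {0: [0.1, 0.2, 0.3, 0.4], 1: [0.5, 0.6, 0.7, 0.8], 2: [0.9, 1.0, 1.1, 1.2]}
vectors = embed([2, 0, 1], table, d_model=4)
```

In a real LLM the table is a trained weight matrix and the encoding may be learned rather than sinusoidal, but the shape of the computation is the same.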
28 October    19:50
Detecting and Fixing Dead Neurons in Foundation Models
Michał Oleszak    In neural networks, some neurons end up outputting near-zero activations across all inputs. These so-called dead neurons degrade model capacity because those parameters are effectively wasted, and they weaken generalization by reducing the diversity of learned features. While this phenomenon is...
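The detection criterion in the excerpt, a neuron whose activation stays near zero for every input, can be sketched directly. The function and tolerance below are illustrative assumptions, not the article's implementation:

```python
def find_dead_neurons(activations, tol=1e-6):
    """Return indices of neurons that never activate above `tol`.

    activations: list of per-sample activation lists (n_samples x n_neurons).
    """
    n_neurons = len(activations[0])
    dead = []
    for j in range(n_neurons):
        if all(abs(sample[j]) <= tol for sample in activations):
            dead.append(j)
    return dead

# Neuron 1 outputs zero for every sample, so it is flagged as dead
acts = [[0.3, 0.0, 1.2],
        [0.7, 0.0, 0.0],
        [0.1, 0.0, 0.4]]
dead = find_dead_neurons(acts)  # [1]
```

In practice you would collect activations over a representative batch and may use a fraction-of-near-zero threshold instead of a strict all-inputs test.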
23 October    16:12
Part 2: Instruction Fine-Tuning: Evaluation and Advanced Techniques for Efficient Training
Jules Belveze    In the first part of this series, we covered the fundamentals of instruction fine-tuning (IFT). We discussed how training LLMs on prompt-response pairs improves their ability to follow task instructions, and explored how adapting their architecture can make this process more efficient. We now turn...
14 October    16:21
How to Optimize LLM Inference
Alek Pikl    Large Language Model (LLM) inference at scale is challenging, as it involves transferring massive amounts of model parameters and data and performing computations on large tensors. Coupled with the low-latency needs of many applications, we are forced to push the hardware to its limits, in memory...
26 September    11:30
A Researcher’s Guide to LLM Grounding
Joel Rorseth    Large Language Models (LLMs) can be thought of as knowledge bases. During training, LLMs observe large amounts of text. Through this process, they encode a substantial amount of general knowledge that is drawn upon when generating output. This ability to reproduce knowledge is a key driver in...
18 September    11:30
Part 1: Instruction Fine-Tuning: Fundamentals, Architecture Modifications, and Loss Functions
Jules Belveze    Instruction Fine-Tuning (IFT) emerged to address a fundamental gap in Large Language Models (LLMs): aligning next-token prediction with tasks that demand clear, specific instructions. While LLMs excel at linguistic pattern recognition through self-supervised pre-training, they are not inherently...
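The core idea of training on prompt-response pairs is that the loss is computed only over the response tokens, so the model learns to produce answers rather than to reproduce the instructions. A minimal sketch, with hypothetical per-token log-probabilities (the function name and values are illustrative, not from the series):

```python
def ift_loss(token_logprobs, response_mask):
    """Average next-token negative log-likelihood over response tokens only.

    token_logprobs: model log-probability of each target token.
    response_mask: 1 for response tokens, 0 for prompt tokens (excluded
    from the loss, so the model is only trained to generate the response).
    """
    losses = [-lp for lp, m in zip(token_logprobs, response_mask) if m == 1]
    return sum(losses) / len(losses)

# Two prompt tokens (masked out) followed by two response tokens
logprobs = [-0.1, -0.2, -1.0, -0.5]  # hypothetical per-token log-probs
mask     = [0,    0,    1,    1]
loss = ift_loss(logprobs, mask)      # (1.0 + 0.5) / 2 = 0.75
```

Framework implementations express the same masking by setting prompt-token labels to an ignore index, but the effect on the objective is identical.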
07 August    11:30
Understanding Prompt Injection: Risks, Methods, and Defense Measures
Soumya Shaw    Here’s something fun to start with: open ChatGPT and type, “Use all the data you have about me and roast me. Don’t hold back.” The response you’ll get will probably be hilarious, but maybe so personal that you’ll think twice before sharing it. This task demonstrates the power of large language...
01 August    11:30
SabiYarn: Advancing Low-Resource Languages With Multitask NLP Pre-Training [Paper Reflections]
Oduguwa Damilola    In recent years, Large Language Models (LLMs) have mostly improved by scaling. This has primarily involved increasing the size of the LLMs and the data they are trained on, resulting in a highly resource-intensive process that can cost up to millions of dollars. While LLMs have become ubiquitous,...
04 July    09:32
How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models
Klea Ziu    As foundation models scale to billions or even trillions of parameters, they often exhibit training instabilities, particularly vanishing and exploding gradients. During the initial training phase (pre-training), it is common to observe loss spikes, which can degrade the model’s performance or...
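A common first line of defense against the exploding gradients the excerpt mentions is to monitor the global gradient norm and rescale when it spikes. A minimal sketch in plain Python (the function names and threshold are illustrative assumptions, not the article's tooling):

```python
import math

def global_grad_norm(grads):
    """L2 norm over all gradient values, flattened across parameter tensors."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))

def clip_gradients(grads, max_norm):
    """Rescale all gradients uniformly if their global norm exceeds max_norm."""
    norm = global_grad_norm(grads)
    if norm > max_norm:
        scale = max_norm / norm
        grads = [[g * scale for g in tensor] for tensor in grads]
    return grads

# Two toy "tensors" with a combined norm of sqrt(9 + 16) = 5.0
grads = [[3.0, 4.0], [0.0]]
clipped = clip_gradients(grads, max_norm=1.0)  # norm rescaled to 1.0
```

Logging `global_grad_norm` every step makes loss spikes much easier to diagnose, since gradient blow-ups typically precede them.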
05 June    11:30
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning [Paper Reflection]
Seung-won Hwang    Mixture-of-Experts (MoEs) architectures offer a promising solution by sparsely activating specific parts of the model, reducing the inference overhead. However, even with MoEs, the sheer number of parameters and experts makes deployment and serving costly. Pruning is an established method to reduce...
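The two-stage idea in the title, structured pruning (dropping whole experts) followed by unstructured pruning (zeroing individual weights), can be illustrated with a toy sketch. This is an illustrative simplification using routing frequency as the expert-importance signal, not the STUN algorithm itself:

```python
def prune_moe(expert_weights, expert_usage, keep_experts, sparsity):
    """Toy two-stage pruning: drop least-used experts (structured stage),
    then zero the smallest-magnitude weights in survivors (unstructured stage)."""
    # Structured: keep the experts the router selects most often
    ranked = sorted(expert_weights, key=lambda e: expert_usage[e], reverse=True)
    kept = {e: list(expert_weights[e]) for e in ranked[:keep_experts]}
    # Unstructured: zero a `sparsity` fraction of each survivor's weights
    for e, w in kept.items():
        n_zero = int(len(w) * sparsity)
        for idx in sorted(range(len(w)), key=lambda i: abs(w[i]))[:n_zero]:
            w[idx] = 0.0
    return kept

weights = {"e0": [0.9, -0.1, 0.5, 0.05], "e1": [0.2, 0.3, -0.4, 0.1], "e2": [1.1, 0.0, 0.7, -0.6]}
usage = {"e0": 500, "e1": 20, "e2": 300}  # hypothetical routing counts
pruned = prune_moe(weights, usage, keep_experts=2, sparsity=0.5)
```

The structured stage shrinks the model's memory footprint outright, while the unstructured stage trades remaining weights for sparsity that suitable kernels can exploit.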