Knowledge distillation Archives

How to train LLMs with knowledge distillation

30. November 2023 | Martin Treiber | 1676 views | 5 minutes

Knowledge distillation is a technique from research into more efficient Transformers: it trains small models (students) by encouraging them to reproduce the outputs of large models (teachers). The technique initially gained popularity on classification tasks in computer vision, but it has since been applied successfully in several domains, including LLMs. If you start from...
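
The core idea of a student trained to reproduce a teacher's outputs can be sketched as a loss function. The example below is a minimal illustration, not the method of any particular paper or library: it assumes both models emit logits over the same classes (or vocabulary entries), softens both distributions with a temperature, and measures the KL divergence from teacher to student. The function names, the default temperature of 2.0, and the multiplication by the squared temperature (a common convention for keeping gradient magnitudes comparable) are assumptions made for this sketch.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Return log-probabilities of a temperature-scaled softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: both logit lists cover the same classes in the same order.
    """
    teacher_log_p = log_softmax(teacher_logits, temperature)
    student_log_p = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(t) * (t - s)  # p_teacher * (log p_teacher - log p_student)
        for t, s in zip(teacher_log_p, student_log_p)
    )
    # Scaling by temperature squared is a convention assumed here.
    return kl * temperature ** 2


# Hypothetical logits for a three-class problem.
teacher = [4.0, 1.0, 0.5]
matching_student = [4.0, 1.0, 0.5]
untrained_student = [0.0, 0.0, 0.0]

print(distillation_loss(matching_student, teacher))
print(distillation_loss(untrained_student, teacher))
```

A student whose logits match the teacher's incurs a loss of zero, while a student with a uniform output incurs a positive loss. In practice this term is often combined with the ordinary cross-entropy loss on ground-truth labels, and for LLMs it would be computed per token position over the vocabulary.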