Webinar: Double LLM Inference Speeds with Speculative Decoding
Machine Learning Mastery sent this email to their subscribers on May 20, 2024.
Predibase
2X LLM Inference Speeds with Speculative Decoding, with Arnav Garg
Thursday, May 23, 2024, 10:00 am PT / 1:00 pm ET
Join us on Thursday, May 23rd at 10:00 am PT to learn how fine-tuning an open-source LLM with speculative decoding typically
increases inference throughput by more than 2x without sacrificing performance.
We’re excited to bring Medusa into the Predibase platform in our next release and invite you to join our upcoming webinar for an
early preview and technical Q&A with our CTO, Travis Addair, and ML Engineer, Arnav Garg.
We’ll talk through a couple of scenarios for using speculative decoding, including:
1. Increasing throughput of an open-source base LLM by fine-tuning it when you don’t have labeled data (i.e., inputs only)
2. Increasing throughput and performance of an open-source LLM for task-specific applications by fine-tuning with labeled data
(i.e., input/output pairs)
We’ll also cover key considerations, like how to structure fine-tuning jobs for faster inference and more performant model
generation, and we’ll open the conversation up for Q&A to make sure teams know how to get started with Medusa on their own
models.
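If you want a feel for the technique before the webinar: Medusa fine-tunes extra decoding heads onto the base model, but the same draft-and-verify idea can be tried today via Hugging Face transformers’ assisted generation. Below is a minimal sketch, not Predibase’s implementation; the OPT model pair is an illustrative assumption.

```python
# A minimal sketch of the draft-and-verify idea behind speculative decoding,
# using Hugging Face transformers' "assisted generation". This is NOT
# Predibase's Medusa integration (Medusa adds extra decoding heads to the
# base model rather than using a separate draft model); the model pair here
# is an illustrative assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "facebook/opt-1.3b"   # target model: its outputs are what you keep
draft_id = "facebook/opt-125m"  # small draft model: proposes candidate tokens

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
draft_model = AutoModelForCausalLM.from_pretrained(draft_id)

inputs = tokenizer("Speculative decoding speeds up inference by", return_tensors="pt")

# The draft model proposes a short run of tokens; the base model verifies
# them in a single forward pass and keeps the longest accepted prefix, so
# output quality matches base-model-only decoding while latency drops.
outputs = base_model.generate(
    **inputs,
    assistant_model=draft_model,
    max_new_tokens=64,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the base model verifies every drafted token, the speedup doesn’t change what the model generates; the draft model only has to agree often enough to pay for its own forward passes.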
Register Now
151 Calle de San Francisco, Suite 200 - PMB 5072, San Juan, PR 00901