Lessons Learned from using HuggingFace for LLM Inference in Google Colab
My original goal was to set up an LLM quickly in a Google Colab Python notebook (because I didn’t want to execute locally and I wanted free access to an NVIDIA GPU quickly). I originally looked at Ollama, but its client/server architecture didn’t seem to fit elegantly with Google Colab. Side note: see this great blog post for an understanding of Ollama’s architecture.
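For context, here is a minimal sketch of the kind of quick Colab setup described above, using the Hugging Face `transformers` pipeline API. It assumes a Colab runtime with an NVIDIA GPU and `transformers` plus `accelerate` installed (`pip install transformers accelerate`); the model name is an illustrative small instruct model, not necessarily the one used in this post.

```python
# A minimal sketch: quick LLM inference in a Colab notebook via the
# Hugging Face `transformers` pipeline. The model choice is illustrative.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # assumption: any small instruct model works here
    torch_dtype=torch.float16,           # half precision to fit the free-tier GPU
    device_map="auto",                   # place the weights on the Colab GPU
)

result = generator("Explain what Ollama is in one sentence.", max_new_tokens=50)
print(result[0]["generated_text"])
```

Unlike Ollama’s client/server model, everything here runs in-process in the notebook kernel, which is what makes this approach fit Colab so naturally.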