Biased LLM Outputs, Tiananmen Square & Americanisations
R̶e̶a̶l̶i̶z̶i̶n̶g̶ Realising bias in LLMs is important, but it goes both ways. I see far more people wasting their time correcting the written English (spelling) in almost every output generated by American-trained LLMs than I do correcting Chinese-trained LLMs on what happened at Tiananmen Square. Just as we need systems to augment and verify LLMs’ knowledge with facts, it would be pretty nice not to have to replace Zs with Ss in every single model’s output. ...
Bringing K/V Context Quantisation to Ollama
Explaining the concept of K/V context cache quantisation, why it matters and the journey to integrate it into Ollama.

Why K/V Context Cache Quantisation Matters

The introduction of K/V context cache quantisation in Ollama is significant, offering users a range of benefits:

• Run Larger Models: With reduced VRAM demands, users can now run larger, more powerful models on their existing hardware.
• Expand Context Sizes: Larger context sizes allow LLMs to consider more information, leading to potentially more comprehensive and nuanced responses. For tasks like coding, where longer context windows are beneficial, K/V quantisation can be a game-changer.
• Reduce Hardware Utilisation: Freeing up memory or allowing users to run LLMs closer to the limits of their hardware.

Running the K/V context cache at Q8_0 quantisation effectively halves the VRAM required for the context compared to the default F16, with minimal quality impact on the generated outputs, while Q4_0 cuts it down to just one third (at the cost of some noticeable quality reduction). ...
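To get a feel for those numbers, here’s a rough back-of-the-envelope sketch (not from the post itself): the layer and head counts are illustrative of an 8B-class model, and the bytes-per-element figures approximate llama.cpp’s F16, Q8_0 and Q4_0 cache types.

```python
# Back-of-the-envelope K/V cache sizing - illustrative numbers only.
# bytes = 2 (K and V) * layers * context * kv_heads * head_dim * bytes_per_element
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 1.0625, "q4_0": 0.5625}  # approx. llama.cpp cache types

def kv_cache_gib(context: int, layers: int = 32, kv_heads: int = 8,
                 head_dim: int = 128, cache_type: str = "f16") -> float:
    """Approximate K/V cache size in GiB for a hypothetical 8B-class model."""
    elements = 2 * layers * context * kv_heads * head_dim
    return elements * BYTES_PER_ELEMENT[cache_type] / 1024**3

for cache_type in ("f16", "q8_0", "q4_0"):
    print(f"{cache_type}: {kv_cache_gib(32_768, cache_type=cache_type):.2f} GiB at 32k context")
```

With those assumptions the context cache drops from roughly 4 GiB at F16 to about 2.1 GiB at Q8_0 and 1.1 GiB at Q4_0 for a 32k context.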
Will AI Take My Job?
TL;DR: Maybe. Longer answer: Capitalism sucks, hard. It’s the bad card we’ve been dealt: for-profit companies will always be looking to reduce costs and increase profits. If this means they can reduce expenses by automating an activity - they will (eventually). To look at this another way - if the output of your job is repeatable and not creative, one could argue that (other than providing you an income) it might also not be the best use of your time. ...
Generating Diagrams with AI / LLMs
The two tools I use with AI / LLMs for generating diagrams are Mermaid and Excalidraw. Mermaid (or MermaidJS) is a popular diagramming library and format supported by many tools and is often rendered inside markdown (e.g. in a readme.md). Excalidraw is an excellent, free and open source diagramming and visualisation tool. I also often make use of a third-party Obsidian plugin for Excalidraw.

Excalidraw

It has a ‘generate diagram with AI’ feature which, if you’re using the excalidraw.com online editor, offers a few free generations each day (I think this uses a low-end OpenAI model). If you’re running Excalidraw locally or using the brilliant Obsidian plugin, you can provide any OpenAI-compatible API endpoint and model for AI generations. Behind the scenes, Excalidraw AI generates and then renders MermaidJS. ...
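This isn’t Excalidraw’s actual code, but here’s a minimal sketch of that prompt-to-Mermaid flow, assuming a local OpenAI-compatible server (e.g. Ollama) and the openai Python client:

```python
# Minimal sketch of the "prompt -> MermaidJS" flow - not Excalidraw's own implementation.
# Assumes a local OpenAI-compatible endpoint (e.g. Ollama) and the `openai` Python client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # any OpenAI-compatible server

response = client.chat.completions.create(
    model="llama3.1",  # whichever local model you have pulled
    messages=[
        {"role": "system", "content": "Reply with a MermaidJS flowchart only, no prose."},
        {"role": "user", "content": "Diagram a request flowing from a client through an API gateway to a backend and database."},
    ],
)

mermaid_source = response.choices[0].message.content
print(mermaid_source)  # paste into Excalidraw, a markdown code block, or mermaid.live to render
```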
Ingest: Streamlining Content Preparation for LLMs
Ingest is a tool I’ve written to make my life easier when preparing content for LLMs. It parses directories of plain text files, such as source code, documentation etc… into a single markdown file suitable for ingestion by AI/LLMs. Ingest can also estimate vRAM requirements for a given model, quantisation and context length.

Features

• Traverse directory structures and generate a tree view
• Include/exclude files based on glob patterns
• Estimate vRAM requirements and check model compatibility using another package I’ve created called quantest
• Parse output directly to LLMs such as Ollama or any OpenAI compatible API for processing
• Generate and include git diffs and logs
• Count approximate tokens for LLM compatibility
• Customisable output templates
• Copy output to clipboard (when available)
• Export to file or print to console
• Optional JSON output

Ingest Intro (“Podcast” Episode): ...
LLM Parameter Playground
Here’s a fun little tool I’ve been hacking on to explore the effects of different inference parameters on LLMs. You can find the code and instructions for running it locally on GitHub. It started as a fork of rooben-me’s tone-changer-open, which itself was a “fork” of Figma’s tone generator; I’ve made quite a few changes to make it more focused on local LLMs and advanced parameter exploration.
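To give a sense of the kind of experimentation the playground is aimed at, here’s a minimal sketch (not the playground’s own code) that sweeps temperature against a local OpenAI-compatible endpoint such as Ollama, using the openai Python client:

```python
# Not the playground's code - a minimal sketch of sweeping one inference parameter
# against a local OpenAI-compatible endpoint to compare the outputs it produces.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
prompt = "Rewrite this in a friendly tone: 'Your ticket has been closed.'"

for temperature in (0.0, 0.7, 1.5):
    response = client.chat.completions.create(
        model="llama3.1",  # any local model you have available
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        top_p=0.9,         # another parameter worth sweeping on its own
        max_tokens=60,
    )
    print(f"temperature={temperature}: {response.choices[0].message.content.strip()}")
```

The same loop works for top_p, repetition penalties and so on; the playground makes this kind of comparison interactive.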
Code, Chaos, and Copilots (AI/LLM Talk July 2024)
Code, Chaos, and Copilots is a talk I gave in July 2024 as an intro to how I use AI/LLMs to augment my capabilities every day. Topics covered:

• What I use AI/LLMs for
• Prompting tips
• Codegen workflow
• Picking the right models
• Model formats
• Context windows
• Quantisation
• Model servers
• Inference parameters
• Clients & tools
• Getting started cheat-sheets

Download Slide Deck

Disclaimer: I’m not an ML engineer or data scientist. As such, the information presented here is based on my understanding of the subject and may not be 100% accurate or complete. ...
Understanding AI/LLM Quantisation Through Interactive Visualisations
AI models (“LLMs” in this case) have inherently large sizes and computational requirements that often pose challenges for deployment and use. ...
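As a toy, code-level illustration of the underlying idea (separate from the post’s interactive visualisations), here’s a sketch of simple absmax quantisation: float32 weights are mapped to int8 and back, and the rounding error is what quantisation trades for the smaller footprint.

```python
# Toy absmax (symmetric) quantisation: float32 weights -> int8 -> float32,
# to show the rounding error that quantisation trades for a smaller footprint.
import numpy as np

rng = np.random.default_rng(42)
weights = rng.normal(0, 0.5, size=8).astype(np.float32)

scale = np.abs(weights).max() / 127.0          # one scale for the whole block
quantised = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequantised = quantised.astype(np.float32) * scale

print("original:   ", np.round(weights, 4))
print("int8 values:", quantised)
print("dequantised:", np.round(dequantised, 4))
print("max error:  ", float(np.abs(weights - dequantised).max()))
```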
Rating AI Tools
I spend a lot of time working with AI tools and often get asked for recommendations. The following is a list of some of the more notable tools I’ve tried, rated based on my experience. Just because something has a low rating doesn’t mean it’s bad - it just means it didn’t work well for me and I wouldn’t personally recommend it. I plan on keeping this list updated as I try new tools (we’ll see how that goes). ...
Gollama: Ollama Model Manager
Gollama on GitHub

Gollama is a client for managing Ollama models. It provides a TUI for listing, filtering, sorting, selecting, inspecting (coming soon!) and deleting models, and can link Ollama models to LM Studio. The project started off as a rewrite of my llamalink project, but I decided to expand it to include more features and make it more user-friendly. ...