Recently we walked through an example of how to extract a transcript from a video using Whisper models from OpenAI. The result was a full transcript of a local MP4 video. True story: I knew I was going to do a little writing on a recent flight, so I did some pre-downloading to see if I could use Ollama with an open source model I had already pulled (Mistral) to produce a summary of that transcript. It felt like a great example of how local LLMs can help us leverage the power of AI even when not connected to the Internet. Over the roar of the plane engine (emergency seats = lots of legroom, but oh so loud) I can hear you shouting: get on with it already!

Setup 

Ollama is currently available on Mac & Linux. They have been promising a Windows port and recently released a preview. If you don’t want to try out that preview, I’ve got your back: see here, where I’ve written about how to get Ollama working on your Windows machine using WSL.

As for the basic setup, the instructions on the Ollama website are pretty straightforward so I won’t go into them (unless you’re on Windows, see above). Let’s have a quick peek at the main commands that Ollama provides (for any of our Docker friends this may seem familiar):

  • ollama pull — Fetches the model you specify from the Ollama hub
  • ollama rm — Removes the specified model from your environment
  • ollama cp — Makes a copy of the model
  • ollama list — Lists all the models you have downloaded or created in your environment
  • ollama run — Performs multiple tasks: if the model doesn’t exist it downloads it, then runs and serves it
  • ollama serve — Serves a REST API with access to your models so other applications can use them (see the sketch after this list)
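
To give a feel for that REST API, here is a minimal sketch of calling the generate endpoint from Python. It assumes Ollama is serving on its default address (http://localhost:11434), that you have already pulled mistral, and that the requests package is installed:

import requests

# Assumption: Ollama is listening on its default host/port (http://localhost:11434)
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "mistral",          # any model shown by `ollama list`
    "prompt": "Tell me a joke",  # the text we want the model to respond to
    "stream": False,             # return a single JSON object instead of a stream
}

response = requests.post(OLLAMA_URL, json=payload)
print(response.json()["response"])

We won’t call the raw API directly in this article (LangChain will wrap it for us later), but it’s handy to know it’s there.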

I had already downloaded Mistral, so I’m going to stick with it; it’s a great choice, especially when you don’t have an Internet connection. If you’re starting from scratch, run ollama pull mistral, which will pull the latest version of the model.

With the model downloaded, let’s test it out: execute ollama run mistral. You should now see an input prompt for you to enter some text; try asking it to tell you a joke. If all has gone well you should be on the ground in stitches at THE FUNNIEST joke you’ve ever heard. No? Comedy tastes are not the purpose of this article so let’s move on.

You now have an environment and the dependencies installed to run an open source LLM fully locally. That wasn’t really so hard, was it? Let’s continue and see if we can extend our last example and have Ollama and Mistral summarize our video transcript.

Interacting from Python

So let’s lay out what we want to do here. 

  1. We have a process which can take an MP4 video and transcribe it for us. The transcript is returned as a single block of text
  2. We have a way to run an LLM locally, which is waiting for instructions
  3. Now we need to pass our transcript to the running instance of Mistral and ask it to return a catchy title and summary that describe the content of the video (starting to see some use cases yet?)

So we already executed ollama run mistral, which both runs the model and serves it. In most cases this should just work, but if, like me, you’ve been playing around with a lot of these models, you might have a conflict on the default host and port. In that case you can run the following command, replacing the port number with one that you prefer:

OLLAMA_HOST=127.0.0.1:5050 ollama serve

Now, with the model being served, we need to connect so we can send our transcript and get a summary back. For this we’re going to use LangChain to simplify the interface between our application and the LLM.

from langchain.llms.ollama import Ollama

SUMMARY_PROMPT = """
You are a helpful video archive manager who specializes in reviewing and summarizing a large catalog of MP4 videos. You will be provided a transcript for a video and you will do the following:
Identify the theme(s) contained within the transcript to understand what the intention of the video is
Summarize the content into a short paragraph of 100 words at most
Provide a catchy title that highlights the core theme of the video

You MUST wait for a transcript to be provided before trying to perform these instructions. Do not make up random summaries. 
"""

def query(prompt, model):
    ollama = Ollama(model=model)
    response = ollama.invoke(f"CONTEXT:{SUMMARY_PROMPT}\n\nTRANSCRIPT:{prompt}")
    return response
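
One note on the connection: the LangChain Ollama wrapper talks to the default address (http://localhost:11434). If you started the server on a different host or port, as in the OLLAMA_HOST example above, you can point the wrapper at it via its base_url parameter. A minimal sketch, assuming you chose port 5050:

from langchain.llms.ollama import Ollama

# Assumption: the server was started with OLLAMA_HOST=127.0.0.1:5050
ollama = Ollama(model="mistral", base_url="http://127.0.0.1:5050")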

Most of this is self-explanatory; the interesting part is telling the LLM to “wait for a transcript”. Commercial offerings like ChatGPT bake a lot into their own system prompts, which makes statements like this unnecessary, but since we’re dealing with an open source model we need to be explicit about what we want it to do. If we don’t, the LLM starts making things up on the spot, which we don’t want.
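
As a side note, Ollama’s API also lets you supply those standing instructions as a system prompt rather than concatenating them into the user prompt. Here is a hedged sketch using the raw generate endpoint’s system field, reusing the SUMMARY_PROMPT defined above and assuming the default port; whether this behaves better than the concatenation above is worth experimenting with:

import requests

def query_with_system(prompt, model="mistral"):
    # Assumption: Ollama is serving on the default port (http://localhost:11434)
    payload = {
        "model": model,
        "system": SUMMARY_PROMPT,           # standing instructions, kept out of the user prompt
        "prompt": f"TRANSCRIPT: {prompt}",  # the transcript we want summarized
        "stream": False,
    }
    response = requests.post("http://localhost:11434/api/generate", json=payload)
    return response.json()["response"]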

End to End Video Summary

Ok, moment of truth time. If we run the Python script now we expect the following to happen:

  1. Extract the audio from the MP4 into its own file with moviepy
  2. Create a transcript using Whisper
  3. Execute a prompt which tells our locally running Mistral that we want a catchy title and summary
  4. Mistral responds with what we asked for
  5. We are happy and now have a way to quickly summarize many local videos based on their content

import os
import moviepy.editor as mp
import whisper
from langchain.llms.ollama import Ollama

BASE_PATH = os.getcwd()
AUDIO_BASE = f"{BASE_PATH}/audio"
VIDEO_BASE = f"{BASE_PATH}/video"
SUMMARY_PROMPT = """
You are a helpful video archive manager who specializes in reviewing and summarizing a large catalog of MP4 videos. You will be provided a transcript for a video and you will do the following:
Identify the theme(s) contained within the transcript to understand what the intention of the video is
Summarize the content into a short paragraph of 100 words at most
Provide a catchy title that highlights the core theme of the video

You MUST wait for a transcript to be provided before trying to perform these instructions. Do not make up random summaries. 
"""

filename = "test_video"
audio_path = f"{AUDIO_BASE}/{filename}.wav"
video_path = f"{VIDEO_BASE}/{filename}.mp4"

def extract_audio_to_file(video_path, audio_path):
    """
    Uses the moviepy package to extract and write
    audio content to a new file
    """
    # Load the video from file
    video = mp.VideoFileClip(video_path)

    # Extract the audio file from the video. 
    # The codec is chosen to be a compatible format for Whisper 
    video.audio.write_audiofile(audio_path, codec='pcm_s16le') 


def video_to_transcript_with_whisper(video_path, audio_path):
    extract_audio_to_file(video_path, audio_path)
    
    # First grab the relevant model for the task at hand
    model = whisper.load_model("base")  
    
    # Transcribe the audio file using the selected model
    result = model.transcribe(audio_path)
    
    return result["text"]

def query(prompt, model):
    ollama = Ollama(model=model)
    response = ollama.invoke(f"CONTEXT:{SUMMARY_PROMPT}\n\nTRANSCRIPT:{prompt}")
    return response

transcript = video_to_transcript_with_whisper(video_path, audio_path)

# Generate the summary using the pre-set prompt & transcript
response = query(transcript, "mistral")
print(response)
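
And since step 5 promises a way to summarize many local videos, here’s a minimal sketch of how you might batch this over a whole folder, reusing the functions above. The file naming and folder layout are assumptions based on the VIDEO_BASE/AUDIO_BASE constants in the script:

# Assumption: every .mp4 in VIDEO_BASE gets a matching .wav written to AUDIO_BASE
for file in os.listdir(VIDEO_BASE):
    if not file.endswith(".mp4"):
        continue
    name = os.path.splitext(file)[0]
    transcript = video_to_transcript_with_whisper(
        f"{VIDEO_BASE}/{name}.mp4", f"{AUDIO_BASE}/{name}.wav"
    )
    print(f"----- {name} -----")
    print(query(transcript, "mistral"))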

And here’s what it gives us. 

Title: Simplifying Python Projects with Poetry: A Beginner’s Guide

Theme: Technology, Programming (Python)

Summary: In this tutorial-style video, the speaker introduces Poetry as a solution for managing Python projects more efficiently compared to other package management systems. The video walks viewers through creating a new project using Poetry, adding dependencies, and setting up an alias to run the script. By encapsulating all necessary configurations into one file, users can easily install and manage required dependencies without having to worry about various setup files. The tutorial demonstrates installing pandas as an example dependency and emphasizes the benefits of using Poetry in a virtual environment.

Considering the video is me talking about how to set up Poetry and why I think it’s better than other package managers, I’d say this is a pretty good summary.

Wrapping It Up

There you have it: a very practical, end to end example which uses Ollama, an open source LLM, and other libraries to run a video summariser 100% locally & offline. Hope you enjoyed this; subscribe and follow along for more examples of how you can implement simple AI to gain some productivity points.
