How to run AI Models locally with Ollama on Windows

Ollama is a powerful open-source tool for running large language models (LLMs) locally, which can be crucial when working with sensitive information. It is substantially easier to deploy than most alternatives: with a single installer and a few commands you can have your own AI instance running, with no complex configuration files or deployment procedures. It also supports GPU acceleration if you have a recent dedicated GPU (for NVIDIA, a GPU with Compute Capability 5.0 or higher; for AMD, an RX 6800 or newer).

With its recent updates, Ollama can now be installed natively on Windows without requiring WSL, which ensures better performance. This guide walks you through the installation step by step and explains how to customize it for your needs. I will also cover a frontend website and app that allow file uploads and easier operation.

Note: You should have at least 8GB of RAM available to run 7B models or smaller, 16GB for 13B/14B models, 32GB for 33B models, and so on. The higher the parameter count, the more resources your machine needs just to run the model, let alone run it smoothly and fast (I will touch on this point later).

Download and Installation

The installation process is straightforward and supports multiple operating systems; in this guide we will focus only on Windows.

You can obtain the installation package from the official website or, alternatively, from GitHub, where you should download the most recent release (OllamaSetup.exe).

After the download, run the installer, follow the instructions in the wizard, and click Install. Once the installation is complete, the installer closes automatically and Ollama runs in the background; by default it can be found in the system tray on the right side of the taskbar. (You can change this behavior in Task Manager so that Ollama starts only when you open the app instead of at Windows startup.)
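By default, the application itself installs into your user profile. If you prefer to put the program on another drive as well (not just the models, which are covered next), the installer is a standard Inno Setup package and accepts a `/DIR` switch; the path below is only an example:

```powershell
# Run the downloaded installer with a custom install directory (example path)
.\OllamaSetup.exe /DIR="D:\Ollama"
```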

Customize the model storage location and environment variables (Optional)

This section is optional but can help you change settings that may be beneficial in specific situations. After modifying any of these settings, you should restart the app.

If your primary drive has limited space, or if you prefer to install and run models (which can be quite large) on a different disk, you can change the default storage location for models. To do this, add the environment variable `OLLAMA_MODELS` and set it to your chosen directory.

First, quit Ollama and search for “Edit the system environment variables” on Windows. Click on the button for “Environment Variables.” Then, edit or create a new variable for your user account called `OLLAMA_MODELS`, pointing it to the desired storage location (for example, `D:\Models`). Click “OK” to save your changes.
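If you prefer the terminal, here is a quick sketch of the same change (assuming `D:\Models` is where you want the models to live; `setx` writes a user-level variable that only applies to newly started processes, so quit and relaunch Ollama afterwards):

```powershell
# Create the target folder and point Ollama's model storage at it
New-Item -ItemType Directory -Path "D:\Models" -Force | Out-Null
setx OLLAMA_MODELS "D:\Models"
```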

Here are other commonly available settings (a configuration example follows the list):

- `OLLAMA_HOST`: The network address the Ollama service listens on. The default is `127.0.0.1` (localhost). If you want other computers on your local network to access Ollama, set it to `0.0.0.0`.
- `OLLAMA_PORT`: The service listens on port `11434` by default. If there is a port conflict, you can switch to another port (e.g., 8080); in current releases this is done by appending the port to `OLLAMA_HOST` (for example, `127.0.0.1:8080`) rather than through a separate variable.
- `OLLAMA_ORIGINS`: A comma-separated list of allowed HTTP request origins. If you are using Ollama locally without strict requirements, you can set it to an asterisk (`*`) to allow all origins.
- `OLLAMA_KEEP_ALIVE`: How long a model stays loaded in memory after a request. The default is 5 minutes (`5m`). A plain number is read as seconds (e.g., `300` means 300 seconds), `0` unloads the model immediately after each request, and a negative value keeps it loaded indefinitely. Setting it to `24h`, for instance, keeps the model in memory for 24 hours, which improves response times.
- `OLLAMA_NUM_PARALLEL`: The number of concurrent request handlers. By default it is set to 1, which means requests are processed serially. You can adjust this based on your actual needs.
- `OLLAMA_MAX_QUEUE`: The maximum length of the request queue, `512` by default. Requests beyond this limit are rejected, so adjust it to your situation.
- `OLLAMA_DEBUG`: Controls debug logging. Set it to `1` to enable detailed log output, which is useful for troubleshooting.
- `OLLAMA_MAX_LOADED_MODELS`: The maximum number of models that can be loaded into memory at the same time. The default is `1`, meaning only one model is kept in memory at any given time.
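As an example of applying a couple of these settings from a terminal (the values are illustrative; remember to quit and restart Ollama from the system tray afterwards):

```powershell
# Listen on all interfaces so other machines on the LAN can reach Ollama (default port 11434)
setx OLLAMA_HOST "0.0.0.0:11434"

# Keep a loaded model in memory for 24 hours instead of the default 5 minutes
setx OLLAMA_KEEP_ALIVE "24h"

# In a newly opened terminal, confirm which OLLAMA_* variables are set
Get-ChildItem Env:OLLAMA*
```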

Verify the Installation

To confirm that Ollama was installed successfully, open Command Prompt (cmd) or PowerShell and run:

```powershell
ollama --version
```

If the command returns a version number, Ollama is ready to use.
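You can also confirm that the background service is reachable; with default settings it listens on port 11434 and replies with a short status message (the exact wording may vary by version):

```powershell
# The API root should answer with something like "Ollama is running"
Invoke-RestMethod -Uri "http://127.0.0.1:11434/"

# List installed models (empty right after a fresh install)
ollama list
```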

Installing and running models

You can find the list of models provided by Ollama at https://ollama.com/library.

Here are some example models that can be used.

| Model | Parameters | Size | Command |
|---|---|---|---|
| deepseek-r1 | 14B | 9GB | `ollama run deepseek-r1:14b` |
| llama3.2 | 3B | 2GB | `ollama run llama3.2:3b` |
| llama3.1 | 8B | 4.9GB | `ollama run llama3.1:8b` |
| mistral | 7B | 4.1GB | `ollama run mistral:7b` |
| qwen2.5 | 14B | 9GB | `ollama run qwen2.5:14b` |

Of course, there are many other models for different areas, as well as variations within these models. For example, as I'm currently writing, deepseek-r1 is available with 1.5b, 7b, 8b, 14b, 32b, 70b, and 671b parameters. The parameter count determines how resource-intensive the model is, but it can also drastically improve the quality of the responses.
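To try one of these, you can pull a model first and then run it, either interactively or with a one-off prompt (the tag `llama3.2:3b` is just an example; pick a size that fits your RAM):

```powershell
# Download the model without starting a chat session
ollama pull llama3.2:3b

# Start an interactive chat (type /bye to exit)
ollama run llama3.2:3b

# Or ask a single question and return straight to the shell
ollama run llama3.2:3b "Summarize what Ollama does in one sentence."
```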

Operation Commands

Ollama is controlled through a small set of command-line operations, summarized in the table below (a REST API example follows the table).

| Command | Description |
|---|---|
| `ollama serve` | Start Ollama |
| `ollama create` | Create a model from a Modelfile |
| `ollama show` | Show information for a model |
| `ollama run` | Run a model |
| `ollama pull` | Pull a model from a registry; can also be used to update a local model |
| `ollama list` | List models |
| `ollama ps` | List running models and display hardware usage |
| `ollama cp` | Copy a model |
| `ollama rm` | Remove a model |
| `ollama help` | Help about any command |

Application Logs

To troubleshoot errors with Ollama, check the application logs by navigating to the Ollama log folder or by running the command `explorer %LOCALAPPDATA%\Ollama`. You can also set the environment variable `OLLAMA_DEBUG` to `1` to get more detailed log information.
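For example, the server log can be opened or followed live from PowerShell (recent Windows builds write a `server.log` in that folder; check the folder contents if yours differs):

```powershell
# Open the log folder in File Explorer
explorer "$env:LOCALAPPDATA\Ollama"

# Follow the most recent server log entries as they are written
Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 50 -Wait
```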

GPU Acceleration - CUDA Toolkit (Optional)

This section is optional for smaller models, such as those with fewer than 14B parameters. For larger models, however, a discrete GPU is essential to get fast response times and avoid significant delays. If you have an NVIDIA GPU, you will need to install the CUDA Toolkit for optimal performance; to use CUDA, your NVIDIA GPU must have Compute Capability 5.0 or higher.

If you have an AMD GPU, there is a list of supported cards similar to NVIDIA, but you do not need to install the CUDA Toolkit, so you can skip this section. Since the list of supported cards is frequently updated, I will provide the link here for you to access the latest information: https://github.com/ollama/ollama/blob/main/docs/gpu.md.

If your NVIDIA GPU is on the supported list, you can download the CUDA Toolkit installer. First, select your operating system, version, architecture, and installer type: the network installer downloads the necessary files as needed, while the local installer downloads everything at once and can be run in an offline environment. Open the downloaded file and follow the instructions provided by the installer; in most cases, the express installation option is all you need.
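After installing the toolkit and restarting Ollama, you can check whether a model is actually running on the GPU; `nvidia-smi` ships with the NVIDIA driver, and `ollama ps` reports where a loaded model is placed:

```powershell
# Show GPU utilization and VRAM usage (NVIDIA only)
nvidia-smi

# With a model loaded (e.g. after "ollama run llama3.2:3b"), the PROCESSOR column
# should show GPU, or a GPU/CPU split if the model does not fully fit in VRAM
ollama ps
```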

ChatboxAI UI

Using the models from the command prompt can be a bit underwhelming. Instead, you can use an application like ChatboxAI to access all the functionalities of Ollama. You have two options: either install the application or use the web version.

After selecting which version to use, you’ll need to configure the app to connect to the Ollama instance. This can be done easily by opening the settings in the Model tab and selecting the Ollama API, which is already configured with the default URL. If everything is set up correctly, the model should appear in the dropdown menu when you go to select it.
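If you use the web version and the model list stays empty, the browser request may be rejected by Ollama's origin check. One workaround for a trusted, local setup (a sketch; restart Ollama afterwards) is to relax `OLLAMA_ORIGINS`:

```powershell
# Accept API requests from any origin; only do this on a machine and network you trust
setx OLLAMA_ORIGINS "*"
# Quit Ollama from the system tray and start it again so the new value takes effect
```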

Additionally, you can adjust the temperature setting to manipulate how creative or strict the AI responses are. Once you’ve changed your settings, be sure to click on save.

Conclusion

I hope this guide has helped you learn how to install Ollama and ChatboxAI, which allows you to implement local/custom versions of AI models on your machine.

However, don't limit yourself to just this; Ollama offers much more. You can create your own custom models, share and serve them online, or even develop your own online service. The sky is the limit, and I encourage you to explore and learn how to make the most of it.
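As a first taste of that, a custom model can be defined with a short Modelfile. Here is a minimal sketch (assuming `llama3.2:3b` is already pulled; `my-assistant` is just an example name):

```powershell
# Write a minimal Modelfile: base model, sampling temperature, and a system prompt
@"
FROM llama3.2:3b
PARAMETER temperature 0.7
SYSTEM You are a concise assistant that answers in short bullet points.
"@ | Set-Content -Path Modelfile -Encoding ascii

# Build the custom model and chat with it
ollama create my-assistant -f .\Modelfile
ollama run my-assistant
```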

I may create additional guides on these topics if there's enough interest. For now, this guide covers the most basic uses with some customization options.
