How to run AI Models locally with Ollama on Windows

Ollama is a powerful open-source tool for running large language models (LLMs) locally, which can be crucial when working with sensitive information. It is substantially easier to deploy than most alternatives: with a single installer and a few commands you can have your AI instance running, with no need for complex configuration files or deployment procedures. It also supports GPU acceleration if you have a recent dedicated GPU (for NVIDIA, a GPU with Compute Capability 5.0 or higher; for AMD, an RX 6800 or more recent). With its recent updates, Ollama can now be installed natively on Windows without requiring WSL, which gives better performance. This guide will walk you through the installation step by step and explain how to customize it for your needs; I will also include a frontend website and app to allow for file uploads and easier operation.

Note: You should have at least 8GB of RAM available to run 7B models or smaller, 16GB to run 13B/14B models, 32GB to run 33B models, and so on. The more parameters a model has, the more resources your machine will need simply to run it, let alone run it smoothly and quickly (I will touch on this point later).

Download and Installation

The installation process is straightforward and supports multiple operating systems; in this guide we will focus only on Windows. You can obtain the installation package from the official website or, alternatively, from GitHub, where you should download the most recent release (OllamaSetup.exe). After the download, run the installer, follow the instructions in the wizard, and click Install. Once the installation is complete, the installer will close automatically and Ollama will run in the background. It can be found in the system tray on the right side of the taskbar by default (you can change this behavior in Task Manager so that it starts only when you open the app instead of at Windows startup).

Customize model storage location and environment variables (Optional)

This section is optional but can help you change settings that may be beneficial in specific situations. After modifying any of these settings, you should restart the app.

If your primary drive has limited space, or if you prefer to store and run models (which can be quite large) on a different disk, you can change the default storage location for models. To do this, add the environment variable `OLLAMA_MODELS` and set it to your chosen directory. First, quit Ollama and search for “Edit the system environment variables” on Windows. Click the “Environment Variables” button, then edit or create a new variable for your user account called `OLLAMA_MODELS`, pointing it to the desired storage location (for example, `D:\Models`). Click “OK” to save your changes.

Here are other commonly available settings (a PowerShell sketch for setting them follows this list):

- OLLAMA_HOST: The network address that the Ollama service listens on. The default is `127.0.0.1` (localhost). If you want to allow other computers on your local network to access Ollama, you can set this to `0.0.0.0`.
- OLLAMA_PORT: The port that the Ollama service listens on, `11434` by default. If there is a port conflict, you can change it to another port (e.g., 8080).
- OLLAMA_ORIGINS: A comma-separated list of allowed HTTP client request origins. If you are using Ollama locally without strict requirements, you can set it to an asterisk (*) to indicate no restrictions.
- OLLAMA_KEEP_ALIVE: How long a model remains loaded in memory. The default is 5 minutes (5m). A plain number such as 300 means 300 seconds; 0 means the model is unloaded immediately after processing the request, while a negative number keeps it loaded indefinitely. You can set it to 24h to keep the model in memory for 24 hours, which improves access speed.
- OLLAMA_NUM_PARALLEL: The number of concurrent request handlers. The default is 1, which means requests are processed serially. You can adjust this based on your actual needs.
- OLLAMA_MAX_QUEUE: The length of the request queue, 512 by default. Requests that exceed this length are discarded. You may want to adjust this according to your situation.
- OLLAMA_DEBUG: Controls the output of debug logs. Set it to 1 to enable detailed log information, useful for troubleshooting issues.
- OLLAMA_MAX_LOADED_MODELS: The maximum number of models that can be loaded into memory at the same time. The default is 1, meaning only one model can be in memory at any given time.
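If you prefer the terminal over the Environment Variables dialog, the same user-level variables can be set with `setx`. This is only a minimal sketch, assuming you want models stored on `D:\Models` and a 24-hour keep-alive; the values are examples, not requirements, and Ollama still needs to be restarted afterwards.

```powershell
# Quit Ollama first (right-click the tray icon and choose Quit), then set user-level variables.
setx OLLAMA_MODELS "D:\Models"    # store downloaded models on another drive (example path)
setx OLLAMA_KEEP_ALIVE "24h"      # keep a loaded model in memory for 24 hours
setx OLLAMA_HOST "0.0.0.0"        # optional: allow other machines on the LAN to reach Ollama

# setx only affects new processes: open a new terminal, then start Ollama again.
```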
Verify the Installation

To confirm that Ollama was installed successfully, open Command Prompt (cmd) or PowerShell and run `ollama --version`. If the command returns a version number, Ollama is ready to use.

Installing and running models

You can find the list of models provided by Ollama at https://ollama.com/library. Here are some example models that can be used (worked examples follow the Operation Commands table below):

| Model | Parameters | Size | Command |
|---|---|---|---|
| deepseek-r1 | 14B | 9GB | ollama run deepseek-r1:14b |
| llama3.2 | 3B | 2GB | ollama run llama3.2:3b |
| llama3.1 | 8B | 4.9GB | ollama run llama3.1:8b |
| mistral | 7B | 4.1GB | ollama run mistral:7b |
| qwen2.5 | 14B | 9GB | ollama run qwen2.5:14b |

Of course, there are many other models for different areas, as well as variations within each model. For example, deepseek-r1, as I am writing this, comes in 1.5b, 7b, 8b, 14b, 32b, 70b, and 671b parameter versions; the parameter count determines how resource-intensive the model is, but it can also drastically improve the quality of the responses.

Operation Commands

| Command | Description |
|---|---|
| ollama serve | Start Ollama |
| ollama create | Create a model from a Modelfile |
| ollama show | Show information for a model |
| ollama run | Run a model |
| ollama pull | Pull a model from a registry; can also be used to update a local model |
| ollama list | List the models installed locally |
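To put the two tables above into practice, here is a minimal command-line sketch for downloading and talking to one of the smaller models. The tag `llama3.2:3b` is taken from the model table; any other tag from the library works the same way.

```powershell
# Download the model weights (a few GB, only needed once; also updates an existing copy)
ollama pull llama3.2:3b

# Start an interactive chat session (type /bye to exit)
ollama run llama3.2:3b

# Or ask a single question without entering the interactive prompt
ollama run llama3.2:3b "Summarize what Ollama does in one sentence."

# See which models are installed locally
ollama list
```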

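The `ollama create` command listed above builds a custom model from a Modelfile. The following is a rough sketch, assuming you have already pulled `llama3.2:3b`; the model name `my-assistant`, the temperature value, and the system prompt are purely illustrative.

```powershell
# Write a minimal Modelfile (FROM, PARAMETER and SYSTEM are standard Modelfile instructions)
@'
FROM llama3.2:3b
PARAMETER temperature 0.3
SYSTEM """
You are a concise assistant that answers in short bullet points.
"""
'@ | Set-Content -Path .\Modelfile

# Build a new local model from the Modelfile, then run it
ollama create my-assistant -f .\Modelfile
ollama run my-assistant
```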
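Everything the CLI does is also exposed through Ollama's local REST API (by default at `http://localhost:11434`), which is what graphical frontends talk to. A quick PowerShell sketch, assuming the default address and that `llama3.2:3b` is already pulled:

```powershell
# List locally installed models via the API (the equivalent of `ollama list`)
Invoke-RestMethod -Uri "http://localhost:11434/api/tags"

# Request a one-off completion; stream = $false returns a single JSON object
$body = @{ model = "llama3.2:3b"; prompt = "Why run models locally?"; stream = $false } | ConvertTo-Json
(Invoke-RestMethod -Uri "http://localhost:11434/api/generate" -Method Post -Body $body -ContentType "application/json").response
```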