Take Your Coding to the Next Level with a Local Copilot
Take your coding to the next level with a local copilot. Discover how to use LM Studio and Ollama to serve Llama3 models within VS Code's Code GPT extension for enhanced programming capabilities.
February 17, 2025

Unlock your coding potential with a free local AI copilot that takes your productivity to new heights. Discover how to seamlessly integrate powerful language models like Llama3 into your development workflow, empowering you to write better code, refactor efficiently, and boost your overall coding experience.
Learn How to Set Up a Local Co-Pilot for Your Coding Needs
Leverage LM Studio to Serve Llama3 Models Locally
Discover the Power of Ollama as an Open-Source Local Co-Pilot Solution
Conclusion
Learn How to Set Up a Local Co-Pilot for Your Coding Needs
In this section, we will explore how to set up a local co-pilot for your coding needs using LM Studio and Ollama. We will cover the steps to install the necessary extensions, configure the local servers, and leverage the power of Llama3 models to enhance your coding experience.
First, we will focus on setting up LM Studio as an API server for the Llama3 Instruct Gradient model, a variant with a 1-million-token context window. We will guide you through loading the model, creating a local server, and integrating it with the Code GPT extension in Visual Studio Code.
Next, we will introduce Ollama as an open-source alternative to LM Studio. We will demonstrate how to download and install Ollama, start the server, and connect the Llama3 70-billion-parameter model to the Code GPT extension. This will give you a fully open-source solution for your local co-pilot needs.
Throughout the section, we will test the capabilities of both LM Studio and Ollama by providing prompts and observing the responses from the Llama3 models. We will also explore the refactoring capabilities of the larger 70-billion-parameter model and compare its performance to the 8-billion-parameter model used earlier.
By the end of this section, you will have a solid understanding of how to set up a local co-pilot using both LM Studio and Ollama, enabling you to leverage the power of Llama3 models for your coding tasks and projects.
Leverage LM Studio to Serve Llama3 Models Locally
To use Llama3 as your co-pilot in VS Code, you can leverage LM Studio to serve the Llama3 models locally. This approach lets you run the models on your own machine, without relying on an external API like Groq.
First, install the Code GPT extension in VS Code. Then, follow these steps:
- Download and run LM Studio on your machine.
- Search for the Llama3 model you want to use, such as the Llama3 Instruct Gradient 1-million-token context version.
- Create a local server in LM Studio to serve the selected Llama3 model.
- In VS Code, make sure to select LM Studio as the provider in the Code GPT extension settings.
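Before leaning on the extension, you can sanity-check the local server from a script. LM Studio exposes an OpenAI-compatible API, by default at http://localhost:1234/v1, so a minimal sketch with the official openai Python client looks like this (the model name and prompt below are placeholders, not values from the video):

```python
# A rough sketch, not the extension's internals: querying the LM Studio
# local server directly through its OpenAI-compatible endpoint.
from openai import OpenAI

# LM Studio ignores the API key, but the client requires a non-empty value.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # LM Studio serves whichever model is currently loaded
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```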
Now, you can test the integration by asking the Llama3 model to write a Python program that downloads a file from S3 and stores it locally. The model will communicate with the LM Studio server to generate the response.
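For reference, a working answer to that prompt would look something like the following boto3 sketch; the bucket name, object key, and local path are placeholders:

```python
# A minimal sketch of the kind of program the prompt asks for: download a
# file from S3 and store it locally.
import boto3

def download_from_s3(bucket: str, key: str, local_path: str) -> None:
    """Fetch a single object from S3 and write it to local_path."""
    s3 = boto3.client("s3")  # credentials resolved from the usual AWS env/config
    s3.download_file(bucket, key, local_path)

if __name__ == "__main__":
    download_from_s3("my-bucket", "data/report.csv", "report.csv")
```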
While the speed may not match the Groq API, this approach lets you run the models locally without relying on an external service. You can also explore other models available in LM Studio and use them as your coding co-pilot within VS Code.
Discover the Power of Ollama as an Open-Source Local Co-Pilot Solution
To use Ollama as your co-pilot within the Code GPT extension, follow these steps:
- Download and install Ollama from the official website, ollama.com.
- Start the Ollama server by launching the Ollama application.
- In the Code GPT extension, select Ollama as the provider.
- Specify the model you want to use, such as the Llama3 70-billion-parameter model.
- To start the Llama3 70-billion-parameter model, open a terminal and run the command `ollama run llama3:70b`.
- Once the model is loaded, you can start using Ollama as your co-pilot within the Code GPT extension.
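To confirm the model is being served before pointing Code GPT at it, you can also query Ollama's local REST API directly; it listens on http://localhost:11434 by default. A minimal sketch using the requests library (the prompt is a placeholder):

```python
# A minimal sketch: talking to the local Ollama server over its REST API.
import requests

payload = {
    "model": "llama3:70b",  # must already be pulled, e.g. via `ollama run llama3:70b`
    "prompt": "Write a Python function that checks whether a number is prime.",
    "stream": False,        # return one complete JSON object instead of a stream
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["response"])
```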
Ollama is a completely open-source solution, unlike LM Studio, which includes proprietary components. LM Studio offers more flexibility in the models you can browse and serve, but with Ollama the entire stack stays open source.
When using Ollama, you'll need to start the model server manually, which is a bit more involved than the LM Studio setup. In exchange, you get full control over the model you're running.
The quality of Ollama's output depends on the model you choose; the 70-billion-parameter Llama3 model should outperform the 8-billion-parameter model used earlier. Keep in mind that running a large model locally may mean slower inference than a cloud-based API like Groq.
Overall, Ollama is a great open-source option for running your co-pilot locally, and it can be a valuable tool in your development workflow.
Conclusion
In this video, we explored two local alternatives to the Groq API for using Llama3 as a coding co-pilot within VS Code. We first set up LM Studio as an API server for the Llama3 Instruct Gradient 1-million-token context model, then used it within the Code GPT extension in VS Code, showcasing its ability to generate code and provide refactoring suggestions.
Next, we looked at Ollama as an open-source solution for running local language models. We walked through starting the Ollama server and connecting the Code GPT extension to the Llama3 70-billion-parameter model. While performance was slower than with the Groq API, the local setup provided more control and flexibility.
The video highlighted the trade-offs between the two approaches: LM Studio offers a wider range of model options but includes closed-source components, while Ollama provides a fully open-source alternative with a more limited model selection. Ultimately, both approaches demonstrated how to leverage powerful language models for coding assistance in a local environment, reducing reliance on external APIs.
FAQ