How To Get Started With the GPT-4 Vision API

This will be a quick How-To for using the new GPT-4 Vision API.

What is it?

An API that takes in images and allows you to ask questions about them!

Key Notes From OpenAI

  • GPT-4 with vision is not a model that behaves differently from GPT-4, with the small exception of the system prompt we use for the model
  • GPT-4 with vision is not a different model that does worse at text tasks because it has vision; it is simply GPT-4 with vision added
  • GPT-4 with vision is an augmentative set of capabilities for the model

Cool, how does it work?

You can give GPT-4 a picture by either:

  • Linking to an image online
  • Uploading a file from your local computer
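The two paths differ only in the image part of the message you send: a plain web URL, or a base64 `data:` URL built from the file’s bytes. Here’s a minimal sketch of both shapes (the URL is an illustrative placeholder, not from the OpenAI examples):

```python
import base64

# 1. Image hosted online: pass the URL straight through.
web_part = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/photo.jpg"},
}

# 2. Local file: base64-encode the raw bytes into a data: URL.
def image_part_from_bytes(image_bytes):
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
    }

# For a file on disk: image_part_from_bytes(open("photo.jpg", "rb").read())
```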

Here are some examples from OpenAI

  • Link an image from somewhere online
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)


  • Image from your local computer
import base64
import requests

# OpenAI API key
api_key = "YOUR_OPENAI_API_KEY"

# Function to encode the image as a base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

payload = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 300
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
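Once the request returns, the model’s answer sits inside the JSON reply. A minimal sketch of digging it out, where the `sample` dict is an illustrative stand-in for `response.json()` (the real reply has the same "choices" → "message" → "content" shape):

```python
# Illustrative stand-in for response.json() from the request above.
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "A wooden boardwalk through a grassy field.",
            }
        }
    ]
}

# The answer text lives in the first choice's message content.
answer = sample["choices"][0]["message"]["content"]
print(answer)
```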


Link: “Kids Homework Example” YouTube video

A quick one-minute video on how to use OpenAI’s Vision API with Google Colab.

Now you can talk to photos with GPT like it has eyes!

What else can it do?

  • Take a picture of your lunch and GPT-4 Vision will tell you what’s on the plate!
  • Help with homework 🙂 GPT has your back! See the video above for our example. If you want to go from asking what’s in the image to answering questions, just ask “What are the answers to the questions in this image?”

You can use it as a homework helper for multiple siblings at once, a Where’s Waldo finder, or anything else you can think of!
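Switching from “describe this photo” to “answer my homework” is just a different text prompt; the image part stays the same. A sketch of building both kinds of message (the URL and helper name are illustrative, not part of the API):

```python
# Only the text prompt changes between "describe" and "homework" mode.
def build_messages(prompt, image_url):
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]

describe = build_messages("What's in this image?", "https://example.com/homework.jpg")
homework = build_messages(
    "What are the answers to the questions in this image?",
    "https://example.com/homework.jpg",
)
```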

What’s Next?

We build, ship and have fun! Stay tuned for more updates.
