This will be a quick How-To for using the new GPT-4 Vision API.
What is it?
An API that takes in images and allows you to ask questions about them!
Key Notes From OpenAI
- GPT-4 with vision is not a model that behaves differently from GPT-4, with the small exception of the system prompt we use for the model
- GPT-4 with vision is not a different model that does worse at text tasks because it has vision, it is simply GPT-4 with vision added
- GPT-4 with vision is an augmentative set of capabilities for the model
Cool, how does it work?
You can give GPT-4 a picture in either of two ways:
- Passing a link to an image hosted online
- Uploading a local file, encoded as base64
Here are some examples from OpenAI:
- Link to an image somewhere online
from openai import OpenAI

# The client reads your API key from the OPENAI_API_KEY environment variable
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    # Same {"url": ...} shape as the local-file example
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0])
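The print call above dumps the whole first choice, metadata included. If you only want the model's answer, the text sits at choices[0].message.content on the response object. Here is a minimal sketch of pulling it out; answer_text is a made-up helper name, and the fake response is just a stand-in with the same attribute layout so the snippet runs without calling the API.

```python
from types import SimpleNamespace


def answer_text(response):
    """Return the assistant's text from the first choice of a chat completion."""
    return response.choices[0].message.content


# Stand-in for a real response (assumption: same attribute layout as the SDK object),
# so this sketch can run without an API key or network access.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content="A wooden boardwalk through a green field.")
        )
    ]
)

print(answer_text(fake_response))
```

With a real call, you would pass the response returned by client.chat.completions.create(...) instead of the stand-in.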
- Image from your local computer
import base64
import requests

# OpenAI API Key
api_key = "YOUR_OPENAI_API_KEY"

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

payload = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What’s in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 300
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

print(response.json())
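The local-file example hardcodes image/jpeg in the data URL, which is only right for JPEG files. Below is a rough sketch of one way to handle both input styles with a single helper: web links pass through untouched, and local files get base64-encoded with a MIME type guessed from the file extension. The name image_part is our own invention, not part of any OpenAI library.

```python
import base64
import mimetypes
from pathlib import Path


def image_part(source: str) -> dict:
    """Build an image_url content part from a web URL or a local file path."""
    if source.startswith(("http://", "https://")):
        # Online image: pass the link through as-is.
        url = source
    else:
        # Local image: guess the MIME type from the extension, then base64-encode.
        mime_type, _ = mimetypes.guess_type(source)
        if mime_type is None or not mime_type.startswith("image/"):
            raise ValueError(f"Cannot determine an image type for {source!r}")
        encoded = base64.b64encode(Path(source).read_bytes()).decode("utf-8")
        url = f"data:{mime_type};base64,{encoded}"
    return {"type": "image_url", "image_url": {"url": url}}


# The URL here is a placeholder, not a real image.
print(image_part("https://example.com/cat.jpg"))
```

The returned dict can be dropped into the "content" list of the user message next to the text part.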
Video: “Kids Home Work Example” (YouTube)
Now you can talk to GPT about photos like it has eyes!
What else can it do?
- Take a picture of your lunch and GPT-4 with vision will tell you what’s on the plate!
- Help with homework 🙂 GPT has got your back! Look at the video above for our example. If you want to go from asking what’s in the image to answering the questions in it, just ask “What are the answers to the questions in this image?”
You can use this as an older sibling helping several siblings with their homework at once, as a Where’s Waldo finder, or for anything else you can think of!
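Switching between these use cases only means changing the text part of the request; the image part stays the same. Here is a small sketch of that idea, assuming the same request shape as the examples earlier. PROMPTS, vision_payload, and the worksheet URL are all hypothetical names and placeholders for illustration.

```python
import json

# Hypothetical prompt presets; the wording is only a starting point.
PROMPTS = {
    "describe": "What's in this image?",
    "homework": "What are the answers to the questions in this image?",
    "waldo": "Where is Waldo in this image?",
}


def vision_payload(task, image_url, model="gpt-4-vision-preview", max_tokens=300):
    """Build a chat completions request body for one image and one preset question."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": PROMPTS[task]},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


# Placeholder URL; swap in a real link or a base64 data URL.
payload = vision_payload("homework", "https://example.com/worksheet.jpg")
print(json.dumps(payload, indent=2))
```

The resulting dict can be sent with requests.post(..., json=payload) exactly like the payload in the local-file example.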
What’s Next?
We build, ship and have fun! Stay tuned for more updates.