This example demonstrates how to use the Atomic Agents framework to analyze images with text, specifically focusing on extracting structured information from nutrition labels using GPT-4 Vision capabilities.
- Image Analysis: Process nutrition label images using GPT-4 Vision
- Structured Data Extraction: Convert visual information into structured Pydantic models
- Multi-Image Processing: Analyze multiple nutrition labels simultaneously
- Comprehensive Nutritional Data: Extract detailed nutritional information including:
  - Basic nutritional facts (calories, fats, proteins, etc.)
  - Serving size information
  - Vitamin and mineral content
  - Product details
- Clone the main Atomic Agents repository:

      git clone https://github.com/BrainBlend-AI/atomic-agents

- Navigate to the basic-multimodal directory:

      cd atomic-agents/atomic-examples/basic-multimodal

- Install dependencies using uv:

      uv sync

- Set up environment variables. Create a .env file in the basic-multimodal directory with the following content:

      OPENAI_API_KEY=your_openai_api_key

  Replace your_openai_api_key with your actual OpenAI API key.

- Run the example:

      uv run python basic_multimodal/main.py

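If the run fails with an authentication error, it can help to confirm that the key is actually visible before starting the example. The following is a minimal sketch, assuming the .env file uses plain KEY=value lines. The helpers parse_env_text and missing_keys are hypothetical names for illustration and are not part of Atomic Agents; the example itself may load the file differently.

```python
import os


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(required: list[str], env: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    # Start from the process environment, then overlay .env if it exists.
    env = dict(os.environ)
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as handle:
            env.update(parse_env_text(handle.read()))
    missing = missing_keys(["OPENAI_API_KEY"], env)
    print("All required keys present" if not missing else f"Missing: {missing}")
```

Run it from the basic-multimodal directory so that the relative .env path resolves to the file created in the setup step.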
The NutritionLabel schema defines the structure for storing nutrition information, including:
- Macronutrients (fats, proteins, carbohydrates)
- Micronutrients (vitamins and minerals)
- Serving information
- Product details
- NutritionAnalysisInput: Handles input images and analysis instructions
- NutritionAnalysisOutput: Structures the extracted nutrition information
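To make the shape of these schemas concrete, here is a rough sketch of how the three classes might relate. It uses standard-library dataclasses as a stand-in so that it runs without dependencies; the actual example uses Pydantic models. Every field name below is an assumption chosen for illustration and may not match the real definitions in the repository.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NutritionLabel:
    """Illustrative stand-in for the example's schema (field names are hypothetical)."""

    # Product details
    product_name: Optional[str] = None
    # Serving information
    serving_size: Optional[str] = None
    calories: Optional[float] = None
    # Macronutrients, in grams
    total_fat_g: Optional[float] = None
    protein_g: Optional[float] = None
    total_carbohydrates_g: Optional[float] = None
    # Micronutrients keyed by name, e.g. {"Vitamin C": "10% DV"}
    vitamins_and_minerals: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Basic validation: numeric amounts on a label cannot be negative.
        for name in ("calories", "total_fat_g", "protein_g", "total_carbohydrates_g"):
            value = getattr(self, name)
            if value is not None and value < 0:
                raise ValueError(f"{name} must be non-negative, got {value}")


@dataclass
class NutritionAnalysisInput:
    """What the analyzer receives: an instruction plus the images to read."""

    instruction_text: str
    image_paths: list[str]


@dataclass
class NutritionAnalysisOutput:
    """What the analyzer returns: one structured label per input image."""

    analyzed_labels: list[NutritionLabel]
```

Most fields are optional in this sketch because a label may omit a value or the model may be unable to read it; whether the real schema makes the same choice is not something this sketch can confirm.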
The nutrition analyzer is a specialized agent configured with:
- GPT-4 Vision capabilities
- Custom system prompts for nutrition label analysis
- Structured data validation
The example includes test images in the test_images directory:
- nutrition_label_1.png: Example nutrition label image
- nutrition_label_2.jpg: Another example nutrition label image
Running the example will:
- Load the test images
- Process them through the nutrition analyzer
- Display structured nutritional information for each label
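The loading step can be pictured with a short standard-library sketch. It collects image files from a directory and encodes each as a base64 data URL, which is one common way to pass images to a vision-capable model. This is an illustration only: find_label_images and to_data_url are hypothetical helpers, and how Atomic Agents actually transports images is not shown here.

```python
import base64
import mimetypes
from pathlib import Path

# File extensions treated as label images in this sketch.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def find_label_images(directory: Path) -> list[Path]:
    """Return image files in the directory, sorted by path for stable ordering."""
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def to_data_url(image_path: Path) -> str:
    """Encode an image file as a data URL such as data:image/png;base64,..."""
    mime_type, _ = mimetypes.guess_type(image_path.name)
    if mime_type is None:
        raise ValueError(f"Cannot determine MIME type for {image_path}")
    encoded = base64.b64encode(image_path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


if __name__ == "__main__":
    image_dir = Path("test_images")
    if image_dir.is_dir():
        for image in find_label_images(image_dir):
            # Print only the size of the payload, not the payload itself.
            print(image.name, len(to_data_url(image)), "characters")
```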
You can modify the example by:
- Adding your own nutrition label images to the test_images directory
- Adjusting the NutritionLabel schema to capture additional information
- Modifying the system prompt to focus on specific aspects of nutrition labels
Contributions are welcome! Please fork the repository and submit a pull request with your enhancements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for details.