
Recipe Scraper API
January 23, 2025
A FastAPI service I built that scrapes recipes from webpages (including Instagram posts), uses OpenAI's GPT-4 to extract structured data, and returns it as standardized JSON. It handles a variety of recipe layouts and normalizes them into one consistent schema.
What It Does
- Scrapes recipes from any webpage, including Instagram posts
- Uses OpenAI's GPT-4 to extract structured recipe data
- Returns JSON that follows a well-defined schema, including:
  - Ingredients with quantities and units
  - Step-by-step instructions
  - Recipe metadata (prep time, cook time, difficulty, etc.)
  - Categories and descriptions
How It Works
The API uses BeautifulSoup for web scraping and OpenAI's GPT-4 for intelligent data extraction. It handles both traditional websites and Instagram posts, normalizing the data into a consistent format.
from pydantic import BaseModel

# Category, Instruction, and MeasurementUnit are defined elsewhere in the project.

# Example of ingredient structure (defined first because Recipe refers to it)
class Ingredient(BaseModel):
    ingredient_name: str
    quantity: float
    measurement_unit: MeasurementUnit
    notes: str | None = None

# Example of the Recipe schema
class Recipe(BaseModel):
    title: str
    description: str | None = None
    photo: str | None = None
    category: Category
    ingredients: list[Ingredient]
    instructions: list[Instruction]
    prep_time: int | None = None
    cook_time: int | None = None
    difficulty: int
    servings: int
    yield_: int | None = None
    original_link: str | None = None
    video_link: str | None = None
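The scrape-then-extract flow described above can be sketched roughly as follows. This is a dependency-free illustration, not the service's actual code: it uses the standard library's `html.parser` as a stand-in for BeautifulSoup, and it takes the model call as a plain function argument (`ask_model`, a hypothetical name) so that a GPT-4 client could be plugged in. The prompt wording and the helper names `visible_text` and `extract_recipe` are assumptions.

```python
import json
from html.parser import HTMLParser
from typing import Callable


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Reduce a page to its visible text, one chunk per line."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "\n".join(collector.chunks)


def extract_recipe(html: str, ask_model: Callable[[str], str]) -> dict:
    """Send the page text to a model and parse its JSON reply."""
    prompt = (
        "Extract the recipe from the text below and reply with JSON only.\n\n"
        + visible_text(html)
    )
    recipe = json.loads(ask_model(prompt))
    # Minimal presence check only; the service itself validates with Pydantic.
    for field in ("title", "ingredients", "instructions"):
        if field not in recipe:
            raise ValueError(f"missing required field: {field}")
    return recipe
```

Passing the model call in as a function keeps the sketch testable without network access; in practice `ask_model` would wrap whatever OpenAI client the service uses.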
Technical Bits
- Built with FastAPI for high performance
- Uses BeautifulSoup for web scraping
- Integrates with OpenAI's GPT-4 API
- Implements Pydantic models for data validation
- Handles Instagram-specific scraping
- Supports various measurement units and categories
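One way to support varied measurement units is to map the spellings found in the wild onto a closed set of values. The sketch below is only a guess at how `MeasurementUnit` might look; the members, the alias table, and the `normalize_unit` helper are illustrative assumptions, not the project's actual definitions.

```python
from enum import Enum


class MeasurementUnit(str, Enum):
    # Illustrative members only; the real enum is not shown in this post.
    GRAM = "g"
    MILLILITER = "ml"
    TEASPOON = "tsp"
    TABLESPOON = "tbsp"
    CUP = "cup"
    PIECE = "piece"


# Hypothetical alias table: free-text spellings -> canonical unit.
_ALIASES = {
    "gram": MeasurementUnit.GRAM,
    "grams": MeasurementUnit.GRAM,
    "milliliter": MeasurementUnit.MILLILITER,
    "milliliters": MeasurementUnit.MILLILITER,
    "teaspoon": MeasurementUnit.TEASPOON,
    "teaspoons": MeasurementUnit.TEASPOON,
    "tablespoon": MeasurementUnit.TABLESPOON,
    "tablespoons": MeasurementUnit.TABLESPOON,
    "cups": MeasurementUnit.CUP,
    "pieces": MeasurementUnit.PIECE,
}


def normalize_unit(raw: str) -> MeasurementUnit:
    """Map a free-text unit such as 'Tablespoons' to a MeasurementUnit."""
    key = raw.strip().lower().rstrip(".")
    if key in _ALIASES:
        return _ALIASES[key]
    # Falls back to the canonical values; raises ValueError for unknown units.
    return MeasurementUnit(key)
```

Raising on unknown units, rather than guessing, lets the caller decide whether to reject the recipe or ask the model again.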
Features
- Universal Scraping: Works with any recipe website
- Instagram Support: Special handling for Instagram recipe posts
- Structured Output: Consistent JSON format for all recipes
- Intelligent Extraction: Uses GPT-4 to understand and structure recipe data
- Data Validation: Ensures all required fields are present and properly formatted
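Special handling for Instagram implies the service first decides which scraping path a URL should take. A minimal sketch of that routing step, using only the standard library; the function names and the exact host check are assumptions rather than the service's real logic.

```python
from urllib.parse import urlparse


def is_instagram_url(url: str) -> bool:
    """Return True when the URL's host is instagram.com or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    return host == "instagram.com" or host.endswith(".instagram.com")


def pick_scraper(url: str) -> str:
    """Name the scraping path for a URL (the labels are hypothetical)."""
    return "instagram" if is_instagram_url(url) else "generic"
```

Checking the parsed hostname, instead of searching the whole URL string, avoids misrouting links that merely mention Instagram in their path.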
Why I Built It
I wanted to create a unified way to access recipe data from various sources. This API helps standardize recipe information, making it easier to work with recipe data across different platforms and formats.
Future Ideas
- Add support for more social media platforms
- Implement recipe image extraction
- Add nutritional information extraction
- Support multiple languages
- Add recipe scaling functionality
- Implement caching for frequently accessed recipes
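Of these, recipe scaling is the easiest to sketch. Since each ingredient carries a numeric `quantity`, scaling comes down to multiplying by the ratio of target servings to original servings. The dataclass below is a standard-library stand-in for the Pydantic `Ingredient` model, and `scale_ingredients` is a hypothetical helper, so treat this as one possible approach only.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Ingredient:
    # Stand-in for the Pydantic model; the unit is a plain string here.
    ingredient_name: str
    quantity: float
    measurement_unit: str


def scale_ingredients(
    ingredients: list[Ingredient], servings: int, target_servings: int
) -> list[Ingredient]:
    """Return copies of the ingredients scaled to the target servings."""
    if servings <= 0 or target_servings <= 0:
        raise ValueError("servings must be positive")
    factor = target_servings / servings
    # Round to two decimals to keep quantities readable.
    return [
        replace(item, quantity=round(item.quantity * factor, 2))
        for item in ingredients
    ]
```

Linear scaling suits quantities; cook times and pan sizes usually do not scale the same way, so they are left out here.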