Moxn supports multimodal prompts with images, PDFs, and other files. This guide covers how to include and handle multimodal content.

Supported Content Types

| Type | Providers | Use Cases |
|---|---|---|
| Images | All | Charts, screenshots, diagrams |
| PDFs | Anthropic, Google | Documents, reports |
| Files | Varies | Various document types |

Images in Messages

In the Web App

Add images to messages:
  1. Click the image icon in the editor
  2. Upload an image or paste a URL
  3. Add alt text for accessibility

In Code

Images appear as content blocks:
# When you fetch a prompt with images
prompt = await client.get_prompt("...", branch_name="main")

for message in prompt.messages:
    for block_group in message.blocks:
        for block in block_group:
            if block.block_type == "image_from_source":
                print(f"Image: {block.url}")
                print(f"Alt: {block.alt}")

Provider Conversion

Images are automatically converted to provider format:
# Anthropic format (base64)
{
    "type": "image",
    "source": {
        "type": "base64",
        "media_type": "image/png",
        "data": "iVBORw0KGgo..."
    }
}

# OpenAI format (data URI)
{
    "type": "image_url",
    "image_url": {
        "url": "data:image/png;base64,iVBORw0KGgo..."
    }
}

# Google format (Part with blob)
Part(inline_data=Blob(
    mime_type="image/png",
    data=bytes(...)
))
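The base64 encodings above differ only in packaging: Anthropic takes a raw base64 string, while OpenAI wraps the same payload in a data URI. A minimal sketch using only the standard library (the dict shapes mirror the provider formats shown; the PNG bytes here are placeholder data):

```python
import base64

# Placeholder PNG bytes (in practice, read from a file or an HTTP response)
png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

# Anthropic-style block: raw base64 string
b64 = base64.b64encode(png_bytes).decode("ascii")
anthropic_block = {
    "type": "image",
    "source": {"type": "base64", "media_type": "image/png", "data": b64},
}

# OpenAI-style block: the same payload wrapped in a data URI
openai_block = {
    "type": "image_url",
    "image_url": {"url": f"data:image/png;base64,{b64}"},
}
```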

PDFs in Messages

Adding PDFs

In the web app:
  1. Click the file icon
  2. Upload a PDF
  3. It appears inline in the message

PDF Block Types

Like images, PDFs use PDFContentFromSource, a discriminated union. You can pass either a plain dict or an explicit type:
pdf = {
    "type": "url",
    "url": "https://example.com/document.pdf",
    "media_type": "application/pdf",
    "filename": "document.pdf"
}

PDF Types

| `type` value | Class | Required Fields |
|---|---|---|
| `"url"` | `MediaDataPDFFromURL` | `url`, `media_type`, `filename` |
| `"base64"` | `MediaDataPDFFromBase64` | `base64`, `media_type`, `filename` |
| `"bytes"` | `MediaDataPDFFromBytes` | `bytes`, `media_type`, `filename` |
| `"local_file"` | `MediaDataPDFFromLocalFile` | `filepath`, `media_type`, `filename` |

Signed URLs

For content stored in cloud storage (S3, GCS, etc.), Moxn uses signed URLs with automatic refresh.

How It Works

1. You upload content → stored in cloud storage
2. Prompt fetched → signed URLs generated (short expiry)
3. SDK registers content → tracks expiration
4. Before expiry → auto-refreshes URLs
5. Provider conversion → fresh URLs used
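The refresh lifecycle above can be sketched with a minimal registry. This is a hypothetical standard-library sketch of the idea, not the SDK's internals; `refresh_fn` stands in for whatever call mints a fresh signed URL:

```python
import time

class SignedURLRegistry:
    """Tracks signed URLs and refreshes them shortly before expiry."""

    def __init__(self, refresh_fn, margin_seconds=60):
        self._refresh_fn = refresh_fn      # called to mint a fresh (url, expires_at)
        self._margin = margin_seconds      # refresh this many seconds before expiry
        self._entries = {}                 # content_id -> (url, expires_at)

    def register(self, content_id, url, expires_at):
        self._entries[content_id] = (url, expires_at)

    def get(self, content_id):
        url, expires_at = self._entries[content_id]
        if time.time() >= expires_at - self._margin:
            url, expires_at = self._refresh_fn(content_id)
            self._entries[content_id] = (url, expires_at)
        return url
```

The key design point is step 4 in the list above: URLs are refreshed lazily at access time, so provider conversion always sees a URL with time left on the clock.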

Automatic Refresh

The SDK handles refresh automatically:
async with MoxnClient() as client:
    # Signed URLs are registered when fetching
    prompt = await client.get_prompt("...", branch_name="main")

    # Later, when converting (URLs refreshed if needed)
    session = PromptSession.from_prompt_template(prompt, session_data)
    payload = session.to_anthropic_invocation()
    # ^ URLs are fresh at this point

Manual Registration

For prompts with signed content:
# Already handled by get_prompt(), but if you need manual control:
for message in prompt.messages:
    for block_group in message.blocks:
        for block in block_group:
            if isinstance(block, SignedURLContent):
                await client.content_client.register_content(block)

Image Variables

Use variables for dynamic images:

In the Web App

  1. Insert a variable with type “image”
  2. At runtime, provide the image object

In Code

Image variables use ImageContentFromSource, a discriminated union type. The SDK automatically dispatches to the correct image type based on the type field.
from moxn.base_models.blocks.image import ImageContentFromSource
from moxn.types.base import RenderableModel

class ImageAnalysisInput(RenderableModel):
    query: str
    screenshot: ImageContentFromSource

    def render(self, **kwargs) -> dict:
        return {
            "query": self.query,
            "screenshot": self.screenshot,  # Returns the image object directly
        }

Constructing Images

You can provide images in two ways: as a plain dict with the right structure (no extra imports needed), or as an explicit class such as MediaImageFromURL from the Image Types table below. The dict form:
# From URL
image = {
    "type": "url",
    "url": "https://example.com/screenshot.png",
    "media_type": "image/png"
}

# From base64
image = {
    "type": "base64",
    "base64": "iVBORw0KGgo...",
    "media_type": "image/png"
}

session_data = ImageAnalysisInput(
    query="What's in this image?",
    screenshot=image
)
Both approaches are equivalent—the dict is validated and converted using Pydantic’s discriminated union dispatch.
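If your image source varies at runtime, a small helper can pick the right dict shape. This is hypothetical convenience code, not part of the SDK; the `type` values and fields come from the Image Types table below:

```python
import base64

def image_from_source(src, media_type: str = "image/png") -> dict:
    """Hypothetical helper: build the matching discriminated-union dict."""
    if isinstance(src, bytes):
        return {
            "type": "base64",
            "base64": base64.b64encode(src).decode("ascii"),
            "media_type": media_type,
        }
    if src.startswith(("http://", "https://")):
        return {"type": "url", "url": src, "media_type": media_type}
    # Anything else is treated as a path on disk
    return {"type": "local_file", "filepath": src, "media_type": media_type}
```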

Image Types

| `type` value | Class | Required Fields |
|---|---|---|
| `"url"` | `MediaImageFromURL` | `url`, `media_type` |
| `"base64"` | `MediaImageFromBase64` | `base64`, `media_type` |
| `"bytes"` | `MediaImageFromBytes` | `bytes`, `media_type` |
| `"local_file"` | `MediaImageFromLocalFile` | `filepath`, `media_type` |
Supported media_type values: "image/jpeg", "image/png", "image/gif", "image/webp"

Provider-Specific Handling

Moxn supports multiple provider APIs, each with different multimodal capabilities:

Anthropic

session.to_anthropic_invocation()
  • Images: PNG, JPEG, GIF, WebP
  • PDFs: Native support with citations

OpenAI

OpenAI has two distinct APIs with different invocation methods:
# Chat Completions API
session.to_openai_chat_invocation()

# Responses API (different format)
session.to_openai_responses_invocation()
  • Images: Via data URIs or URLs
  • PDFs: Limited support

Google

Google has two distinct APIs:
# Gemini Developer API
session.to_google_gemini_invocation()

# Vertex AI (different authentication and endpoints)
session.to_google_vertex_invocation()
  • Images: Various formats
  • PDFs: Native support
  • Note: Vertex AI requires GCS URIs (gs://) for remote files—public HTTP URLs are not supported
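The Vertex AI constraint in the note above can be guarded with a simple check before invocation (illustrative only; the function name is made up for this example):

```python
def check_vertex_uri(uri: str) -> str:
    """Vertex AI only accepts GCS URIs (gs://) for remote files."""
    if uri.startswith(("http://", "https://")):
        raise ValueError(
            f"Vertex AI cannot fetch public HTTP URLs; upload to GCS and use gs://: {uri}"
        )
    if not uri.startswith("gs://"):
        raise ValueError(f"Expected a gs:// URI, got: {uri}")
    return uri
```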

Error Handling

Handle multimodal-specific errors:
try:
    response = anthropic.messages.create(
        **session.to_anthropic_invocation()
    )
except anthropic.BadRequestError as e:
    if "image" in str(e).lower():
        # Image format or size issue
        print(f"Image error: {e}")
    raise

Complete Example

from moxn import MoxnClient
from moxn.types.content import Provider
from moxn.base_models.blocks.image import ImageContentFromSource
from moxn.types.base import RenderableModel
from anthropic import Anthropic

class ImageAnalysisInput(RenderableModel):
    """Input with an image for analysis."""
    screenshot: ImageContentFromSource
    question: str

    def render(self, **kwargs) -> dict:
        return {
            "screenshot": self.screenshot,
            "question": self.question,
        }

async def analyze_screenshot(image_url: str, question: str):
    async with MoxnClient() as client:
        # Construct image using dict literal (or use explicit MediaImageFromURL)
        image = {"type": "url", "url": image_url, "media_type": "image/png"}

        session = await client.create_prompt_session(
            prompt_id="image-analysis-prompt",
            session_data=ImageAnalysisInput(
                screenshot=image,
                question=question
            )
        )

        async with client.span(
            session,
            name="analyze_image",
            metadata={"has_image": True}
        ) as span:
            anthropic = Anthropic()
            response = anthropic.messages.create(
                **session.to_anthropic_invocation()
            )

            await client.log_telemetry_event_from_response(
                session, response, Provider.ANTHROPIC
            )

            return response.content[0].text

Next Steps