Moxn supports multimodal prompts with images, PDFs, and other files. This guide covers how to include and handle multimodal content.

Supported Content Types

| Type | Providers | Use Cases |
|---|---|---|
| Images | All | Charts, screenshots, diagrams |
| PDFs | Anthropic, Google | Documents, reports |
| Files | Varies | Various document types |

Images in Messages

In the Web App

Add images to messages:
  1. Click the image icon in the editor
  2. Upload an image or paste a URL
  3. Add alt text for accessibility

In Code

Images appear as content blocks:
# When you fetch a prompt with images
prompt = await client.get_prompt("...", branch_name="main")

for message in prompt.messages:
    for block_group in message.blocks:
        for block in block_group:
            if block.block_type == "image_from_source":
                print(f"Image: {block.url}")
                print(f"Alt: {block.alt}")

Provider Conversion

Images are automatically converted to provider format:
# Anthropic format (base64)
{
    "type": "image",
    "source": {
        "type": "base64",
        "media_type": "image/png",
        "data": "iVBORw0KGgo..."
    }
}

# OpenAI format (data URI)
{
    "type": "image_url",
    "image_url": {
        "url": "data:image/png;base64,iVBORw0KGgo..."
    }
}

# Google format (Part with blob)
Part(inline_data=Blob(
    mime_type="image/png",
    data=bytes(...)
))
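The base64 encodings above differ only in packaging: Anthropic takes a raw base64 string, while OpenAI wraps the same payload in a data URI. A minimal sketch using only the standard library (the dict shapes mirror the provider formats shown; the PNG bytes here are placeholder data):

```python
import base64

# Placeholder PNG bytes (in practice, read from a file or an HTTP response)
png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

# Anthropic-style block: raw base64 string
b64 = base64.b64encode(png_bytes).decode("ascii")
anthropic_block = {
    "type": "image",
    "source": {"type": "base64", "media_type": "image/png", "data": b64},
}

# OpenAI-style block: the same payload wrapped in a data URI
openai_block = {
    "type": "image_url",
    "image_url": {"url": f"data:image/png;base64,{b64}"},
}
```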

PDFs in Messages

Adding PDFs

In the web app:
  1. Click the file icon
  2. Upload a PDF
  3. It appears inline in the message

PDF Block Types

Like images, PDFs use PDFContentFromSource, a discriminated union. You can pass either a plain dict or an explicit type:
pdf = {
    "type": "url",
    "url": "https://example.com/document.pdf",
    "media_type": "application/pdf",
    "filename": "document.pdf"
}

PDF Types

| `type` value | Class | Required Fields |
|---|---|---|
| `"url"` | `MediaDataPDFFromURL` | `url`, `media_type`, `filename` |
| `"base64"` | `MediaDataPDFFromBase64` | `base64`, `media_type`, `filename` |
| `"bytes"` | `MediaDataPDFFromBytes` | `bytes`, `media_type`, `filename` |
| `"local_file"` | `MediaDataPDFFromLocalFile` | `filepath`, `media_type`, `filename` |

Signed URLs

For content stored in cloud storage (S3, GCS, etc.), Moxn uses signed URLs with automatic refresh.

How It Works

1. You upload content → stored in cloud storage
2. Prompt fetched → signed URLs generated (short expiry)
3. SDK registers content → tracks expiration
4. Before expiry → auto-refreshes URLs
5. Provider conversion → fresh URLs used
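The refresh lifecycle above can be sketched with a minimal registry. This is a hypothetical standard-library sketch of the idea, not the SDK's internals; `refresh_fn` stands in for whatever call mints a fresh signed URL:

```python
import time

class SignedURLRegistry:
    """Tracks signed URLs and refreshes them shortly before expiry."""

    def __init__(self, refresh_fn, margin_seconds=60):
        self._refresh_fn = refresh_fn      # called to mint a fresh (url, expires_at)
        self._margin = margin_seconds      # refresh this many seconds before expiry
        self._entries = {}                 # content_id -> (url, expires_at)

    def register(self, content_id, url, expires_at):
        self._entries[content_id] = (url, expires_at)

    def get(self, content_id):
        url, expires_at = self._entries[content_id]
        if time.time() >= expires_at - self._margin:
            url, expires_at = self._refresh_fn(content_id)
            self._entries[content_id] = (url, expires_at)
        return url
```

The key design point is step 4 in the list above: URLs are refreshed lazily at access time, so provider conversion always sees a URL with time left on the clock.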

Automatic Refresh

The SDK handles refresh automatically:
async with MoxnClient() as client:
    # Signed URLs are registered when fetching
    prompt = await client.get_prompt("...", branch_name="main")

    # Later, when converting (URLs refreshed if needed)
    session = PromptSession.from_prompt_template(prompt, session_data)
    payload = session.to_anthropic_invocation()
    # ^ URLs are fresh at this point

Manual Registration

For prompts with signed content:
# Already handled by get_prompt(), but if you need manual control:
for message in prompt.messages:
    for block_group in message.blocks:
        for block in block_group:
            if isinstance(block, SignedURLContent):
                await client.content_client.register_content(block)

Image Variables

Use variables for dynamic images:

In the Web App

  1. Insert a variable with type “image”
  2. At runtime, provide the image object

In Code

Image variables use ImageContentFromSource, a discriminated union type. The SDK automatically dispatches to the correct image type based on the type field.
from moxn.base_models.blocks.image import ImageContentFromSource
from moxn.types.base import RenderableModel

class ImageAnalysisInput(RenderableModel):
    query: str
    screenshot: ImageContentFromSource

    def render(self, **kwargs) -> dict:
        return {
            "query": self.query,
            "screenshot": self.screenshot,  # Returns the image object directly
        }

Constructing Images

You can provide images in two ways: as a plain dict with the right structure (no extra imports needed), or as an explicit class such as MediaImageFromURL from the Image Types table below. The dict form:
# From URL
image = {
    "type": "url",
    "url": "https://example.com/screenshot.png",
    "media_type": "image/png"
}

# From base64
image = {
    "type": "base64",
    "base64": "iVBORw0KGgo...",
    "media_type": "image/png"
}

session_data = ImageAnalysisInput(
    query="What's in this image?",
    screenshot=image
)
Both approaches are equivalent—the dict is validated and converted using Pydantic’s discriminated union dispatch.
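If your image source varies at runtime, a small helper can pick the right dict shape. This is hypothetical convenience code, not part of the SDK; the `type` values and fields come from the Image Types table below:

```python
import base64

def image_from_source(src, media_type: str = "image/png") -> dict:
    """Hypothetical helper: build the matching discriminated-union dict."""
    if isinstance(src, bytes):
        return {
            "type": "base64",
            "base64": base64.b64encode(src).decode("ascii"),
            "media_type": media_type,
        }
    if src.startswith(("http://", "https://")):
        return {"type": "url", "url": src, "media_type": media_type}
    # Anything else is treated as a path on disk
    return {"type": "local_file", "filepath": src, "media_type": media_type}
```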

Image Types

| `type` value | Class | Required Fields |
|---|---|---|
| `"url"` | `MediaImageFromURL` | `url`, `media_type` |
| `"base64"` | `MediaImageFromBase64` | `base64`, `media_type` |
| `"bytes"` | `MediaImageFromBytes` | `bytes`, `media_type` |
| `"local_file"` | `MediaImageFromLocalFile` | `filepath`, `media_type` |
Supported media_type values: "image/jpeg", "image/png", "image/gif", "image/webp"

Provider-Specific Handling

Moxn supports multiple provider APIs, each with different multimodal capabilities:

Anthropic

session.to_anthropic_invocation()
  • Images: PNG, JPEG, GIF, WebP
  • PDFs: Native support with citations

OpenAI

OpenAI has two distinct APIs with different invocation methods:
# Chat Completions API
session.to_openai_chat_invocation()

# Responses API (different format)
session.to_openai_responses_invocation()
  • Images: Via data URIs or URLs
  • PDFs: Limited support

Google

Google has two distinct APIs:
# Gemini Developer API
session.to_google_gemini_invocation()

# Vertex AI (different authentication and endpoints)
session.to_google_vertex_invocation()
  • Images: Various formats
  • PDFs: Native support
  • Note: Vertex AI requires GCS URIs (gs://) for remote files—public HTTP URLs are not supported
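The Vertex AI constraint in the note above can be guarded with a simple check before invocation (illustrative only; the function name is made up for this example):

```python
def check_vertex_uri(uri: str) -> str:
    """Vertex AI only accepts GCS URIs (gs://) for remote files."""
    if uri.startswith(("http://", "https://")):
        raise ValueError(
            f"Vertex AI cannot fetch public HTTP URLs; upload to GCS and use gs://: {uri}"
        )
    if not uri.startswith("gs://"):
        raise ValueError(f"Expected a gs:// URI, got: {uri}")
    return uri
```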

Error Handling

Handle multimodal-specific errors:
try:
    response = anthropic.messages.create(
        **session.to_anthropic_invocation()
    )
except anthropic.BadRequestError as e:
    if "image" in str(e).lower():
        # Image format or size issue
        print(f"Image error: {e}")
    raise

Complete Example

from moxn import MoxnClient
from moxn.types.content import Provider
from moxn.base_models.blocks.image import ImageContentFromSource
from moxn.types.base import RenderableModel
from anthropic import Anthropic

class ImageAnalysisInput(RenderableModel):
    """Input with an image for analysis."""
    screenshot: ImageContentFromSource
    question: str

    def render(self, **kwargs) -> dict:
        return {
            "screenshot": self.screenshot,
            "question": self.question,
        }

async def analyze_screenshot(image_url: str, question: str):
    async with MoxnClient() as client:
        # Construct image using dict literal (or use explicit MediaImageFromURL)
        image = {"type": "url", "url": image_url, "media_type": "image/png"}

        session = await client.create_prompt_session(
            prompt_id="image-analysis-prompt",
            session_data=ImageAnalysisInput(
                screenshot=image,
                question=question
            )
        )

        async with client.span(
            session,
            name="analyze_image",
            metadata={"has_image": True}
        ) as span:
            anthropic = Anthropic()
            response = anthropic.messages.create(
                **session.to_anthropic_invocation()
            )

            await client.log_telemetry_event_from_response(
                session, response, Provider.ANTHROPIC
            )

            return response.content[0].text

Next Steps