A `PromptSession` is the runtime representation of a prompt combined with your input data. It handles variable substitution, provider conversion, and response parsing.
## Creating Sessions
### From prompt ID
The simplest way to create a session:
```python
from moxn import MoxnClient

async with MoxnClient() as client:
    session = await client.create_prompt_session(
        prompt_id="your-prompt-id",
        branch_name="main",
        session_data=YourInputModel(
            query="How do I reset my password?",
            user_id="user_123"
        )
    )
```
This fetches the prompt and creates a session in one call.
### From session data

If your codegen’d model has `moxn_schema_metadata`, you can create a session from just the data:
```python
from generated_models import ProductHelpInput

session_data = ProductHelpInput(
    query="How do I reset my password?",
    user_id="user_123"
)

# Session data knows its prompt ID from metadata
session = await client.prompt_session_from_session_data(
    session_data=session_data,
    branch_name="main"
)
```
### From a PromptTemplate directly
For more control, create sessions manually:
```python
from moxn.models.prompt import PromptSession

prompt = await client.get_prompt("...", branch_name="main")
session = PromptSession.from_prompt_template(
    prompt=prompt,
    session_data=YourInputModel(...)
)
```
## Session Data and Rendering
### The `render()` flow
Session data goes through a transformation pipeline: the session calls your model’s `render()` method to build a variable map, substitutes those values into the prompt’s messages, and converts the result into a provider-specific payload.
The `render()` method returns a dictionary where values can be (a combined sketch follows this list):

- **Strings**: Simple text values (the base case)
- **ContentBlocks**: Multimodal content like images and PDFs
- **Arrays**: Sequences of ContentBlocks for structured context (e.g., citations)
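To preview how these combine, here is a minimal sketch; the `MixedInput` model and its fields are hypothetical, and each pattern is covered individually below and under Advanced Render Patterns:

```python
from typing import Any

from moxn.base_models.blocks.image import ImageContentFromSource
from moxn.base_models.blocks.text import TextContent


class MixedInput(RenderableModel):  # hypothetical example model
    query: str
    screenshot_url: str
    chunks: list[str]

    def render(self, **kwargs) -> dict[str, Any]:
        return {
            "query": self.query,  # string: the base case
            "screenshot": ImageContentFromSource(url=self.screenshot_url),  # ContentBlock
            "context_chunks": [TextContent(text=c) for c in self.chunks],  # array of ContentBlocks
        }
```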
### How `render()` works
The `render()` method transforms your typed data into a dictionary that maps variable names to values. The base case returns string values:
```python
import json


class ProductHelpInput(RenderableModel):
    query: str
    user_id: str
    documents: list[Document]

    def render(self, **kwargs) -> dict[str, str]:
        return {
            "query": self.query,
            "user_id": self.user_id,
            "documents": json.dumps([d.model_dump() for d in self.documents])
        }
```
Each key in the returned dict corresponds to a variable name in your prompt messages.
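For example, calling `render()` on the model above produces the variable map that the session substitutes into the prompt:

```python
variables = ProductHelpInput(
    query="How do I reset my password?",
    user_id="user_123",
    documents=[],
).render()

# variables == {
#     "query": "How do I reset my password?",
#     "user_id": "user_123",
#     "documents": "[]",
# }
```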
For multimodal content and structured arrays, `render()` can return ContentBlock types instead of strings; see Advanced Render Patterns below.
### Customizing `render()`
You can customize how data is formatted for the LLM:
```python
class ProductHelpInput(RenderableModel):
    documents: list[Document]

    def render(self, **kwargs) -> dict[str, str]:
        # Format as markdown
        formatted = "\n".join([
            f"## {doc.title}\n{doc.content}"
            for doc in self.documents
        ])
        return {
            "documents": formatted
        }
```
Or use XML formatting:
```python
def render(self, **kwargs) -> dict[str, str]:
    docs_xml = "\n".join([
        f"<document id='{doc.id}'>\n{doc.content}\n</document>"
        for doc in self.documents
    ])
    return {
        "documents": f"<documents>\n{docs_xml}\n</documents>"
    }
```
## Advanced Render Patterns
Beyond simple strings, `render()` can return ContentBlock types for multimodal content and arrays for structured context.
### Multimodal Variables
Inject images and PDFs directly into your prompts:
```python
from typing import Any

from moxn.base_models.blocks.image import ImageContentFromSource
from moxn.base_models.blocks.file import PDFContentFromSource


class DocumentAnalysisInput(RenderableModel):
    query: str
    document_url: str
    screenshot_url: str

    def render(self, **kwargs) -> dict[str, Any]:
        return {
            "query": self.query,
            "document": PDFContentFromSource(
                url=self.document_url,
                name="source_document.pdf",
            ),
            "screenshot": ImageContentFromSource(
                url=self.screenshot_url,
            ),
        }
```
When the session converts to provider format, these ContentBlocks become the appropriate multimodal content (e.g., image and document blocks for Anthropic).
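For orientation, here is roughly what those blocks become on the wire. This is a sketch of Anthropic’s Messages API shapes with placeholder URLs, not necessarily the exact payload moxn emits:

```python
# Illustrative Anthropic content blocks for URL-sourced media
# (built for you by to_anthropic_invocation(); shapes per Anthropic's API).
image_block = {
    "type": "image",
    "source": {"type": "url", "url": "https://example.com/screenshot.png"},
}
pdf_block = {
    "type": "document",
    "source": {"type": "url", "url": "https://example.com/source_document.pdf"},
}
```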
### Arrays for Citations
Return arrays of ContentBlocks to enable structured citation support. This is particularly useful with Anthropic’s citations API, which allows the model to explicitly reference specific chunks of context in its response:
```python
from typing import Any

from moxn.base_models.blocks.text import TextContent


class RAGInput(RenderableModel):
    query: str
    chunks: list[DocumentChunk]

    def render(self, **kwargs) -> dict[str, Any]:
        return {
            "query": self.query,
            # Array of TextContent blocks for citations
            "context_chunks": [
                TextContent(text=chunk.text)
                for chunk in self.chunks
            ],
        }
```
When Anthropic’s citations feature is enabled, each `TextContent` in the array becomes a citable source. The model’s response can then reference specific chunks by index, making it easy to verify which sources informed each part of the response.

Arrays of ContentBlocks work with any provider, but citation-aware responses are currently specific to Anthropic’s citations API.
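For context, Anthropic’s citations API represents a citable, chunked document roughly like this; it is a sketch of the provider-side shape, not necessarily the exact payload moxn builds from your array:

```python
# Illustrative Anthropic "custom content" document with citations enabled;
# each inner text block is a separately citable chunk.
citable_document = {
    "type": "document",
    "source": {
        "type": "content",
        "content": [
            {"type": "text", "text": "Chunk one of retrieved context..."},
            {"type": "text", "text": "Chunk two of retrieved context..."},
        ],
    },
    "citations": {"enabled": True},
}
```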
## Provider Conversion

Sessions convert to provider-specific formats using the `to_*_invocation()` methods:
### Anthropic
```python
from anthropic import Anthropic

anthropic = Anthropic()
response = anthropic.messages.create(
    **session.to_anthropic_invocation()
)
```
The invocation includes:

- `system`: System message (if present)
- `messages`: User/assistant messages
- `model`: From the prompt’s `completion_config`
- `max_tokens`: From `completion_config`
- `tools`: If tools are configured
- `output_format`: If structured output is configured
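Because the invocation is a set of keyword arguments, you can inspect or adjust it before sending. A sketch, assuming the returned mapping is a plain mutable dict with the keys listed above:

```python
payload = session.to_anthropic_invocation()
print(payload["model"], payload["max_tokens"])  # values come from completion_config

payload["max_tokens"] = 2048  # tweak a parameter before sending
response = anthropic.messages.create(**payload)
```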
### OpenAI
```python
from openai import OpenAI

openai = OpenAI()
response = openai.chat.completions.create(
    **session.to_openai_chat_invocation()
)
```
### Google
```python
from google import genai

client = genai.Client()
response = client.models.generate_content(
    **session.to_google_gemini_invocation()
)
```
### Generic method
Use `to_invocation()`, which automatically selects the provider:
```python
from moxn.types.content import Provider

# Uses completion_config.provider if set
payload = session.to_invocation()

# Or specify explicitly
payload = session.to_invocation(provider=Provider.ANTHROPIC)
```
## Provider Defaulting
When your prompt has `completion_config.provider` set (configured in the Moxn web app), many methods use it automatically:
```python
# These all use the stored provider from completion_config:
payload = session.to_invocation()                          # Invocation payload
payload = session.to_payload()                             # Messages-only payload
parsed = session.parse_response(response)                  # Response parsing
event = session.create_llm_event_from_response(response)   # Telemetry

# Override when needed:
parsed = session.parse_response(response, provider=Provider.OPENAI_CHAT)
```
This simplifies code when you’re using the provider configured in the web app, while still allowing runtime overrides.
### Overriding parameters
Override model parameters at runtime:
```python
response = anthropic.messages.create(
    **session.to_anthropic_invocation(
        model="claude-sonnet-4-20250514",  # Override model
        max_tokens=4096,                   # Override max tokens
        temperature=0.7                    # Override temperature
    )
)
```
## Adding Runtime Messages
Append messages to a session after creation:
### Add user message
```python
session.append_user_text(
    text="Here's additional context...",
    name="Additional Context"
)
```
### Add assistant message
```python
session.append_assistant_text(
    text="I understand. Let me help with that.",
    name="Assistant Acknowledgment"
)
```
### Add from LLM response
After getting a response, add it to the session for multi-turn conversations:
```python
# Parse the response (uses stored provider from completion_config)
parsed = session.parse_response(response)

# Add to session
session.append_assistant_response(
    parsed_response=parsed,
    name="Assistant Response"
)

# Now add a follow-up user message
session.append_user_text("Can you elaborate on that?")

# Send again
response2 = anthropic.messages.create(
    **session.to_anthropic_invocation()
)
```
## Response Parsing
Parse provider responses into a normalized format:
```python
# Get raw response from provider
response = anthropic.messages.create(...)

# Parse to normalized format (uses stored provider from completion_config)
parsed = session.parse_response(response)

# Or override the provider explicitly
# parsed = session.parse_response(response, provider=Provider.ANTHROPIC)

# Access normalized content
for candidate in parsed.candidates:
    for block in candidate.content:
        if block.block_type == "text":
            print(block.text)
        elif block.block_type == "tool_call":
            print(f"Tool: {block.tool_name}({block.input})")

# Access metadata
print(f"Tokens: {parsed.input_tokens} in, {parsed.output_tokens} out")
print(f"Model: {parsed.model}")
print(f"Stop reason: {parsed.stop_reason}")
```
## Creating Telemetry Events
Create LLM events from responses for logging:
```python
# Method 1: From raw response (uses stored provider from completion_config)
llm_event = session.create_llm_event_from_response(response)

# Or override the provider explicitly
# llm_event = session.create_llm_event_from_response(response, provider=Provider.ANTHROPIC)

# Method 2: From parsed response (with more control)
parsed = session.parse_response(response)
llm_event = session.create_llm_event_from_parsed_response(
    parsed_response=parsed,
    request_config=request_config,    # Optional
    schema_definition=schema,         # Optional
    attributes={"custom": "data"},    # Optional
    validation_errors=errors          # Optional
)

# Log it
async with client.span(session) as span:
    await client.log_telemetry_event(llm_event)
```
## Session Properties
Access session information:
```python
session.id            # UUID - unique session identifier
session.prompt_id     # UUID - source prompt ID
session.prompt        # PromptTemplate - the underlying prompt
session.messages      # list[Message] - current messages (with appended)
session.session_data  # RenderableModel | None - your input data
```
## Complete Example
Here’s a full multi-turn conversation example:
```python
from moxn import MoxnClient
from moxn.types.content import Provider
from anthropic import Anthropic
from generated_models import ChatInput


async def chat_conversation():
    async with MoxnClient() as client:
        # Create initial session
        session = await client.create_prompt_session(
            prompt_id="chat-prompt-id",
            branch_name="main",
            session_data=ChatInput(
                user_name="Alice",
                system_context="You are a helpful assistant."
            )
        )

        anthropic = Anthropic()

        # First turn
        session.append_user_text("What's the weather like today?")

        async with client.span(session, name="turn_1") as span:
            response1 = anthropic.messages.create(
                **session.to_anthropic_invocation()
            )
            await client.log_telemetry_event_from_response(
                session, response1, Provider.ANTHROPIC
            )

        # Add response to session
        parsed1 = session.parse_response(response1)
        session.append_assistant_response(parsed1)

        # Second turn
        session.append_user_text("What about tomorrow?")

        async with client.span(session, name="turn_2") as span:
            response2 = anthropic.messages.create(
                **session.to_anthropic_invocation()
            )
            await client.log_telemetry_event_from_response(
                session, response2, Provider.ANTHROPIC
            )

        return response2.content[0].text
```
## Next Steps