A `PromptSession` is the runtime representation of a prompt combined with your input data. It handles variable substitution, provider conversion, and response parsing.
## Creating Sessions
### From prompt ID
The simplest way to create a session:
```python
from moxn import MoxnClient

async with MoxnClient() as client:
    session = await client.create_prompt_session(
        prompt_id="your-prompt-id",
        branch_name="main",
        session_data=YourInputModel(
            query="How do I reset my password?",
            user_id="user_123"
        )
    )
```
This fetches the prompt and creates a session in one call.
### From session data

If your codegen’d model has `moxn_schema_metadata`, you can create a session from just the data:
```python
from generated_models import ProductHelpInput

session_data = ProductHelpInput(
    query="How do I reset my password?",
    user_id="user_123"
)

# Session data knows its prompt ID from metadata
session = await client.prompt_session_from_session_data(
    session_data=session_data,
    branch_name="main"
)
```
### From a PromptTemplate directly
For more control, create sessions manually:
```python
from moxn.models.prompt import PromptSession

prompt = await client.get_prompt("...", branch_name="main")
session = PromptSession.from_prompt_template(
    prompt=prompt,
    session_data=YourInputModel(...)
)
```
## Session Data and Rendering
### The `render()` flow
Session data goes through a transformation pipeline: the session calls your model’s `render()` method to build a variable map, substitutes those values into the prompt’s messages, and converts the result into a provider-specific payload.
The `render()` method returns a dictionary where values can be (a combined sketch follows this list):

- **Strings**: Simple text values (the base case)
- **ContentBlocks**: Multimodal content like images and PDFs
- **Arrays**: Sequences of ContentBlocks for structured context (e.g., citations)
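To preview how these combine, here is a minimal sketch; the `MixedInput` model and its fields are hypothetical, and each pattern is covered individually below and under Advanced Render Patterns:

```python
from typing import Any

from moxn.base_models.blocks.image import ImageContentFromSource
from moxn.base_models.blocks.text import TextContent


class MixedInput(RenderableModel):  # hypothetical example model
    query: str
    screenshot_url: str
    chunks: list[str]

    def render(self, **kwargs) -> dict[str, Any]:
        return {
            "query": self.query,  # string: the base case
            "screenshot": ImageContentFromSource(url=self.screenshot_url),  # ContentBlock
            "context_chunks": [TextContent(text=c) for c in self.chunks],  # array of ContentBlocks
        }
```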
### How `render()` works
The `render()` method transforms your typed data into a dictionary that maps variable names to values. The base case returns string values:
```python
import json


class ProductHelpInput(RenderableModel):
    query: str
    user_id: str
    documents: list[Document]

    def render(self, **kwargs) -> dict[str, str]:
        return {
            "query": self.query,
            "user_id": self.user_id,
            "documents": json.dumps([d.model_dump() for d in self.documents])
        }
```
Each key in the returned dict corresponds to a variable name in your prompt messages.
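For example, calling `render()` on the model above produces the variable map that the session substitutes into the prompt:

```python
variables = ProductHelpInput(
    query="How do I reset my password?",
    user_id="user_123",
    documents=[],
).render()

# variables == {
#     "query": "How do I reset my password?",
#     "user_id": "user_123",
#     "documents": "[]",
# }
```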
For multimodal content and structured arrays, `render()` can return ContentBlock types instead of strings; see Advanced Render Patterns below.
### Customizing `render()`
You can customize how data is formatted for the LLM:
```python
class ProductHelpInput(RenderableModel):
    documents: list[Document]

    def render(self, **kwargs) -> dict[str, str]:
        # Format as markdown
        formatted = "\n".join([
            f"## {doc.title}\n{doc.content}"
            for doc in self.documents
        ])
        return {
            "documents": formatted
        }
```
Or use XML formatting:
```python
def render(self, **kwargs) -> dict[str, str]:
    docs_xml = "\n".join([
        f"<document id='{doc.id}'>\n{doc.content}\n</document>"
        for doc in self.documents
    ])
    return {
        "documents": f"<documents>\n{docs_xml}\n</documents>"
    }
```
## Advanced Render Patterns
Beyond simple strings, `render()` can return ContentBlock types for multimodal content and arrays for structured context.
### Multimodal Variables
Inject images and PDFs directly into your prompts:
```python
from typing import Any

from moxn.base_models.blocks.image import ImageContentFromSource
from moxn.base_models.blocks.file import PDFContentFromSource


class DocumentAnalysisInput(RenderableModel):
    query: str
    document_url: str
    screenshot_url: str

    def render(self, **kwargs) -> dict[str, Any]:
        return {
            "query": self.query,
            "document": PDFContentFromSource(
                url=self.document_url,
                name="source_document.pdf",
            ),
            "screenshot": ImageContentFromSource(
                url=self.screenshot_url,
            ),
        }
```
When the session converts to provider format, these ContentBlocks become the appropriate multimodal content (e.g., image and document blocks for Anthropic).
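For orientation, here is roughly what those blocks become on the wire. This is a sketch of Anthropic’s Messages API shapes with placeholder URLs, not necessarily the exact payload moxn emits:

```python
# Illustrative Anthropic content blocks for URL-sourced media
# (built for you by to_anthropic_invocation(); shapes per Anthropic's API).
image_block = {
    "type": "image",
    "source": {"type": "url", "url": "https://example.com/screenshot.png"},
}
pdf_block = {
    "type": "document",
    "source": {"type": "url", "url": "https://example.com/source_document.pdf"},
}
```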
### Arrays for Citations
Return arrays of ContentBlocks to enable structured citation support. This is particularly useful with Anthropic’s citations API, which allows the model to explicitly reference specific chunks of context in its response:
```python
from typing import Any

from moxn.base_models.blocks.text import TextContent


class RAGInput(RenderableModel):
    query: str
    chunks: list[DocumentChunk]

    def render(self, **kwargs) -> dict[str, Any]:
        return {
            "query": self.query,
            # Array of TextContent blocks for citations
            "context_chunks": [
                TextContent(text=chunk.text)
                for chunk in self.chunks
            ],
        }
```
When Anthropic’s citations feature is enabled, each `TextContent` in the array becomes a citable source. The model’s response can then reference specific chunks by index, making it easy to verify which sources informed each part of the response.

Arrays of ContentBlocks work with any provider, but citation-aware responses are currently specific to Anthropic’s citations API.
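For context, Anthropic’s citations API represents a citable, chunked document roughly like this; it is a sketch of the provider-side shape, not necessarily the exact payload moxn builds from your array:

```python
# Illustrative Anthropic "custom content" document with citations enabled;
# each inner text block is a separately citable chunk.
citable_document = {
    "type": "document",
    "source": {
        "type": "content",
        "content": [
            {"type": "text", "text": "Chunk one of retrieved context..."},
            {"type": "text", "text": "Chunk two of retrieved context..."},
        ],
    },
    "citations": {"enabled": True},
}
```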
## Provider Conversion

Sessions convert to provider-specific formats using the `to_*_invocation()` methods:
### Anthropic
```python
from anthropic import Anthropic

anthropic = Anthropic()
response = anthropic.messages.create(
    **session.to_anthropic_invocation()
)
```
The invocation includes:

- `system`: System message (if present)
- `messages`: User/assistant messages
- `model`: From the prompt’s `completion_config`
- `max_tokens`: From `completion_config`
- `tools`: If tools are configured
- `output_format`: If structured output is configured
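Because the invocation is a set of keyword arguments, you can inspect or adjust it before sending. A sketch, assuming the returned mapping is a plain mutable dict with the keys listed above:

```python
payload = session.to_anthropic_invocation()
print(payload["model"], payload["max_tokens"])  # values come from completion_config

payload["max_tokens"] = 2048  # tweak a parameter before sending
response = anthropic.messages.create(**payload)
```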
### OpenAI
```python
from openai import OpenAI

openai = OpenAI()
response = openai.chat.completions.create(
    **session.to_openai_chat_invocation()
)
```
### Google
```python
from google import genai

client = genai.Client()
response = client.models.generate_content(
    **session.to_google_gemini_invocation()
)
```
### Generic method
Use `to_invocation()`, which automatically selects the provider:
```python
from moxn.types.content import Provider

# Uses completion_config.provider if set
payload = session.to_invocation()

# Or specify explicitly
payload = session.to_invocation(provider=Provider.ANTHROPIC)
```
## Provider Defaulting
When your prompt has `completion_config.provider` set (configured in the Moxn web app), many methods use it automatically:
```python
# These all use the stored provider from completion_config:
payload = session.to_invocation()                          # Invocation payload
payload = session.to_payload()                             # Messages-only payload
parsed = session.parse_response(response)                  # Response parsing
event = session.create_llm_event_from_response(response)   # Telemetry

# Override when needed:
parsed = session.parse_response(response, provider=Provider.OPENAI_CHAT)
```
This simplifies code when you’re using the provider configured in the web app, while still allowing runtime overrides.
### Overriding parameters
Override model parameters at runtime:
```python
response = anthropic.messages.create(
    **session.to_anthropic_invocation(
        model="claude-sonnet-4-20250514",  # Override model
        max_tokens=4096,                   # Override max tokens
        temperature=0.7                    # Override temperature
    )
)
```
## Adding Runtime Messages
Append messages to a session after creation:
### Add user message
```python
session.append_user_text(
    text="Here's additional context...",
    name="Additional Context"
)
```
### Add assistant message
```python
session.append_assistant_text(
    text="I understand. Let me help with that.",
    name="Assistant Acknowledgment"
)
```
### Add from LLM response
After getting a response, add it to the session for multi-turn conversations:
```python
# Parse the response (uses stored provider from completion_config)
parsed = session.parse_response(response)

# Add to session
session.append_assistant_response(
    parsed_response=parsed,
    name="Assistant Response"
)

# Now add a follow-up user message
session.append_user_text("Can you elaborate on that?")

# Send again
response2 = anthropic.messages.create(
    **session.to_anthropic_invocation()
)
```
## Response Parsing
Parse provider responses into a normalized format:
```python
# Get raw response from provider
response = anthropic.messages.create(...)

# Parse to normalized format (uses stored provider from completion_config)
parsed = session.parse_response(response)

# Or override the provider explicitly
# parsed = session.parse_response(response, provider=Provider.ANTHROPIC)

# Access normalized content
for candidate in parsed.candidates:
    for block in candidate.content:
        if block.block_type == "text":
            print(block.text)
        elif block.block_type == "tool_call":
            print(f"Tool: {block.tool_name}({block.input})")

# Access metadata
print(f"Tokens: {parsed.input_tokens} in, {parsed.output_tokens} out")
print(f"Model: {parsed.model}")
print(f"Stop reason: {parsed.stop_reason}")
```
## Creating Telemetry Events
Create LLM events from responses for logging:
```python
# Method 1: From raw response (uses stored provider from completion_config)
llm_event = session.create_llm_event_from_response(response)

# Or override the provider explicitly
# llm_event = session.create_llm_event_from_response(response, provider=Provider.ANTHROPIC)

# Method 2: From parsed response (with more control)
parsed = session.parse_response(response)
llm_event = session.create_llm_event_from_parsed_response(
    parsed_response=parsed,
    request_config=request_config,    # Optional
    schema_definition=schema,         # Optional
    attributes={"custom": "data"},    # Optional
    validation_errors=errors          # Optional
)

# Log it
async with client.span(session) as span:
    await client.log_telemetry_event(llm_event)
```
## Session Properties
Access session information:
```python
session.id            # UUID - unique session identifier
session.prompt_id     # UUID - source prompt ID
session.prompt        # PromptTemplate - the underlying prompt
session.messages      # list[Message] - current messages (with appended)
session.session_data  # RenderableModel | None - your input data
```
## Complete Example
Here’s a full multi-turn conversation example:
```python
from moxn import MoxnClient
from moxn.types.content import Provider
from anthropic import Anthropic
from generated_models import ChatInput


async def chat_conversation():
    async with MoxnClient() as client:
        # Create initial session
        session = await client.create_prompt_session(
            prompt_id="chat-prompt-id",
            branch_name="main",
            session_data=ChatInput(
                user_name="Alice",
                system_context="You are a helpful assistant."
            )
        )

        anthropic = Anthropic()

        # First turn
        session.append_user_text("What's the weather like today?")

        async with client.span(session, name="turn_1") as span:
            response1 = anthropic.messages.create(
                **session.to_anthropic_invocation()
            )
            await client.log_telemetry_event_from_response(
                session, response1, Provider.ANTHROPIC
            )

        # Add response to session
        parsed1 = session.parse_response(response1)
        session.append_assistant_response(parsed1)

        # Second turn
        session.append_user_text("What about tomorrow?")

        async with client.span(session, name="turn_2") as span:
            response2 = anthropic.messages.create(
                **session.to_anthropic_invocation()
            )
            await client.log_telemetry_event_from_response(
                session, response2, Provider.ANTHROPIC
            )

        return response2.content[0].text
```
## Next Steps