Template and Document Generation Cookbook

This cookbook walks you through the complete workflow of automatically generating a Machine Readable Template (MRT) from example documents and then using it to generate new documents. The running example uploads CSR (Clinical Study Report) documents, generates a reusable template from them, and generates a new document with that template.

Overview

The document generation workflow follows these steps:
  1. Upload Template and Example Documents - Provide a template document and sample documents
  2. Generate Template from Examples - System automatically creates an MRT structure from your examples
  3. Upload Source Documents - Prepare source materials for new document generation
  4. Create Document Set - Organize documents into a logical group
  5. Generate Document - Request document generation using the auto-generated template
  6. Monitor Progress - Poll status until generation completes
  7. Retrieve Results - Download the generated document and metadata

Prerequisites

Before starting, ensure you have:
  • API Token - Valid Bearer token for authentication (see Authentication)
  • API Endpoint - Access to https://api.artosai.com
  • Tools - curl, Python 3.6+ (the Python examples use the requests library), or an equivalent HTTP client; the Bash script in this cookbook also uses jq
  • Example Documents - Sample regulatory documents (PDF, DOCX) that show your desired template structure
  • Source Documents - Regulatory documents to process (PDF, DOCX, Excel)

Section 1: Preparing and Uploading Documents

The template generation system learns from example documents. Prepare documents that represent your desired template structure.

Document Types

Template Document (optional):
  • A single document showing the desired output structure and format
  • Should include section headers, formatting, and style guidelines
  • Helps the system understand your preferred organization
Example Documents:
  • 1-3 existing documents of the type you want to generate
  • Should follow the same structure as your desired output
  • Used to identify sections, content patterns, and extraction rules
  • More diverse examples generally produce a better template
Source Documents (for generation):
  • Raw materials to be processed
  • Will be analyzed and content extracted to fill the generated template
  • Can be different from examples (extracted content will follow template structure)

Upload Documents

Use the Files API to upload template and example documents:
curl Example - Upload Template Document:
curl -X POST "https://api.artosai.com/api/v1/files/upload" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file_name=csr_template.docx" \
  -F "file_content=@/path/to/csr_template.docx" \
  -F "container=templates"
curl Example - Upload Example Documents:
# Upload first example
curl -X POST "https://api.artosai.com/api/v1/files/upload" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file_name=example_csr_2024.docx" \
  -F "file_content=@/path/to/example_csr_2024.docx" \
  -F "container=documents"

# Upload second example
curl -X POST "https://api.artosai.com/api/v1/files/upload" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file_name=example_csr_2023.docx" \
  -F "file_content=@/path/to/example_csr_2023.docx" \
  -F "container=documents"
Python Example:
import requests

def upload_file(token, filename, filepath, container):
    """Upload a file for template generation."""
    url = "https://api.artosai.com/api/v1/files/upload"
    headers = {"Authorization": f"Bearer {token}"}

    # Open the file in a context manager so the handle is closed after the request
    with open(filepath, "rb") as f:
        files = {
            "file_name": (None, filename),
            "container": (None, container),
            "file_content": f
        }
        response = requests.post(url, headers=headers, files=files)

    if response.status_code == 200:
        print(f"✓ {filename} uploaded to {container}")
        return True
    else:
        print(f"✗ Upload failed: {response.json().get('detail')}")
        return False

# Upload documents
token = "YOUR_TOKEN"
upload_file(token, "csr_template.docx", "csr_template.docx", "templates")
upload_file(token, "example_csr_2024.docx", "example_csr_2024.docx", "documents")
upload_file(token, "example_csr_2023.docx", "example_csr_2023.docx", "documents")

Section 2: Generating a Template from Examples

The POST /api/v1/templates/generate endpoint analyzes your example documents and generates a structured MRT with sections and extraction rules.

How Template Generation Works

The system:
  1. Analyzes example documents for structure and content patterns
  2. Identifies sections, headers, and content types
  3. Generates extraction rules based on patterns found
  4. Creates a reusable MRT with hierarchical sections
  5. Returns a template ID for document generation

Create a Generic MRT

Submit a template generation request with your documents:
curl Example:
curl -X POST "https://api.artosai.com/api/v1/templates/generate" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "template_filename": "csr_template.docx",
    "example_filenames": [
      "example_csr_2024.docx",
      "example_csr_2023.docx"
    ],
    "name": "CSR Template 2024",
    "description": "Automatically generated template for Clinical Study Reports",
    "document_type": "CSR",
    "tags": ["csr", "regulatory", "auto-generated"]
  }'
Python Example:
import requests

def generate_template(token, template_file, example_files, name, doc_type):
    """Generate an MRT template from template and example documents."""
    url = "https://api.artosai.com/api/v1/templates/generate"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    }

    payload = {
        "template_filename": template_file,
        "example_filenames": example_files,
        "name": name,
        "description": f"Auto-generated template for {doc_type} documents",
        "document_type": doc_type,
        "tags": [doc_type.lower(), "auto-generated"]
    }

    response = requests.post(url, headers=headers, json=payload)

    if response.status_code in (200, 202):
        result = response.json()
        template_id = result.get('template_id') or result.get('id')
        print(f"✓ Template generated: {template_id}")
        print(f"  Task ID: {result.get('task_id', 'N/A')}")
        return template_id, result.get('task_id')
    else:
        print(f"✗ Error: {response.json().get('detail', 'Unknown error')}")
        return None, None

# Generate template
token = "YOUR_TOKEN"
template_id, task_id = generate_template(
    token,
    template_file="csr_template.docx",
    example_files=["example_csr_2024.docx", "example_csr_2023.docx"],
    name="CSR Template 2024",
    doc_type="CSR"
)
Response (200 OK or 202 Accepted):
{
  "template_id": "template-uuid-abc123",
  "name": "CSR Template 2024",
  "document_type": "CSR",
  "status": "Complete",
  "section_count": 6,
  "created_at": "2024-01-25T12:00:00Z"
}

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| template_filename | string | Yes | Filename of template document (will be prefixed with org S3 path) |
| example_filenames | array | No | Filenames of example documents for analysis |
| name | string | Yes | Name for the generated MRT |
| description | string | No | Description of the template |
| document_type | string | No | Document type (default: "CSR") |
| tags | array | No | Tags for organization and search |
| default_connector_data_ids | array | No | Default data source connectors |
| cache_version | string | No | Cache version for reprocessing |
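
A request can be checked against the parameter table before it is sent. The following is a minimal sketch, not part of the API: the spec dictionary is transcribed from the table above, the function name validate_template_request is hypothetical, and the server may apply rules that this client-side check does not know about.

```python
# Parameter spec transcribed from the table above: name -> (expected type, required).
TEMPLATE_REQUEST_SPEC = {
    "template_filename": (str, True),
    "example_filenames": (list, False),
    "name": (str, True),
    "description": (str, False),
    "document_type": (str, False),
    "tags": (list, False),
    "default_connector_data_ids": (list, False),
    "cache_version": (str, False),
}


def validate_template_request(payload):
    """Return a list of problems found in a template generation payload.

    An empty list means the payload matches the documented parameter table.
    This is a hypothetical client-side helper, not an API call.
    """
    problems = []
    for name, (expected_type, required) in TEMPLATE_REQUEST_SPEC.items():
        if name not in payload:
            if required:
                problems.append(f"missing required parameter: {name}")
            continue
        if not isinstance(payload[name], expected_type):
            problems.append(
                f"{name} should be {expected_type.__name__}, "
                f"got {type(payload[name]).__name__}"
            )
    # Flag keys that the table does not list, which are often typos.
    for name in payload:
        if name not in TEMPLATE_REQUEST_SPEC:
            problems.append(f"unknown parameter: {name}")
    return problems


# Example: "name" is missing and "tags" is a string instead of a list.
for problem in validate_template_request(
    {"template_filename": "csr_template.docx", "tags": "csr"}
):
    print(problem)
```

Running the check before the POST surfaces missing or mistyped fields locally instead of through a 400 response.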

Generated Template Structure

The system automatically creates sections with:
  • Hierarchical organization - Top-level and nested sections
  • Extraction rules - Auto-identified rules for content extraction
  • Content patterns - Recognized from examples
  • Reusable structure - Can be applied to similar documents
Save the template_id for document generation steps.

Section 3: Uploading Source Documents

Before generating documents, upload the source materials to be processed.

Supported File Types

| Type | Extension | Notes |
| --- | --- | --- |
| PDF | .pdf | Recommended for documents |
| Word | .docx | Auto-converted to PDF (except in templates container) |
| Excel | .xlsx | For data/tables |
| CSV | .csv | For structured data |
| RTF | .rtf | Rich text format |
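
Filtering a batch of files by extension before uploading avoids a round trip for files the API would reject. This is a small sketch under stated assumptions: the extension set is copied from the table above, the helper names are hypothetical, and matching extensions case-insensitively is an assumption about how the server treats names such as REPORT.PDF.

```python
from pathlib import Path

# Extensions taken from the supported file types table above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".csv", ".rtf"}


def is_supported(filename):
    """Return True if the filename's extension is in the supported list.

    Extensions are compared in lowercase (an assumption, see the lead-in).
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames):
    """Partition filenames into (supported, unsupported), keeping input order."""
    supported = [name for name in filenames if is_supported(name)]
    unsupported = [name for name in filenames if not is_supported(name)]
    return supported, unsupported


ok, rejected = split_by_support(["protocol.pdf", "notes.txt", "efficacy_data.xlsx"])
print("upload:", ok)
print("skip:", rejected)
```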

Upload a File

Use the Files API to upload documents:
curl Example:
curl -X POST "https://api.artosai.com/api/v1/files/upload" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file_name=protocol.pdf" \
  -F "file_content=@/path/to/protocol.pdf" \
  -F "container=documents"
Python Example:
import requests

url = "https://api.artosai.com/api/v1/files/upload"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

# Open the file in a context manager so the handle is closed after the request
with open("/path/to/protocol.pdf", "rb") as f:
    files = {
        "file_name": (None, "protocol.pdf"),
        "container": (None, "documents"),
        "file_content": f
    }
    response = requests.post(url, headers=headers, files=files)
result = response.json()

if response.status_code == 200:
    print("✓ File uploaded successfully")
else:
    print(f"✗ Error: {result.get('detail', 'Upload failed')}")
Response (200 OK):
{
  "message": "File uploaded successfully"
}

Upload Multiple Files

Repeat the upload process for each source document:
# Upload protocol
curl -X POST "https://api.artosai.com/api/v1/files/upload" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file_name=protocol.pdf" \
  -F "file_content=@protocol.pdf" \
  -F "container=documents"

# Upload safety report
curl -X POST "https://api.artosai.com/api/v1/files/upload" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file_name=safety_report.docx" \
  -F "file_content=@safety_report.docx" \
  -F "container=documents"

# Upload efficacy data
curl -X POST "https://api.artosai.com/api/v1/files/upload" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file_name=efficacy_data.xlsx" \
  -F "file_content=@efficacy_data.xlsx" \
  -F "container=documents"

File Container Types

| Container | Purpose | Auto-Convert |
| --- | --- | --- |
| documents | Processed documents | Yes (DOCX→PDF) |
| templates | Template files | No |
| input | Source documents | Yes |
| output | Generated output files | No |
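
Because some containers convert DOCX to PDF, the name you upload is not always the name you reference later. The sketch below derives the path to pass in file_paths. It rests on two assumptions inferred from this cookbook's examples rather than from a documented contract: that a converted file keeps its base name with a .pdf extension (safety_report.docx is uploaded and later referenced as safety_report.pdf), and that paths follow the s3://artos-bucket/org-id/container/filename layout shown in the generation requests. The helper names are hypothetical.

```python
from pathlib import PurePosixPath

# Containers that auto-convert DOCX to PDF, per the table above.
AUTO_CONVERT_CONTAINERS = {"documents", "input"}


def stored_filename(filename, container):
    """Return the filename expected after upload.

    Assumes a converted DOCX keeps its base name and gains a .pdf extension.
    """
    path = PurePosixPath(filename)
    if container in AUTO_CONVERT_CONTAINERS and path.suffix.lower() == ".docx":
        return path.with_suffix(".pdf").name
    return path.name


def s3_path(org_id, container, filename, bucket="artos-bucket"):
    """Build an S3 path in the layout used by this cookbook's examples (assumed)."""
    return f"s3://{bucket}/{org_id}/{container}/{stored_filename(filename, container)}"


print(s3_path("org-id", "documents", "safety_report.docx"))
print(s3_path("org-id", "templates", "csr_template.docx"))
```

If your organization's bucket or path layout differs, pass the real values rather than relying on the defaults.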

Section 4: Generating a Document

Now that you have a generated template and source documents, request document generation.

Create a Document Set

First, organize your documents into a document set:
curl Example:
curl -X POST "https://api.artosai.com/api/v1/document-sets/" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_set_name": "Q1 2024 CSR Submission"
  }'
Python Example:
import requests

url = "https://api.artosai.com/api/v1/document-sets/"
headers = {
    "Authorization": "Bearer YOUR_TOKEN",
    "Content-Type": "application/json"
}

payload = {"document_set_name": "Q1 2024 CSR Submission"}

response = requests.post(url, headers=headers, json=payload)
result = response.json()

if response.status_code == 201:
    document_set_id = result['document_set_id']
    print(f"✓ Document set created: {document_set_id}")
else:
    print(f"✗ Error: {result.get('detail', 'Failed to create set')}")
Response (201 Created):
{
  "document_set_id": "set-uuid-123",
  "document_set_name": "Q1 2024 CSR Submission",
  "organization_id": "org-uuid",
  "version": 1
}

Request Document Generation

Submit a generation request using your generated template:
curl Example:
curl -X POST "https://api.artosai.com/api/v1/documents/generate" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_type": "CSR",
    "file_paths": [
      "s3://artos-bucket/org-id/documents/protocol.pdf",
      "s3://artos-bucket/org-id/documents/safety_report.pdf",
      "s3://artos-bucket/org-id/documents/efficacy_data.xlsx"
    ],
    "document_set_key": "q1-2024-csr",
    "document_set_name": "Q1 2024 CSR Submission",
    "template_id": "template-uuid-abc123",
    "output_name": "CSR_Final_Q1_2024.docx"
  }'
Python Example:
import requests

url = "https://api.artosai.com/api/v1/documents/generate"
headers = {
    "Authorization": "Bearer YOUR_TOKEN",
    "Content-Type": "application/json"
}

payload = {
    "document_type": "CSR",
    "file_paths": [
        "s3://artos-bucket/org-id/documents/protocol.pdf",
        "s3://artos-bucket/org-id/documents/safety_report.pdf",
        "s3://artos-bucket/org-id/documents/efficacy_data.xlsx"
    ],
    "document_set_key": "q1-2024-csr",
    "document_set_name": "Q1 2024 CSR Submission",
    "template_id": "template-uuid-abc123",
    "output_name": "CSR_Final_Q1_2024.docx"
}

response = requests.post(url, headers=headers, json=payload)
result = response.json()

if response.status_code == 202:
    task_id = result['task_id']
    print(f"✓ Generation request accepted")
    print(f"  Task ID: {task_id}")
else:
    print(f"✗ Error: {result.get('detail', 'Generation failed')}")
Response (202 Accepted):
{
  "message": "Document generation started",
  "task_id": "celery-task-uuid-456"
}

Generation Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| document_type | string | Yes | Document type (e.g., "CSR", "IND", "Protocol") |
| file_paths | array | Yes | S3 paths to source documents |
| document_set_key | string | Yes | Unique key for this document set |
| document_set_name | string | Yes | Human-readable name |
| template_id | string | Yes | Generated template ID from Section 2 |
| output_name | string | Yes | Output filename |
| selected_section_ids | array | No | Specific sections to include (optional) |
| document_instructions | string | No | Additional instructions |
| style_guide_id | string | No | Style guide for formatting |
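
The six required parameters and three optional ones map naturally onto a function signature. The sketch below builds the request body and leaves out optional keys that were not supplied. The function name build_generation_request is hypothetical, and rejecting an empty file_paths list is an assumption made here, not a documented server rule.

```python
def build_generation_request(document_type, file_paths, document_set_key,
                             document_set_name, template_id, output_name,
                             selected_section_ids=None,
                             document_instructions=None,
                             style_guide_id=None):
    """Build the JSON body for POST /api/v1/documents/generate.

    Required parameters are positional; optional ones are included only
    when a value is given, so the payload never carries explicit nulls.
    """
    if not file_paths:
        # Assumption: a generation request without sources is never useful.
        raise ValueError("file_paths must contain at least one S3 path")

    payload = {
        "document_type": document_type,
        "file_paths": list(file_paths),
        "document_set_key": document_set_key,
        "document_set_name": document_set_name,
        "template_id": template_id,
        "output_name": output_name,
    }
    optional = {
        "selected_section_ids": selected_section_ids,
        "document_instructions": document_instructions,
        "style_guide_id": style_guide_id,
    }
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


body = build_generation_request(
    "CSR",
    ["s3://artos-bucket/org-id/documents/protocol.pdf"],
    "q1-2024-csr",
    "Q1 2024 CSR Submission",
    "template-uuid-abc123",
    "CSR_Final_Q1_2024.docx",
    document_instructions="Use British spelling",
)
print(sorted(body))
```

The returned dictionary can be passed directly as the json argument of requests.post.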

Section 5: Monitoring Generation Status

Document generation is asynchronous. Poll the status endpoint until complete:
curl Example:
curl -X GET "https://api.artosai.com/api/v1/documents/status/celery-task-uuid-456" \
  -H "Authorization: Bearer YOUR_TOKEN"
Python Example:
import requests
import time

url = "https://api.artosai.com/api/v1/documents/status/celery-task-uuid-456"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

# Poll until complete
while True:
    response = requests.get(url, headers=headers)
    result = response.json()
    status = result['status']

    print(f"Status: {status}")

    if status == "Complete":
        print("✓ Document generation complete!")
        break
    elif status == "Failed":
        print(f"✗ Generation failed: {result.get('error', 'Unknown error')}")
        break

    # Wait before next poll
    time.sleep(5)
Response:
{
  "task_id": "celery-task-uuid-456",
  "status": "Generating",
  "progress": null,
  "error": null
}

Status Values

StatusMeaning
GeneratingCurrently processing
CompleteSuccessfully finished
FailedEncountered an error
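
A polling loop that only exits on Complete or Failed will run forever if a task stalls. The sketch below adds an overall timeout around the three status values in the table. The function wait_for_task and its parameters are hypothetical; the 900-second default mirrors the 15-minute upper bound this cookbook quotes for complex documents and is only a starting point. The status fetcher is passed in as a callable so the loop can be exercised without network access.

```python
import time


def wait_for_task(fetch_status, interval=5.0, timeout=900.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until the task is Complete or Failed, or until the timeout passes.

    fetch_status: zero-argument callable returning the status response as a
    dict with at least a "status" key (and "error" on failure).
    Returns the final response dict on success.
    Raises RuntimeError on "Failed" and TimeoutError when the deadline passes.
    """
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "Complete":
            return result
        if status == "Failed":
            raise RuntimeError(f"Generation failed: {result.get('error') or 'Unknown error'}")
        # Stop if the next poll would land after the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"Task still {status!r} after {timeout} seconds")
        sleep(interval)


# With the requests library, fetch_status could be written as:
#   fetch = lambda: requests.get(url, headers=headers, timeout=30).json()
# Here a canned sequence of responses stands in for the API.
responses = iter([{"status": "Generating"}, {"status": "Complete"}])
final = wait_for_task(lambda: next(responses), interval=0.01, timeout=1.0)
print(final["status"])
```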

Section 6: Retrieving Generated Documents

Once generation completes, retrieve the document details and metadata.

Get Document Details

Retrieve the completed document:
curl Example:
curl -X GET "https://api.artosai.com/api/v1/documents/document-uuid" \
  -H "Authorization: Bearer YOUR_TOKEN"
Python Example:
import requests

url = "https://api.artosai.com/api/v1/documents/document-uuid"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

response = requests.get(url, headers=headers)
result = response.json()

if response.status_code == 200:
    doc = result['document']
    print(f"✓ Document retrieved")
    print(f"  ID: {doc['document_id']}")
    print(f"  Type: {doc['document_type']}")
    print(f"  Status: {doc['status']}")
    print(f"  Output: {doc['output_name']}")
    print(f"  Sections: {len(doc.get('sections', []))}")
else:
    print(f"✗ Error: {result.get('detail', 'Not found')}")
Response:
{
  "document": {
    "document_id": "document-uuid",
    "document_set_id": "set-uuid-123",
    "document_type": "CSR",
    "status": "Complete",
    "output_name": "CSR_Final_Q1_2024.docx",
    "sections": [
      {
        "section_id": "section-1",
        "title": "Executive Summary",
        "content": "..."
      },
      {
        "section_id": "section-2",
        "title": "Methodology",
        "content": "..."
      }
    ],
    "created_at": "2024-01-25T12:00:00Z",
    "updated_at": "2024-01-25T12:30:00Z"
  }
}

Get Document MRT Details

Retrieve the MRT (Machine Readable Template) details for the generated document:
curl Example:
curl -X GET "https://api.artosai.com/api/v1/document-mrt/by-document/document-uuid" \
  -H "Authorization: Bearer YOUR_TOKEN"
Python Example:
import requests

url = "https://api.artosai.com/api/v1/document-mrt/by-document/document-uuid"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

response = requests.get(url, headers=headers)
result = response.json()

if response.status_code == 200:
    mrt = result['outline']
    print(f"✓ Document MRT retrieved")
    print(f"  MRT ID: {mrt['mrt_id']}")
    print(f"  Sections: {len(mrt['sections'])}")

    for section in mrt['sections']:
        print(f"\n  Section: {section['title']}")
        print(f"    Rules: {len(section.get('rules', []))}")
        for rule in section.get('rules', []):
            print(f"      - {rule['rule_type']}: {rule.get('description', 'N/A')}")
            if 'confidence_score' in rule:
                print(f"        Confidence: {rule['confidence_score']:.2%}")
else:
    print(f"✗ Error: {result.get('detail', 'Not found')}")
Response:
{
  "outline": {
    "mrt_id": "mrt-uuid",
    "document_id": "document-uuid",
    "sections": [
      {
        "order_index": 0,
        "level": 1,
        "section_id": "section-1",
        "title": "Executive Summary",
        "synopsis": "High-level overview",
        "rules": [
          {
            "rule_type": "extraction",
            "rule_mode": "auto",
            "confidence_score": 0.95,
            "description": "Extract key findings",
            "generated_content": "..."
          }
        ]
      }
    ],
    "created_at": "2024-01-25T12:00:00Z",
    "updated_at": "2024-01-25T12:30:00Z"
  }
}
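
The confidence_score on each rule can be used to decide which sections deserve a manual review. The following sketch walks the response shape shown above and lists rules that fall below a cutoff. The function name and the 0.8 default threshold are illustrative assumptions; this cookbook does not define what counts as a low score.

```python
def low_confidence_rules(mrt_response, threshold=0.8):
    """Return (section title, rule description, score) for rules below threshold.

    Rules without a confidence_score are skipped rather than flagged.
    """
    flagged = []
    for section in mrt_response.get("outline", {}).get("sections", []):
        for rule in section.get("rules", []):
            score = rule.get("confidence_score")
            if score is not None and score < threshold:
                flagged.append(
                    (section.get("title", ""), rule.get("description", "N/A"), score)
                )
    return flagged


# Sample data in the same shape as the response above (values are made up).
sample = {
    "outline": {
        "sections": [
            {"title": "Executive Summary",
             "rules": [{"confidence_score": 0.95, "description": "Extract key findings"}]},
            {"title": "Methodology",
             "rules": [{"confidence_score": 0.6, "description": "Extract study design"}]},
        ]
    }
}
for title, description, score in low_confidence_rules(sample):
    print(f"{title}: {description} ({score:.0%})")
```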

Complete End-to-End Workflow

Here’s a complete example showing the entire workflow from template generation to document retrieval.

Bash Script

#!/bin/bash

# Configuration
TOKEN="your_bearer_token"
API="https://api.artosai.com"
ORG_ID="your_org_id"

# Step 1: Upload Template and Example Documents
echo "=== Step 1: Uploading Documents ==="

curl -s -X POST "$API/api/v1/files/upload" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file_name=csr_template.docx" \
  -F "file_content=@csr_template.docx" \
  -F "container=templates" > /dev/null
echo "✓ Template document uploaded"

curl -s -X POST "$API/api/v1/files/upload" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file_name=example_csr_2024.docx" \
  -F "file_content=@example_csr_2024.docx" \
  -F "container=documents" > /dev/null
echo "✓ Example document 1 uploaded"

curl -s -X POST "$API/api/v1/files/upload" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file_name=example_csr_2023.docx" \
  -F "file_content=@example_csr_2023.docx" \
  -F "container=documents" > /dev/null
echo "✓ Example document 2 uploaded"

# Step 2: Generate Template from Examples
echo ""
echo "=== Step 2: Generating Template from Examples ==="

TEMPLATE=$(curl -s -X POST "$API/api/v1/templates/generate" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "template_filename": "csr_template.docx",
    "example_filenames": ["example_csr_2024.docx", "example_csr_2023.docx"],
    "name": "CSR Template 2024",
    "description": "Auto-generated from example documents",
    "document_type": "CSR",
    "tags": ["csr", "auto-generated"]
  }')

TEMPLATE_ID=$(echo "$TEMPLATE" | jq -r '.template_id // .id')
echo "✓ Template generated: $TEMPLATE_ID"

# Step 3: Upload Source Documents for Generation
echo ""
echo "=== Step 3: Uploading Source Documents ==="

curl -s -X POST "$API/api/v1/files/upload" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file_name=protocol.pdf" \
  -F "file_content=@protocol.pdf" \
  -F "container=documents" > /dev/null
echo "✓ Protocol uploaded"

curl -s -X POST "$API/api/v1/files/upload" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file_name=safety_report.pdf" \
  -F "file_content=@safety_report.pdf" \
  -F "container=documents" > /dev/null
echo "✓ Safety report uploaded"

# Step 4: Create Document Set
echo ""
echo "=== Step 4: Creating Document Set ==="
DOCSET=$(curl -s -X POST "$API/api/v1/document-sets/" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"document_set_name": "Q1 2024 CSR"}')

DOCSET_ID=$(echo "$DOCSET" | jq -r '.document_set_id')
echo "✓ Document set created: $DOCSET_ID"

# Step 5: Request Document Generation
echo ""
echo "=== Step 5: Requesting Document Generation ==="
GEN=$(curl -s -X POST "$API/api/v1/documents/generate" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"document_type\": \"CSR\",
    \"file_paths\": [
      \"s3://artos-bucket/$ORG_ID/documents/protocol.pdf\",
      \"s3://artos-bucket/$ORG_ID/documents/safety_report.pdf\"
    ],
    \"document_set_key\": \"q1-2024\",
    \"document_set_name\": \"Q1 2024 CSR\",
    \"template_id\": \"$TEMPLATE_ID\",
    \"output_name\": \"CSR_Final.docx\"
  }")

TASK_ID=$(echo "$GEN" | jq -r '.task_id')
echo "✓ Generation request accepted"
echo "  Task ID: $TASK_ID"

# Step 6: Poll Status Until Complete
echo ""
echo "=== Step 6: Monitoring Generation Progress ==="
while true; do
  STATUS=$(curl -s -X GET "$API/api/v1/documents/status/$TASK_ID" \
    -H "Authorization: Bearer $TOKEN" | jq -r '.status')

  echo "Status: $STATUS"

  if [ "$STATUS" = "Complete" ]; then
    echo "✓ Generation complete!"
    # This script reuses the task ID as the document ID for retrieval
    DOC_ID=$TASK_ID
    break
  elif [ "$STATUS" = "Failed" ]; then
    echo "✗ Generation failed"
    exit 1
  fi

  sleep 5
done

# Step 7: Retrieve Document Details
echo ""
echo "=== Step 7: Retrieving Document Details ==="
DOC=$(curl -s -X GET "$API/api/v1/documents/$DOC_ID" \
  -H "Authorization: Bearer $TOKEN")

echo "✓ Document retrieved"
echo "  Document ID: $(echo "$DOC" | jq -r '.document.document_id')"
echo "  Type: $(echo "$DOC" | jq -r '.document.document_type')"
echo "  Status: $(echo "$DOC" | jq -r '.document.status')"
echo "  Output: $(echo "$DOC" | jq -r '.document.output_name')"

echo ""
echo "=== Workflow Complete ==="

Python Script

#!/usr/bin/env python3

import requests
import time
import sys

# Configuration
TOKEN = "your_bearer_token"
API = "https://api.artosai.com"
ORG_ID = "your_org_id"

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json"
}

def log_step(step, message):
    print(f"\n{'='*50}")
    print(f"Step {step}: {message}")
    print('='*50)

def log_success(message):
    print(f"✓ {message}")

def log_error(message):
    print(f"✗ {message}")
    sys.exit(1)

def upload_file(filename, filepath, container):
    """Upload a file to S3."""
    url = f"{API}/api/v1/files/upload"
    upload_headers = {"Authorization": f"Bearer {TOKEN}"}

    # Open the file in a context manager so the handle is closed after the request
    with open(filepath, "rb") as f:
        files = {
            "file_name": (None, filename),
            "container": (None, container),
            "file_content": f
        }
        response = requests.post(url, headers=upload_headers, files=files)
    if response.status_code == 200:
        log_success(f"{filename} uploaded")
    else:
        log_error(f"Failed to upload {filename}: {response.json().get('detail')}")

# Step 1: Upload Documents
log_step(1, "Uploading Documents")

upload_file("csr_template.docx", "csr_template.docx", "templates")
upload_file("example_csr_2024.docx", "example_csr_2024.docx", "documents")
upload_file("example_csr_2023.docx", "example_csr_2023.docx", "documents")

# Step 2: Generate Template from Examples
log_step(2, "Generating Template from Examples")

template_data = {
    "template_filename": "csr_template.docx",
    "example_filenames": ["example_csr_2024.docx", "example_csr_2023.docx"],
    "name": "CSR Template 2024",
    "description": "Auto-generated from example documents",
    "document_type": "CSR",
    "tags": ["csr", "auto-generated"]
}

response = requests.post(f"{API}/api/v1/templates/generate", headers=headers, json=template_data)
if response.status_code not in [200, 202]:
    log_error(f"Template generation failed: {response.json().get('detail')}")

result = response.json()
template_id = result.get('template_id') or result.get('id')
log_success(f"Template generated: {template_id}")

# Step 3: Upload Source Documents
log_step(3, "Uploading Source Documents")

upload_file("protocol.pdf", "protocol.pdf", "documents")
upload_file("safety_report.pdf", "safety_report.pdf", "documents")

# Step 4: Create Document Set
log_step(4, "Creating Document Set")

docset_data = {"document_set_name": "Q1 2024 CSR"}
response = requests.post(f"{API}/api/v1/document-sets/", headers=headers, json=docset_data)

if response.status_code != 201:
    log_error(f"Document set creation failed: {response.json().get('detail')}")

docset_id = response.json()['document_set_id']
log_success(f"Document set created: {docset_id}")

# Step 5: Request Document Generation
log_step(5, "Requesting Document Generation")

gen_data = {
    "document_type": "CSR",
    "file_paths": [
        f"s3://artos-bucket/{ORG_ID}/documents/protocol.pdf",
        f"s3://artos-bucket/{ORG_ID}/documents/safety_report.pdf"
    ],
    "document_set_key": "q1-2024",
    "document_set_name": "Q1 2024 CSR",
    "template_id": template_id,
    "output_name": "CSR_Final.docx"
}

response = requests.post(f"{API}/api/v1/documents/generate", headers=headers, json=gen_data)

if response.status_code != 202:
    log_error(f"Generation request failed: {response.json().get('detail')}")

task_id = response.json()['task_id']
log_success(f"Generation request accepted: {task_id}")

# Step 6: Poll Status Until Complete
log_step(6, "Monitoring Generation Progress")

while True:
    response = requests.get(f"{API}/api/v1/documents/status/{task_id}", headers=headers)
    status = response.json()['status']

    print(f"Status: {status}")

    if status == "Complete":
        log_success("Generation complete!")
        # This script reuses the task ID as the document ID for retrieval
        doc_id = task_id
        break
    elif status == "Failed":
        log_error(f"Generation failed: {response.json().get('error')}")

    time.sleep(5)

# Step 7: Retrieve Document Details
log_step(7, "Retrieving Document Details")

response = requests.get(f"{API}/api/v1/documents/{doc_id}", headers=headers)
doc = response.json()['document']

log_success("Document retrieved")
print(f"  Document ID: {doc['document_id']}")
print(f"  Type: {doc['document_type']}")
print(f"  Status: {doc['status']}")
print(f"  Output: {doc['output_name']}")

log_step("Complete", "Workflow Complete")
print(f"Document ID: {doc_id}")
print("Ready for download or further processing")

Troubleshooting

Common Issues

Template Generation Fails

Error: 400 Bad Request: File not found
Causes:
  • Template or example filenames don’t match uploaded files
  • Files uploaded to wrong container
  • Filename typos
Solution:
  • Verify filenames match exactly (case-sensitive)
  • Upload template to templates container
  • Upload examples to documents container
  • Use just the filename, not full path
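
The filename checks in this list can be run locally before calling the template endpoint. The sketch below compares the names you plan to send against a list of names you have uploaded. It assumes you keep that list yourself, since this cookbook does not show an endpoint for listing uploaded files, and the function name filename_problems is hypothetical.

```python
def filename_problems(requested, uploaded):
    """Compare requested filenames with uploaded ones and describe mismatches.

    requested: filenames intended for template_filename / example_filenames.
    uploaded: filenames recorded locally at upload time.
    Returns a list of human-readable problems; empty means no issues found.
    """
    problems = []
    uploaded_set = set(uploaded)
    # Map lowercase names to the real name to detect case-only differences.
    by_lower = {name.lower(): name for name in uploaded}

    for name in requested:
        if "/" in name or "\\" in name:
            problems.append(f"{name}: use just the filename, not a path")
            continue
        if name in uploaded_set:
            continue
        near_match = by_lower.get(name.lower())
        if near_match is not None:
            problems.append(f"{name}: case mismatch with uploaded file {near_match}")
        else:
            problems.append(f"{name}: no uploaded file with this name")
    return problems


uploaded = ["csr_template.docx", "example_csr_2024.docx"]
requested = ["CSR_Template.docx", "/path/to/example_csr_2024.docx", "example_csr_2023.docx"]
for problem in filename_problems(requested, uploaded):
    print(problem)
```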

File Upload Fails

Error: 400 Bad Request: File type not supported
Causes:
  • Unsupported file extension
  • Corrupted file
  • MIME type mismatch
Solution:
  • Use only supported types: PDF, DOCX, XLSX, CSV, RTF
  • Verify file is not corrupted
  • Check file extension matches actual file type
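
One way to check that an extension matches the actual file type is to compare the file's first bytes with the well-known signature of each format. This is a rough local check, not a description of how the API validates uploads. PDF files normally begin with %PDF-, DOCX and XLSX are ZIP archives that begin with PK\x03\x04, and RTF begins with {\rtf. CSV is plain text and has no signature, so the sketch returns None for it. The function names are hypothetical.

```python
import os

# Leading bytes for the binary formats in the supported list.
# DOCX and XLSX are ZIP archives, so they share the ZIP signature.
SIGNATURES = {
    ".pdf": b"%PDF-",
    ".docx": b"PK\x03\x04",
    ".xlsx": b"PK\x03\x04",
    ".rtf": b"{\\rtf",
}


def extension_matches_content(filename, head):
    """Check whether a file's first bytes agree with its extension.

    head: the first few bytes of the file.
    Returns True or False for formats with a known signature, and None
    when no signature is defined for the extension (for example .csv).
    """
    extension = os.path.splitext(filename)[1].lower()
    signature = SIGNATURES.get(extension)
    if signature is None:
        return None
    return head.startswith(signature)


def check_file(path):
    """Read the first bytes of a file on disk and run the signature check."""
    with open(path, "rb") as f:
        return extension_matches_content(path, f.read(8))


# A PDF renamed to .docx fails the check.
print(extension_matches_content("renamed.docx", b"%PDF-1.7\n"))
```

A PDF with junk bytes before its header would fail this check even though some readers accept it, so treat a False result as a prompt to inspect the file rather than as proof of corruption.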

Generation Fails with Template

Error: Task failed: Missing required section
Causes:
  • Template ID not found
  • Source documents don’t contain expected content
  • Extraction rules unable to find data
Solution:
  • Verify template ID is correct and generation is complete
  • Ensure source documents contain similar content to examples
  • Review source documents for required sections
  • Try with simpler/more complete source documents

Status Endpoint Returns 404

Error: 404 Not Found: Task not found
Causes:
  • Wrong task ID used
  • Task expired (old IDs)
  • Task ID typo
Solution:
  • Copy task_id immediately after generation request
  • Don’t wait more than 24 hours to poll status
  • Check for typos in task ID

Debug Checklist

When troubleshooting generation issues, verify:
  • Bearer token is valid (not expired)
  • All files were uploaded successfully
  • Template filename and example filenames are correct
  • Template generation completed successfully
  • All source files were uploaded (S3 paths are correct)
  • Generation request returned 202 status
  • Task ID is being used correctly for polling
  • Polling every 5-10 seconds (not too frequent)
  • Status endpoint returns valid status values

Performance Considerations

Typical Processing Times:
  • Template generation from examples: 1-3 minutes
  • Document generation (simple): 2-5 minutes
  • Document generation (complex): 5-15 minutes
Factors Affecting Speed:
  • Document size and complexity
  • Number of sections in template
  • Number of extraction rules
  • Available processing resources
Optimization Tips:
  • Use high-quality example documents
  • Start with simple templates (fewer sections)
  • Keep extraction rules focused
  • Monitor system resources
