Template and Document Generation Cookbook
This cookbook walks you through the complete workflow of automatically generating a Machine Readable Template (MRT) from example documents and using it to generate new documents. You’ll learn by uploading example CSR (Clinical Study Report) documents, automatically generating a reusable template from them, and then generating new documents using that template.
Overview
The document generation workflow follows these steps:
- Upload Template and Example Documents - Provide a template document and sample documents
- Generate Template from Examples - System automatically creates an MRT structure from your examples
- Upload Source Documents - Prepare source materials for new document generation
- Create Document Set - Organize documents into a logical group
- Generate Document - Request document generation using the auto-generated template
- Monitor Progress - Poll status until generation completes
- Retrieve Results - Download the generated document and metadata
Prerequisites
Before starting, ensure you have:
- API Token - Valid Bearer token for authentication (see Authentication)
- API Endpoint - Access to https://api.artosai.com
- Tools - curl, Python 3.6+, or equivalent HTTP client
- Example Documents - Sample regulatory documents (PDF, DOCX) that show your desired template structure
- Source Documents - Regulatory documents to process (PDF, DOCX, Excel)
Section 1: Preparing and Uploading Documents
The template generation system learns from example documents. Prepare documents that represent your desired template structure.
Document Types
Template Document (optional):
- A single document showing the desired output structure and format
- Should include section headers, formatting, and style guidelines
- Helps the system understand your preferred organization
Example Documents:
- 1-3 existing documents of the type you want to generate
- Should follow the same structure as your desired output
- Used to identify sections, content patterns, and extraction rules
- More diverse examples generally produce a better template
Source Documents:
- Raw materials to be processed
- Will be analyzed, and their content extracted to fill the generated template
- Can differ from the examples (extracted content will follow the template structure)
Upload Documents
Use the Files API to upload the template and example documents.
Section 2: Generating a Template from Examples
The POST /api/v1/templates/generate endpoint automatically analyzes your example documents and generates a structured MRT with sections and extraction rules.
How Template Generation Works
The system:
- Analyzes example documents for structure and content patterns
- Identifies sections, headers, and content types
- Generates extraction rules based on patterns found
- Creates a reusable MRT with hierarchical sections
- Returns a template ID for document generation
Create a Generic MRT
Submit a template generation request with your documents.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| template_filename | string | Yes | Filename of template document (will be prefixed with org S3 path) |
| example_filenames | array | No | Filenames of example documents for analysis |
| name | string | Yes | Name for the generated MRT |
| description | string | No | Description of the template |
| document_type | string | No | Document type (default: “CSR”) |
| tags | array | No | Tags for organization and search |
| default_connector_data_ids | array | No | Default data source connectors |
| cache_version | string | No | Cache version for reprocessing |
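As a rough illustration of how the parameters above fit together, the following Python sketch assembles a request for POST /api/v1/templates/generate without sending it. It assumes the endpoint accepts a JSON body and a Bearer token in the Authorization header; the function name `build_template_request` and the filenames in the usage note are hypothetical, and only a subset of the optional parameters is shown.

```python
import json
import urllib.request

API_BASE = "https://api.artosai.com"


def build_template_request(token, template_filename, name,
                           example_filenames=None, description=None,
                           document_type="CSR", tags=None):
    """Build (but do not send) a template generation request.

    Assumption: the endpoint takes a JSON body; check the API reference
    before relying on this.
    """
    # Required parameters plus the documented default for document_type.
    payload = {
        "template_filename": template_filename,
        "name": name,
        "document_type": document_type,
    }
    # Optional parameters are only included when provided.
    if example_filenames:
        payload["example_filenames"] = list(example_filenames)
    if description:
        payload["description"] = description
    if tags:
        payload["tags"] = list(tags)

    return urllib.request.Request(
        API_BASE + "/api/v1/templates/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A call such as `build_template_request(token, "csr_template.docx", "CSR Template", example_filenames=["example_csr_1.pdf"])` returns a request object that could then be passed to `urllib.request.urlopen`. The exact response shape is not shown here, so treat any parsing of the returned template ID as something to confirm against the API reference.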
Generated Template Structure
The system automatically creates sections with:
- Hierarchical organization - Top-level and nested sections
- Extraction rules - Auto-identified rules for content extraction
- Content patterns - Recognized from examples
- Reusable structure - Can be applied to similar documents
Save the returned template_id for the document generation steps.
Section 3: Uploading Source Documents
Before generating documents, upload the source materials to be processed.
Supported File Types
| Type | Extension | Notes |
|---|---|---|
| PDF | .pdf | Recommended for documents |
| Word | .docx | Auto-converted to PDF (except in templates container) |
| Excel | .xlsx | For data/tables |
| CSV | .csv | For structured data |
| RTF | .rtf | Rich text format |
Upload a File
Use the Files API to upload documents.
Upload Multiple Files
Repeat the upload process for each source document.
File Container Types
| Container | Purpose | Auto-Convert |
|---|---|---|
| documents | Processed documents | Yes (DOCX→PDF) |
| templates | Template files | No |
| input | Source documents | Yes |
| output | Generated output files | No |
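The two tables in this section can be combined into a simple pre-upload check. The sketch below is a local helper only and makes no API calls; the name `plan_upload` is hypothetical, and it assumes that auto-conversion in the input container means DOCX to PDF, as it does for the documents container.

```python
from pathlib import PurePosixPath

# Extensions from the Supported File Types table.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".csv", ".rtf"}

# Auto-convert column from the File Container Types table.
AUTO_CONVERT_CONTAINERS = {
    "documents": True,
    "templates": False,
    "input": True,
    "output": False,
}


def plan_upload(filename, container):
    """Check a file against the tables above before uploading it."""
    extension = PurePosixPath(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError("Unsupported file type: " + (extension or "(none)"))
    if container not in AUTO_CONVERT_CONTAINERS:
        raise ValueError("Unknown container: " + container)
    # Assumption: only DOCX files are converted, and only in
    # containers marked "Yes" in the table.
    converted = extension == ".docx" and AUTO_CONVERT_CONTAINERS[container]
    return {
        "filename": filename,
        "container": container,
        "converted_to_pdf": converted,
    }
```

Running this check first catches unsupported extensions locally instead of through a 400 response from the upload call.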
Section 4: Generating a Document
Now that you have a generated template and source documents, request document generation.
Create a Document Set
First, organize your documents into a document set.
Request Document Generation
Submit a generation request using your generated template.
Generation Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| document_type | string | Yes | Document type (e.g., ‘CSR’, ‘IND’, ‘Protocol’) |
| file_paths | array | Yes | S3 paths to source documents |
| document_set_key | string | Yes | Unique key for this document set |
| document_set_name | string | Yes | Human-readable name |
| template_id | string | Yes | Generated template ID from Section 2 |
| output_name | string | Yes | Output filename |
| selected_section_ids | array | No | Specific sections to include |
| document_instructions | string | No | Additional instructions |
| style_guide_id | string | No | Style guide for formatting |
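One way to catch incomplete requests before they reach the API is to validate the payload against the table above. The following is a minimal sketch; the helper name `build_generation_payload` is hypothetical, and it only builds a dictionary, since the generation endpoint path is not covered in this section.

```python
# Field names taken from the Generation Request Parameters table.
REQUIRED_FIELDS = (
    "document_type",
    "file_paths",
    "document_set_key",
    "document_set_name",
    "template_id",
    "output_name",
)
OPTIONAL_FIELDS = (
    "selected_section_ids",
    "document_instructions",
    "style_guide_id",
)


def build_generation_payload(**fields):
    """Return a payload dict, or raise ValueError if it is malformed."""
    allowed = set(REQUIRED_FIELDS) | set(OPTIONAL_FIELDS)
    unknown = sorted(set(fields) - allowed)
    if unknown:
        raise ValueError("Unknown parameters: " + ", ".join(unknown))
    # Empty strings and empty lists count as missing.
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("Missing required parameters: " + ", ".join(missing))
    # Drop optional parameters that were passed as None.
    return {key: value for key, value in fields.items() if value is not None}
```

Validating locally keeps typos in parameter names from turning into harder-to-diagnose generation failures later.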
Section 5: Monitoring Generation Status
Document generation is asynchronous. Poll the status endpoint until generation completes.
Status Values
| Status | Meaning |
|---|---|
| Generating | Currently processing |
| Complete | Successfully finished |
| Failed | Encountered an error |
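A polling loop over these status values might look like the sketch below. It is deliberately independent of the HTTP layer: `fetch_status` is a hypothetical callable you supply that queries the status endpoint and returns one of the strings in the table. The 5-second default interval follows the 5-10 second guidance in the Debug Checklist, and the 15-minute default timeout follows the upper bound listed under Performance Considerations.

```python
import time

TERMINAL_STATUSES = {"Complete", "Failed"}


def wait_for_completion(fetch_status, interval_seconds=5.0,
                        timeout_seconds=900.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it reports Complete or Failed.

    fetch_status is a caller-supplied function (hypothetical here)
    that returns the current status string for the task.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if status != "Generating":
            raise ValueError("Unexpected status: " + repr(status))
        if clock() >= deadline:
            raise TimeoutError("Generation did not finish in time")
        # Wait before the next poll to avoid hitting the API too often.
        sleep(interval_seconds)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without real delays. The caller is still responsible for handling a returned "Failed" status.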
Section 6: Retrieving Generated Documents
Once generation completes, retrieve the document details and metadata.
Get Document Details
Retrieve the completed document.
Get Document MRT Details
Retrieve the MRT (Machine Readable Template) details for the generated document.
Complete End-to-End Workflow
Here’s a complete example showing the entire workflow from template generation to document retrieval.
Bash Script
Python Script
Troubleshooting
Common Issues
Template Generation Fails
Error: 400 Bad Request: File not found
Causes:
- Template or example filenames don’t match uploaded files
- Files uploaded to wrong container
- Filename typos
Solutions:
- Verify filenames match exactly (case-sensitive)
- Upload the template to the templates container
- Upload examples to the documents container
- Use just the filename, not the full path
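The filename checks above can be automated before submitting a template generation request. This sketch is a local helper with hypothetical names (`to_bare_filename`, `find_missing`); it assumes you keep your own list of the filenames you have uploaded.

```python
from pathlib import PurePosixPath


def to_bare_filename(path_or_name):
    """Strip any directory or S3 prefix, leaving only the filename."""
    name = PurePosixPath(path_or_name.replace("\\", "/")).name
    if not name:
        raise ValueError("No filename in: " + repr(path_or_name))
    return name


def find_missing(requested, uploaded):
    """Return requested filenames that have no exact uploaded match.

    The comparison is case-sensitive, matching the API's behavior
    described above.
    """
    uploaded_names = set(uploaded)
    bare_names = [to_bare_filename(item) for item in requested]
    return [name for name in bare_names if name not in uploaded_names]
```

An empty result from `find_missing` means every requested filename has an exact match among the uploads.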
File Upload Fails
Error: 400 Bad Request: File type not supported
Causes:
- Unsupported file extension
- Corrupted file
- MIME type mismatch
Solutions:
- Use only supported types: PDF, DOCX, XLSX, CSV, RTF
- Verify the file is not corrupted
- Check that the file extension matches the actual file type
Generation Fails with Template
Error: Task failed: Missing required section
Causes:
- Template ID not found
- Source documents don’t contain expected content
- Extraction rules unable to find data
Solutions:
- Verify the template ID is correct and template generation is complete
- Ensure source documents contain content similar to the examples
- Review source documents for required sections
- Try with simpler or more complete source documents
Status Endpoint Returns 404
Error: 404 Not Found: Task not found
Causes:
- Wrong task ID used
- Task expired (old IDs)
- Task ID typo
Solutions:
- Copy the task_id immediately after the generation request
- Don’t wait more than 24 hours to poll status
- Check for typos in the task ID
Debug Checklist
When troubleshooting generation issues, verify:
- Bearer token is valid (not expired)
- All files were uploaded successfully
- Template filename and example filenames are correct
- Template generation completed successfully
- All source files were uploaded (S3 paths are correct)
- Generation request returned 202 status
- Task ID is being used correctly for polling
- Polling every 5-10 seconds (not too frequent)
- Status endpoint returns valid status values
Performance Considerations
Typical Processing Times:
- Template generation from examples: 1-3 minutes
- Document generation (simple): 2-5 minutes
- Document generation (complex): 5-15 minutes
Factors Affecting Performance:
- Document size and complexity
- Number of sections in template
- Number of extraction rules
- Available processing resources
Optimization Tips:
- Use high-quality example documents
- Start with simple templates (fewer sections)
- Keep extraction rules focused
- Monitor system resources
Next Steps
Now that you understand the complete workflow, explore:
- MRT Workflow Concepts - Deeper understanding of template structure
- Document Generation Pipeline - How generation works internally
- Documents API Reference - Document generation endpoints
- Async Operations - Background processing details
Additional Resources
- API Playground - Test endpoints interactively at https://api.artosai.com/docs
- Authentication - Set up and manage API tokens at https://artosai.com
- Support - Contact support at [email protected]