Document Generation
Document generation in Artos is an automated process that transforms source documents into professionally formatted regulatory documents using MRT templates.
Overview
The document generation process:
- Extract - Extract content from source documents
- Classify - Classify extracted content by document type
- Ingest - Organize classified content for processing
- Outline - Create document outline from MRT template
- Orchestrate - Apply extraction rules and processing
- Generate - Produce final formatted document
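The stages above run in a fixed order. The following is a minimal sketch of that ordering; `Stage` and `next_stage` are illustrative names and not part of the Artos API.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Pipeline stages in the order listed above (names are illustrative)."""
    EXTRACT = 1
    CLASSIFY = 2
    INGEST = 3
    OUTLINE = 4
    ORCHESTRATE = 5
    GENERATE = 6


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after the last one."""
    stages = list(Stage)  # Enum members iterate in definition order
    index = stages.index(stage)
    return stages[index + 1] if index + 1 < len(stages) else None
```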
The Generation Pipeline
Step 1: Request Generation
Submit a generation request with source documents and a template.
Step 2: Extract Content
The system extracts content from source documents.
Step 3: Classify Content
Content is automatically classified based on:
- Document structure
- Section headers
- Content type indicators
- Extraction rules
Step 4: Create Outline
The outline is generated from the MRT template.
Step 5: Apply Extraction Rules
For each section, extraction rules are applied to find and process content.
Step 6: Populate Outline
Extracted content is populated into the outline.
Step 7: Generate Document
The outline is converted to a formatted DOCX document.
Step 8: Return Result
The document is saved and retrieval information is returned.
Status Tracking
Monitor generation progress using the status endpoint. A generation reports one of the following statuses:
- Generating - Currently processing
- Complete - Successfully finished
- Failed - Error occurred
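The statuses above can drive a simple polling loop. The sketch below is an assumption-laden illustration: `fetch_status` is a hypothetical callable that wraps a call to the status endpoint, the status strings are assumed to be spelled as in the list above, and the default interval and timeout are illustrative rather than Artos defaults.

```python
import time
from typing import Callable

# Terminal statuses from the list above; their exact spelling in API
# responses is an assumption.
TERMINAL_STATUSES = {"Complete", "Failed"}


def wait_for_generation(
    fetch_status: Callable[[], str],
    interval_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the generation is Complete or Failed, or the timeout elapses.

    `fetch_status` is a hypothetical callable that queries the status
    endpoint and returns "Generating", "Complete", or "Failed".
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"generation still {status!r} after {waited:.0f}s")
        sleep(interval_seconds)
        waited += interval_seconds
```

Passing `sleep` as a parameter keeps the loop testable without real delays.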
Retrieval
Once generation is complete, retrieve the document.
Generation Configuration
Document Selection
Optionally specify which sections to include.
Document Instructions
Provide document-level instructions.
Style Guides
Apply a specific style guide. A style guide controls:
- Fonts and font sizes
- Colors and formatting
- Section numbering style
- Citation format
- Table formatting
Quality Assurance
Confidence Scores
Each extraction includes a confidence score between 0 and 1.
Content Validation
Rules are applied to validate extracted content:
- Completeness - All required sections present
- Consistency - Data consistent across document
- Compliance - Meets regulatory requirements
- Format - Proper structure and formatting
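Client code can combine the confidence score with a completeness check when reviewing results. The sketch below assumes a hypothetical `Extraction` record and an illustrative threshold of 0.8; neither reflects the actual Artos schema or defaults.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Extraction:
    """One extracted section (hypothetical shape, not the Artos schema)."""
    section: str
    content: str
    confidence: float  # between 0 and 1


def review_extractions(
    extractions: List[Extraction],
    required_sections: List[str],
    min_confidence: float = 0.8,  # illustrative threshold, not an Artos default
) -> Dict[str, List[str]]:
    """Report missing required sections and sections below the threshold."""
    # A section counts as present only if it has non-blank content.
    present = {e.section for e in extractions if e.content.strip()}
    missing = [name for name in required_sections if name not in present]
    low_confidence = [e.section for e in extractions if e.confidence < min_confidence]
    return {"missing": missing, "low_confidence": low_confidence}
```

Sections reported under `low_confidence` are candidates for manual review before the document is used.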
Error Handling
If generation fails, common causes include:
- Missing source documents
- Template not found
- Invalid extraction rules
- Insufficient data in sources
- Processing timeout
To troubleshoot:
- Verify all source files were uploaded
- Confirm template ID is correct
- Check that source documents contain required data
- Review extraction rule configuration
- Try with smaller documents first
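For transient failures such as a processing timeout, retrying with a growing delay can help. The following is a minimal sketch: `GenerationError` and `run_once` are hypothetical names standing in for whatever error and request function your client code uses, and the attempt count and delays are illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class GenerationError(Exception):
    """Hypothetical error raised by client code when a generation fails."""


def retry_generation(
    run_once: Callable[[], T],
    max_attempts: int = 3,
    base_delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `run_once` until it succeeds, doubling the delay after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return run_once()
        except GenerationError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay_seconds * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Retrying does not fix configuration problems such as a wrong template ID or invalid extraction rules; those need the checks listed above.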
Typical Workflow
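A typical client-side workflow is to submit a request, poll until the generation finishes, and then retrieve the document. The sketch below illustrates that sequence against a hypothetical `GenerationClient` interface; the method names, parameters, and status strings are assumptions and the real Artos API may differ.

```python
import time
from typing import Callable, List, Protocol


class GenerationClient(Protocol):
    """Hypothetical client interface; the real Artos API may differ."""

    def submit(self, source_ids: List[str], template_id: str) -> str: ...
    def status(self, generation_id: str) -> str: ...
    def download(self, generation_id: str) -> bytes: ...


def generate_document(
    client: GenerationClient,
    source_ids: List[str],
    template_id: str,
    poll_seconds: float = 10.0,
    max_polls: int = 180,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Submit a request, poll until it finishes, and return the DOCX bytes."""
    generation_id = client.submit(source_ids, template_id)
    for _ in range(max_polls):
        status = client.status(generation_id)
        if status == "Complete":
            return client.download(generation_id)
        if status == "Failed":
            raise RuntimeError(f"generation {generation_id} failed")
        sleep(poll_seconds)  # still generating: wait before the next check
    raise TimeoutError(f"generation {generation_id} did not finish in time")
```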
Performance Considerations
Processing Time
Typical processing times:
- Simple documents (single source): 2-5 minutes
- Complex documents (multiple sources): 5-15 minutes
- Large datasets: 15-30 minutes or more
Processing time depends on:
- Source document size
- Number of sections
- Complexity of extraction rules
- Available processing resources
Limits
- Max file size: 100 MB per document
- Max sections: 100 per template
- Max extraction rules: 500 per template
- Max concurrent generations: 10 per organization
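Checking these limits before submitting a request avoids a failed generation. The sketch below covers the per-document and per-template limits; the function name and the choice of 1 MB = 1024 * 1024 bytes are assumptions.

```python
from typing import List

# Limits as listed above.
MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024  # 100 MB, assuming 1 MB = 1024 * 1024 bytes
MAX_SECTIONS = 100
MAX_EXTRACTION_RULES = 500


def check_limits(
    file_sizes_bytes: List[int], section_count: int, rule_count: int
) -> List[str]:
    """Return human-readable limit violations (an empty list if there are none)."""
    problems = []
    for index, size in enumerate(file_sizes_bytes):
        if size > MAX_FILE_SIZE_BYTES:
            problems.append(f"source {index} is {size} bytes, over the 100 MB limit")
    if section_count > MAX_SECTIONS:
        problems.append(f"{section_count} sections exceeds the limit of {MAX_SECTIONS}")
    if rule_count > MAX_EXTRACTION_RULES:
        problems.append(
            f"{rule_count} extraction rules exceeds the limit of {MAX_EXTRACTION_RULES}"
        )
    return problems
```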
Best Practices
- Organize Sources - Ensure source documents are well-structured
- Test Rules - Validate extraction rules on small samples first
- Monitor Progress - Use status polling to track generation
- Handle Errors - Implement error handling and retry logic
- Archive Results - Keep generated documents for compliance
- Version Control - Track template versions and changes
Related Topics
- MRT Workflow - Template structure and concepts
- Async Operations - How async processing works
- Documents API - API documentation
- Status Polling - Track generation