Documents API
The Documents API generates documents asynchronously from source materials using structured templates. Generation runs as a background Celery task; the request returns immediately with a document ID for status tracking.
List Documents
Retrieve all documents accessible to the authenticated user across all their document sets.
Access rules:
- Internal / Owner roles: all documents within their organization
- All other roles: only documents in document sets they belong to
Request Example
curl -X GET "https://api.artosai.com/api/v1/documents/" \
-H "Authorization: Bearer YOUR_TOKEN"
Response
{
"documents": [
{
"document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"document_name": "CSR_Final.docx",
"created_at": "2024-01-25 12:00:00.000000",
"updated_at": "2024-01-25 12:45:00.000000",
"workspace_id": "ws-uuid-123",
"template": {
"template_id": "tmpl-uuid-456",
"template_name": "CSR Template"
}
},
{
"document_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
"document_name": "Protocol_v2.docx",
"created_at": "2024-01-20 09:30:00.000000",
"updated_at": "2024-01-20 11:00:00.000000",
"workspace_id": "ws-uuid-789",
"template": null
}
]
}
Response Fields
| Field | Type | Description |
|---|---|---|
| `documents` | array | List of document items |
| `documents[].document_id` | string | Document UUID |
| `documents[].document_name` | string | Document file name |
| `documents[].created_at` | string | Creation timestamp |
| `documents[].updated_at` | string | Last updated timestamp |
| `documents[].workspace_id` | string | Document set ID the document belongs to |
| `documents[].template` | object | Template metadata (`null` if no template) |
| `documents[].template.template_id` | string | Template UUID |
| `documents[].template.template_name` | string | Template name |
Status Codes
- 200 OK: Documents retrieved successfully
- 401 Unauthorized: Missing or invalid Bearer token
- 500 Internal Server Error: Database error
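Because the response is a single flat list covering every document set the caller can see, clients may want to regroup it. The following is a minimal sketch rather than part of the API: `group_by_workspace` and `documents_without_template` are hypothetical helper names, and the input is assumed to be the parsed JSON body shown in the sample response above.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_workspace(body: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each workspace_id to the names of the documents it contains."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for doc in body.get("documents", []):
        grouped[doc["workspace_id"]].append(doc["document_name"])
    return dict(grouped)


def documents_without_template(body: Dict[str, Any]) -> List[str]:
    """Return the IDs of documents whose "template" field is null."""
    return [
        doc["document_id"]
        for doc in body.get("documents", [])
        if doc.get("template") is None
    ]


# Trimmed copy of the sample response above.
sample = {
    "documents": [
        {
            "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
            "document_name": "CSR_Final.docx",
            "workspace_id": "ws-uuid-123",
            "template": {"template_id": "tmpl-uuid-456", "template_name": "CSR Template"},
        },
        {
            "document_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
            "document_name": "Protocol_v2.docx",
            "workspace_id": "ws-uuid-789",
            "template": None,
        },
    ]
}

print(group_by_workspace(sample))
print(documents_without_template(sample))
```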
Generate Document
Request document generation from source documents using a template. The request is queued as a background Celery task and returns 202 Accepted immediately. A placeholder document record is created synchronously before queuing. The returned task_id is the document ID; use it to poll status.
POST /api/v1/documents/generate
Request Body
Fields marked with an alias can be sent using either name — both are accepted.
{
"document_type": "CSR",
"file_paths": ["org-id/documents/protocol.pdf", "org-id/documents/data.xlsx"],
"connector_data_id": "project-2024-001",
"workspace_name": "Q1 CSR Documents",
"template_id": "tmpl-uuid-123",
"output_name": "CSR_Final.docx",
"selected_section_ids": ["section-uuid-1", "section-uuid-2"],
"generic_mrt_outline_full": {},
"document_instructions": "Follow company style guide",
"style_guide_id": "sg-uuid-456"
}
The following field names are interchangeable:
| Alias (also accepted) | Internal field name |
|---|---|
| `connector_data_id` | `document_set_key` |
| `template_id` | `generic_mrt_id` |
| `workspace_name` | `document_set_name` |
### Request Parameters
For aliased fields, either name may be sent; the two are equivalent.
| Parameter | Internal field name | Type | Required | Description |
|-----------|---------------------|------|----------|-------------|
| `document_type` | — | string | Yes | Type of document (e.g., `"CSR"`, `"Protocol"`, `"IND"`) |
| `file_paths` | — | array | Yes | S3 file paths for source documents |
| `connector_data_id` | `document_set_key` | string | Yes | Unique key scoping the document set for search |
| `workspace_name` | `document_set_name` | string | No | Human-readable name for the document set (defaults to `""`) |
| `template_id` | `generic_mrt_id` | string | Yes | Template ID to use |
| `output_name` | — | string | No | Name for the generated output file |
| `selected_section_ids` | — | array | No | Specific section IDs (strings or `{section_id, user_instructions}` objects) to include |
| `generic_mrt_outline_full` | — | object | No | Full pre-built outline structure to use directly |
| `document_instructions` | — | string | No | Document-level instructions for generation |
| `style_guide_id` | — | string | No | Style guide ID to apply to the generated document |
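The table above can be turned into a small client-side check before sending a request. This is a sketch under stated assumptions, not the server's validation logic: `normalize_generate_payload` is a hypothetical helper, it folds internal field names onto their aliases, and it rejects payloads that send both names with different values (the documentation does not say how the API resolves that case).

```python
from typing import Any, Dict

# Alias -> internal field name, as listed in the table above.
ALIAS_TO_INTERNAL = {
    "connector_data_id": "document_set_key",
    "template_id": "generic_mrt_id",
    "workspace_name": "document_set_name",
}

# Fields marked "Yes" in the Required column.
REQUIRED_FIELDS = ("document_type", "file_paths", "connector_data_id", "template_id")


def normalize_generate_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the payload keyed by alias names, or raise ValueError."""
    normalized = dict(payload)
    for alias, internal in ALIAS_TO_INTERNAL.items():
        if internal not in normalized:
            continue
        value = normalized.pop(internal)
        # Assumption: conflicting values are a client error worth catching early.
        if alias in normalized and normalized[alias] != value:
            raise ValueError(f"{alias!r} and {internal!r} were sent with different values")
        normalized[alias] = value

    # Empty strings and empty lists are treated as missing.
    missing = [name for name in REQUIRED_FIELDS if not normalized.get(name)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return normalized
```

Calling it with `document_set_key` and `generic_mrt_id` yields a payload keyed by `connector_data_id` and `template_id`, so the rest of the client only has to handle one spelling.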
### Request Example
```bash
curl -X POST "https://api.artosai.com/api/v1/documents/generate" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_type": "CSR",
    "file_paths": ["org-id/documents/protocol.pdf", "org-id/documents/csr-data.xlsx"],
    "connector_data_id": "project-2024-001",
    "workspace_name": "Q1 CSR Documents",
    "template_id": "tmpl-uuid-123",
    "output_name": "CSR_Final.docx"
  }'
# Note: "document_set_key", "document_set_name", and "generic_mrt_id" are accepted too
```
Python Example
import requests

url = "https://api.artosai.com/api/v1/documents/generate"
headers = {
    "Authorization": "Bearer YOUR_TOKEN",
    "Content-Type": "application/json",
}
payload = {
    "document_type": "CSR",
    "file_paths": ["org-id/documents/protocol.pdf"],
    # Each alias below is interchangeable with its internal field name:
    "connector_data_id": "project-2024-001",  # or "document_set_key"
    "workspace_name": "Q1 CSR Documents",     # or "document_set_name"
    "template_id": "tmpl-uuid-123",           # or "generic_mrt_id"
    "output_name": "CSR_Final.docx",
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()  # surface 4xx/5xx responses before reading the body
result = response.json()

# task_id is the document UUID used by the status and retrieval endpoints
document_id = result["task_id"]
print(f"Document ID: {document_id}")
Response (202 Accepted)
{
"message": "Request to generate document has been accepted and is being processed in the background.",
"task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
Response Fields
| Field | Type | Description |
|---|---|---|
| `message` | string | Status message |
| `task_id` | string | The document UUID; use this to poll status and retrieve the document |
Status Codes
- 202 Accepted: Request accepted for background processing
- 400 Bad Request: Missing required parameters or document set not found
- 401 Unauthorized: Missing or invalid Bearer token
- 500 Internal Server Error: Database operation failed
Idempotency
If a document with the same output_name already exists for your organization, the existing document ID is returned immediately (no duplicate is created).
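One apparent consequence is that reusing an `output_name` returns the earlier document rather than starting a new generation, so a fresh document needs a different name. The sketch below picks a non-colliding name on the client side. `unique_output_name` and the `_vN` suffix scheme are illustrative assumptions, and it assumes existing names can be read from `document_name` in the List Documents response, which the sample responses suggest but the documentation does not state.

```python
from pathlib import PurePosixPath
from typing import Set


def unique_output_name(desired: str, existing_names: Set[str]) -> str:
    """Return `desired`, or a `_vN`-suffixed variant that is not in `existing_names`."""
    if desired not in existing_names:
        return desired

    path = PurePosixPath(desired)
    stem, suffix = path.stem, path.suffix  # e.g. "CSR_Final", ".docx"
    counter = 2
    while f"{stem}_v{counter}{suffix}" in existing_names:
        counter += 1
    return f"{stem}_v{counter}{suffix}"


print(unique_output_name("CSR_Final.docx", {"CSR_Final.docx"}))
```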
Document Generation Pipeline
The background task performs the following steps:
- Extract — Extract and classify content from source documents
- Ingest — Ingest documents using classification results
- Create Outline — Generate an outline from the template
- Orchestrate — Execute document outline rule orchestration
- Generate — Produce the final DOCX document
Get Document Status
Poll the current status of a document being generated.
GET /api/v1/documents/status/{document_id}
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `document_id` | string | Yes | UUID of the document (returned as `task_id` from `/generate`) |
Request Example
curl -X GET "https://api.artosai.com/api/v1/documents/status/a1b2c3d4-e5f6-7890-abcd-ef1234567890" \
-H "Authorization: Bearer YOUR_TOKEN"
Response
{
"task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"status": "Generating",
"progress": null,
"error": null
}
Status Values
| Status | Description |
|---|---|
| `Pending` | Document accepted but not yet picked up by a worker |
| `Ingesting` | Source documents are being ingested |
| `Generating` | Document content is being generated |
| `Ready` | Document generation completed successfully |
| `Failed` | Document generation encountered an error |
Response Fields
| Field | Type | Description |
|---|---|---|
| `task_id` | string | Document UUID |
| `status` | string | Current processing status |
| `progress` | integer | Progress percentage (reserved; currently always `null`) |
| `error` | string | Error message if status is `"Failed"`, otherwise `null` |
Status Codes
- 200 OK: Status retrieved successfully
- 404 Not Found: Document not found
- 500 Internal Server Error: Database error
Polling Workflow
#!/bin/bash
TOKEN="your_bearer_token"
API="https://api.artosai.com"
DOC_ID="a1b2c3d4-e5f6-7890-abcd-ef1234567890"
while true; do
  RESPONSE=$(curl -s -X GET "$API/api/v1/documents/status/$DOC_ID" \
    -H "Authorization: Bearer $TOKEN")
  # Quote "$RESPONSE" so the shell does not word-split or glob the JSON body
  STATUS=$(echo "$RESPONSE" | jq -r '.status')
  echo "Status: $STATUS"
  if [ "$STATUS" = "Ready" ]; then
    echo "Document generation complete!"
    break
  elif [ "$STATUS" = "Failed" ]; then
    echo "Error: $(echo "$RESPONSE" | jq -r '.error')"
    exit 1
  fi
  sleep 5
done
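The same loop can be written in Python with an upper bound on the wait, which the shell version lacks. This is a sketch, not an official client: `wait_for_document`, `GenerationFailed`, and the default 30-minute timeout are assumptions, and the status lookup is passed in as a callable so the loop does not depend on any particular HTTP library.

```python
import time
from typing import Any, Callable, Dict


class GenerationFailed(RuntimeError):
    """Raised when the status endpoint reports "Failed"."""


def wait_for_document(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until status is "Ready"; raise on "Failed" or when the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        body = fetch_status()
        status = body.get("status")
        if status == "Ready":
            return body
        if status == "Failed":
            raise GenerationFailed(body.get("error") or "document generation failed")
        # "Pending", "Ingesting", and "Generating" all mean: keep waiting.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_seconds} seconds")
        sleep(interval_seconds)
```

In practice `fetch_status` could be a small function that issues the GET request shown above and returns the parsed JSON body; injecting it also makes the loop easy to exercise with canned responses.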
Get Single Document
Retrieve a completed document by ID. Returns all document metadata and sections.
GET /api/v1/documents/{document_id}
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `document_id` | string | Yes | UUID of the document |
Request Example
curl -X GET "https://api.artosai.com/api/v1/documents/a1b2c3d4-e5f6-7890-abcd-ef1234567890" \
-H "Authorization: Bearer YOUR_TOKEN"
Response
{
"document": {
"document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"document_name": "CSR_Final.docx",
"document_set": "Q1 CSR Documents",
"document_set_name": "Q1 CSR Documents",
"document_type": "CSR",
"product_name": "",
"status": "Ready",
"version": 0,
"template_id": "tmpl-uuid-123",
"template_nickname": "CSR",
"organization_id": "org-uuid-789",
"user": "user@example.com",
"all_sources": ["org-id/documents/protocol.pdf", "org-id/documents/csr-data.xlsx"],
"selected_section_ids": ["section-uuid-1", "section-uuid-2"],
"last_regeneration": "2024-01-25T12:45:00Z",
"created_at": "2024-01-25T12:00:00Z",
"updated_at": "2024-01-25T12:45:00Z"
}
}
Status Codes
- 200 OK: Document retrieved successfully
- 401 Unauthorized: Missing or invalid Bearer token
- 403 Forbidden: Document belongs to a different organization
- 404 Not Found: Document not found
- 500 Internal Server Error: Database error
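The sample responses use two timestamp shapes: List Documents shows `2024-01-25 12:00:00.000000`, while this endpoint shows `2024-01-25T12:00:00Z`. A client that compares the two may want to parse both into one type. The helper below is a minimal sketch; `parse_timestamp` is a hypothetical name, and treating timestamps without an offset as UTC is an assumption the documentation does not confirm.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse either sample timestamp shape into a timezone-aware datetime."""
    text = value.strip()
    # datetime.fromisoformat only accepts a trailing "Z" on Python 3.11+,
    # so rewrite it as an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: offset-less timestamps are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


listed = parse_timestamp("2024-01-25 12:45:00.000000")
fetched = parse_timestamp("2024-01-25T12:45:00Z")
print(listed == fetched)
```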
Create Local Document
Upload a Word document directly from the Office Add-in and create a minimal document record. Unlike /generate, this does not process the document through the generation pipeline — it simply stores the document for use with chat functionality.
POST /api/v1/documents/local
Request
Content-Type: multipart/form-data
| Parameter | Type | Required | Description |
|---|---|---|---|
| `file` | file | Yes | The .docx file to upload |
Only .docx files are accepted.
Request Example
curl -X POST "https://api.artosai.com/api/v1/documents/local" \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "file=@/path/to/my-document.docx"
Python Example
import requests
url = "https://api.artosai.com/api/v1/documents/local"
headers = {"Authorization": "Bearer YOUR_TOKEN"}
with open("/path/to/my-document.docx", "rb") as f:
    response = requests.post(
        url,
        headers=headers,
        # (file name, file object, MIME type) for the multipart "file" field
        files={"file": ("my-document.docx", f, "application/vnd.openxmlformats-officedocument.wordprocessingml.document")},
    )
response.raise_for_status()  # surface 4xx/5xx responses before reading the body
result = response.json()
print(f"Document ID: {result['document_id']}")
print(f"S3 Key: {result['s3_key']}")
Response (201 Created)
{
"document_id": "c3d4e5f6-a7b8-9012-cdef-123456789012",
"document_name": "my-document.docx",
"s3_key": "org-uuid-789/local-documents/c3d4e5f6-a7b8-9012-cdef-123456789012/my-document.docx",
"status": "Ready"
}
Response Fields
| Field | Type | Description |
|---|---|---|
| `document_id` | string | UUID of the created document |
| `document_name` | string | Name of the uploaded file |
| `s3_key` | string | S3 key where the document is stored |
| `status` | string | Always `"Ready"` for local documents |
Status Codes
- 201 Created: Document uploaded and record created successfully
- 400 Bad Request: Invalid file type (only .docx accepted)
- 401 Unauthorized: Authentication failed
- 500 Internal Server Error: S3 upload or database error
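Since anything other than a .docx file is rejected with 400, a client can check the file name before uploading. This is a sketch only: `docx_upload_metadata` is a hypothetical helper, and the case-insensitive extension match is an assumption, as the documentation says only that .docx files are accepted.

```python
from pathlib import Path
from typing import Tuple

DOCX_MIME_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def docx_upload_metadata(path: str) -> Tuple[str, str]:
    """Return (file name, MIME type) for a .docx path, or raise ValueError."""
    candidate = Path(path)
    # Assumption: ".DOCX" is treated the same as ".docx".
    if candidate.suffix.lower() != ".docx":
        raise ValueError(f"only .docx files are accepted, got {candidate.name!r}")
    return candidate.name, DOCX_MIME_TYPE


print(docx_upload_metadata("/path/to/my-document.docx"))
```

The returned pair matches the first and third elements of the `files` tuple in the Python example above; the check looks only at the name and does not inspect the file contents.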
Get Sections for Document
Retrieve a flat list of all section identifiers and their associated metadata within a given document. Used to enumerate the full section structure, enabling the Sources panel selection dropdown to be pre-populated with all available sections upon document load.
POST /get-sections-for-document
Request Body
{
"document_id": "doc_9f3c1e72ab84"
}
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `document_id` | string | Yes | Unique identifier of the document whose section list is being retrieved. Must reference an existing document accessible by the authenticated user. |
Request Example
curl -X POST "https://api.artosai.com/get-sections-for-document" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"document_id": "doc_9f3c1e72ab84"
}'
Python Example
import requests
url = "https://api.artosai.com/get-sections-for-document"
headers = {
"Authorization": "Bearer YOUR_TOKEN",
"Content-Type": "application/json"
}
payload = {
"document_id": "doc_9f3c1e72ab84"
}
response = requests.post(url, headers=headers, json=payload)
sections = response.json()["sections"]
for section in sections:
    print(f"{section['section_order']}: {section['section_title']}")
Response
{
"sections": [
{
"section_id": "1.1 Title of Study:",
"section_title": "1.1 Title of Study:",
"section_order": 0
},
{
"section_id": "1.2 Study Objectives:",
"section_title": "1.2 Study Objectives:",
"section_order": 1
},
{
"section_id": "1.3 Study Design:",
"section_title": "1.3 Study Design:",
"section_order": 2
},
{
"section_id": "1.4 Study Population:",
"section_title": "1.4 Study Population:",
"section_order": 3
},
{
"section_id": "1.5 Study Duration:",
"section_title": "1.5 Study Duration:",
"section_order": 4
}
]
}
Response Fields
| Field | Type | Description |
|---|---|---|
| `sections` | array | Array of section objects, sorted by document order |
| `sections[].section_id` | string | Unique identifier of the section (the section title). Used as `section_id` in the get-sources-for-selected-text endpoint. |
| `sections[].section_title` | string | Human-readable display name of the section, rendered as an option in the Sources panel selection dropdown. |
| `sections[].section_order` | integer | Zero-based index representing the section’s position within the document, used to render dropdown options in document order. |
Status Codes
- 200 OK: Successfully retrieved section list
- 400 Bad Request: Authentication failed or invalid Bearer token
- 404 Not Found: Document MRT not found for the given document ID
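To show how the three fields fit together, the sketch below turns a parsed response into ordered dropdown options. `dropdown_options` is a hypothetical helper, not part of the API. It sorts by `section_order` defensively even though the response is described as already being in document order.

```python
from typing import Any, Dict, List, Tuple


def dropdown_options(body: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (section_id, section_title) pairs in document order."""
    sections = sorted(body.get("sections", []), key=lambda s: s["section_order"])
    return [(s["section_id"], s["section_title"]) for s in sections]


# Two entries from the sample response above, deliberately out of order.
sample = {
    "sections": [
        {"section_id": "1.2 Study Objectives:", "section_title": "1.2 Study Objectives:", "section_order": 1},
        {"section_id": "1.1 Title of Study:", "section_title": "1.1 Title of Study:", "section_order": 0},
    ]
}

for section_id, title in dropdown_options(sample):
    print(section_id, "->", title)
```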
Complete Workflow Example
#!/bin/bash
TOKEN="your_bearer_token"
API="https://api.artosai.com"
# Step 1: Upload source documents
echo "Uploading source documents..."
curl -X POST "$API/api/v1/files/upload" \
-H "Authorization: Bearer $TOKEN" \
-F "file_name=protocol.pdf" \
-F "file_content=@protocol.pdf" \
-F "container=documents"
# Step 2: Request document generation
echo "Requesting document generation..."
RESPONSE=$(curl -s -X POST "$API/api/v1/documents/generate" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"document_type": "CSR",
"file_paths": ["org-id/documents/protocol.pdf"],
"connector_data_id": "project-2024",
"workspace_name": "Q1 CSR",
"template_id": "tmpl-uuid-123",
"output_name": "CSR_Final.docx"
}')
DOC_ID=$(echo "$RESPONSE" | jq -r '.task_id')
echo "Document ID: $DOC_ID"
# Step 3: Poll status until Ready
echo "Waiting for generation to complete..."
while true; do
  STATUS=$(curl -s -X GET "$API/api/v1/documents/status/$DOC_ID" \
    -H "Authorization: Bearer $TOKEN" | jq -r '.status')
  echo "Status: $STATUS"
  if [ "$STATUS" = "Ready" ]; then
    echo "Document generation complete!"
    break
  elif [ "$STATUS" = "Failed" ]; then
    echo "Document generation failed"
    exit 1
  fi
  sleep 5
done
# Step 4: Retrieve document
echo "Retrieving document..."
curl -X GET "$API/api/v1/documents/$DOC_ID" \
-H "Authorization: Bearer $TOKEN" | jq '.document'