Documents API

The Documents API enables asynchronous document generation from source materials using structured templates. Generation runs as a background Celery task; the request returns immediately with a document ID for status tracking.

List Documents

Retrieve all documents accessible to the authenticated user across all their document sets.
GET /api/v1/documents/
Access rules:
  • Internal / Owner roles: all documents within their organization
  • All other roles: only documents in document sets they belong to

Request Example

curl -X GET "https://api.artosai.com/api/v1/documents/" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response

{
  "documents": [
    {
      "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "document_name": "CSR_Final.docx",
      "created_at": "2024-01-25 12:00:00.000000",
      "updated_at": "2024-01-25 12:45:00.000000",
      "workspace_id": "ws-uuid-123",
      "template": {
        "template_id": "tmpl-uuid-456",
        "template_name": "CSR Template"
      }
    },
    {
      "document_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
      "document_name": "Protocol_v2.docx",
      "created_at": "2024-01-20 09:30:00.000000",
      "updated_at": "2024-01-20 11:00:00.000000",
      "workspace_id": "ws-uuid-789",
      "template": null
    }
  ]
}

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| `documents` | array | List of document items |
| `documents[].document_id` | string | Document UUID |
| `documents[].document_name` | string | Document file name |
| `documents[].created_at` | string | Creation timestamp |
| `documents[].updated_at` | string | Last updated timestamp |
| `documents[].workspace_id` | string | Document set ID the document belongs to |
| `documents[].template` | object | Template metadata (null if no template) |
| `documents[].template.template_id` | string | Template UUID |
| `documents[].template.template_name` | string | Template name |

Status Codes

  • 200 OK: Documents retrieved successfully
  • 401 Unauthorized: Missing or invalid Bearer token
  • 500 Internal Server Error: Database error
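
As a minimal sketch of consuming this response, the helpers below sort documents newest first and read the optional template name. They assume `created_at` always follows the `YYYY-MM-DD HH:MM:SS.ffffff` shape shown in the example response; the names `sort_newest_first` and `template_name` are illustrative and not part of the API.

```python
from datetime import datetime

# Assumed timestamp shape, taken from the example response above.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def sort_newest_first(documents):
    """Return the documents sorted by created_at, most recent first."""
    return sorted(
        documents,
        key=lambda doc: datetime.strptime(doc["created_at"], TIMESTAMP_FORMAT),
        reverse=True,
    )


def template_name(doc):
    """Return the template name, or None when the document has no template."""
    template = doc.get("template")
    return template["template_name"] if template else None
```

Both functions take the parsed `documents` array (or one of its items), so they can be applied directly to `response.json()["documents"]`.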

Generate Document

Request document generation from source documents using a template. The request is queued as a background Celery task and returns 202 Accepted immediately. A placeholder document record is created synchronously before queuing. Use the returned task_id, which is the document UUID, to poll status.
POST /api/v1/documents/generate

Request Body

Fields marked with an alias can be sent using either name — both are accepted.
{
  "document_type": "CSR",
  "file_paths": ["org-id/documents/protocol.pdf", "org-id/documents/data.xlsx"],
  "connector_data_id": "project-2024-001",
  "workspace_name": "Q1 CSR Documents",
  "template_id": "tmpl-uuid-123",
  "output_name": "CSR_Final.docx",
  "selected_section_ids": ["section-uuid-1", "section-uuid-2"],
  "generic_mrt_outline_full": {},
  "document_instructions": "Follow company style guide",
  "style_guide_id": "sg-uuid-456"
}
The following field names are interchangeable:

| Alias (also accepted) | Internal field name |
|-----------------------|---------------------|
| `connector_data_id` | `document_set_key` |
| `template_id` | `generic_mrt_id` |
| `workspace_name` | `document_set_name` |

### Request Parameters

The API accepts both the alias name and the internal field name for aliased fields (both are equivalent).

| Parameter | Also accepted as | Type | Required | Description |
|-----------|-------|------|----------|-------------|
| `document_type` | — | string | Yes | Type of document (e.g., `"CSR"`, `"Protocol"`, `"IND"`) |
| `file_paths` | — | array | Yes | S3 file paths for source documents |
| `connector_data_id` | `document_set_key` | string | Yes | Unique key scoping the document set for search |
| `workspace_name` | `document_set_name` | string | No | Human-readable name for the document set (defaults to `""`) |
| `template_id` | `generic_mrt_id` | string | Yes | Template ID to use |
| `output_name` | — | string | No | Name for the generated output file |
| `selected_section_ids` | — | array | No | Specific section IDs (strings or `{section_id, user_instructions}` objects) to include |
| `generic_mrt_outline_full` | — | object | No | Full pre-built outline structure to use directly |
| `document_instructions` | — | string | No | Document-level instructions for generation |
| `style_guide_id` | — | string | No | Style guide ID to apply to the generated document |
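
A client-side check along the following lines can catch a missing required field or a field sent under both of its names before the request is made. This is only a sketch: the server performs its own validation, the helper name `normalize_payload` is hypothetical, and the choice of which spelling to normalize to is arbitrary, since the API accepts either.

```python
# Required fields, taken from the parameter table above.
REQUIRED_FIELDS = ("document_type", "file_paths", "connector_data_id", "template_id")

# Interchangeable names from the table above, mapped to a single spelling.
ALIASES = {
    "document_set_key": "connector_data_id",
    "document_set_name": "workspace_name",
    "generic_mrt_id": "template_id",
}


def normalize_payload(payload):
    """Rename aliased keys to one spelling and check the required fields."""
    normalized = {}
    for key, value in payload.items():
        canonical = ALIASES.get(key, key)
        if canonical in normalized:
            raise ValueError(f"'{canonical}' was supplied under two names")
        normalized[canonical] = value
    # Empty strings and empty lists count as missing.
    missing = [name for name in REQUIRED_FIELDS if not normalized.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return normalized
```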

### Request Example

```bash
curl -X POST "https://api.artosai.com/api/v1/documents/generate" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_type": "CSR",
    "file_paths": ["org-id/documents/protocol.pdf", "org-id/documents/csr-data.xlsx"],
    "connector_data_id": "project-2024-001",
    "workspace_name": "Q1 CSR Documents",
    "template_id": "tmpl-uuid-123",
    "output_name": "CSR_Final.docx"
  }'
# Note: "document_set_key", "document_set_name", and "generic_mrt_id" are also accepted.
```

Python Example

import requests

url = "https://api.artosai.com/api/v1/documents/generate"
headers = {
    "Authorization": "Bearer YOUR_TOKEN",
    "Content-Type": "application/json"
}

payload = {
    "document_type": "CSR",
    "file_paths": ["org-id/documents/protocol.pdf"],
    # Both aliases below are interchangeable:
    "connector_data_id": "project-2024-001",   # or "document_set_key"
    "workspace_name": "Q1 CSR Documents",       # or "document_set_name"
    "template_id": "tmpl-uuid-123",             # or "generic_mrt_id"
    "output_name": "CSR_Final.docx"
}

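response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()  # surface 4xx/5xx errors instead of failing on a missing key
result = response.json()
document_id = result["task_id"]  # task_id is the document UUID
print(f"Document ID: {document_id}")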

Response (202 Accepted)

{
  "message": "Request to generate document has been accepted and is being processed in the background.",
  "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| `message` | string | Status message |
| `task_id` | string | The document UUID; use this to poll status and retrieve the document |

Status Codes

  • 202 Accepted: Request accepted for background processing
  • 400 Bad Request: Missing required parameters or document set not found
  • 401 Unauthorized: Missing or invalid Bearer token
  • 500 Internal Server Error: Database operation failed

Idempotency

If a document with the same output_name already exists for your organization, the existing document ID is returned immediately (no duplicate is created).
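
Because a repeated output_name returns the existing document, a caller who wants a fresh document needs a name that is not already taken. The sketch below picks one by appending a numeric suffix. It assumes the caller already holds a list of existing names (for example the `document_name` values from List Documents, which may not cover every document in the organization); the helper name `unique_output_name` is hypothetical.

```python
from pathlib import PurePosixPath


def unique_output_name(desired, existing_names):
    """Return `desired`, or a suffixed variant that is not in existing_names."""
    taken = set(existing_names)
    if desired not in taken:
        return desired
    path = PurePosixPath(desired)
    counter = 2
    while True:
        # e.g. CSR_Final.docx -> CSR_Final_2.docx
        candidate = f"{path.stem}_{counter}{path.suffix}"
        if candidate not in taken:
            return candidate
        counter += 1
```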

Document Generation Pipeline

The background task performs the following steps:
  1. Extract — Extract and classify content from source documents
  2. Ingest — Ingest documents using classification results
  3. Create Outline — Generate an outline from the template
  4. Orchestrate — Execute document outline rule orchestration
  5. Generate — Produce the final DOCX document

Get Document Status

Poll the current status of a document being generated.
GET /api/v1/documents/status/{document_id}

Path Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `document_id` | string | Yes | UUID of the document (returned as `task_id` from `/generate`) |

Request Example

curl -X GET "https://api.artosai.com/api/v1/documents/status/a1b2c3d4-e5f6-7890-abcd-ef1234567890" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response

{
  "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "Generating",
  "progress": null,
  "error": null
}

Status Values

| Status | Description |
|--------|-------------|
| `Pending` | Document accepted but not yet picked up by a worker |
| `Ingesting` | Source documents are being ingested |
| `Generating` | Document content is being generated |
| `Ready` | Document generation completed successfully |
| `Failed` | Document generation encountered an error |

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| `task_id` | string | Document UUID |
| `status` | string | Current processing status |
| `progress` | integer | Progress percentage (reserved; currently always null) |
| `error` | string | Error message if status is `"Failed"`, otherwise null |

Status Codes

  • 200 OK: Status retrieved successfully
  • 404 Not Found: Document not found
  • 500 Internal Server Error: Database error

Polling Workflow

#!/bin/bash
TOKEN="your_bearer_token"
API="https://api.artosai.com"
DOC_ID="a1b2c3d4-e5f6-7890-abcd-ef1234567890"

while true; do
  RESPONSE=$(curl -s -X GET "$API/api/v1/documents/status/$DOC_ID" \
    -H "Authorization: Bearer $TOKEN")
  STATUS=$(echo "$RESPONSE" | jq -r '.status')

  echo "Status: $STATUS"

  if [ "$STATUS" = "Ready" ]; then
    echo "Document generation complete!"
    break
  elif [ "$STATUS" = "Failed" ]; then
    echo "Error: $(echo "$RESPONSE" | jq -r '.error')"
    exit 1
  fi

  sleep 5
done
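
The same loop can be sketched in Python with an upper bound on the total wait, which the shell version above lacks. The function name `wait_for_document`, the default interval, and the default timeout are assumptions chosen for illustration, not values prescribed by the API.

```python
import time


def wait_for_document(fetch_status, interval=5.0, timeout=1800.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the document is Ready, Failed, or the timeout elapses.

    fetch_status is any zero-argument callable that returns the parsed JSON body
    of GET /api/v1/documents/status/{document_id}.
    """
    deadline = clock() + timeout
    while True:
        body = fetch_status()
        status = body.get("status")
        if status == "Ready":
            return body
        if status == "Failed":
            raise RuntimeError(f"Document generation failed: {body.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"Document still '{status}' after {timeout} seconds")
        sleep(interval)
```

In practice `fetch_status` could be a small function that calls `requests.get` on the status URL with the Bearer header and returns `response.json()`. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.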

Get Single Document

Retrieve a completed document by ID. Returns all document metadata and sections.
GET /api/v1/documents/{document_id}

Path Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `document_id` | string | Yes | UUID of the document |

Request Example

curl -X GET "https://api.artosai.com/api/v1/documents/a1b2c3d4-e5f6-7890-abcd-ef1234567890" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response

{
  "document": {
    "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "document_name": "CSR_Final.docx",
    "document_set": "Q1 CSR Documents",
    "document_set_name": "Q1 CSR Documents",
    "document_type": "CSR",
    "product_name": "",
    "status": "Ready",
    "version": 0,
    "template_id": "tmpl-uuid-123",
    "template_nickname": "CSR",
    "organization_id": "org-uuid-789",
    "user": "user@example.com",
    "all_sources": ["org-id/documents/protocol.pdf", "org-id/documents/csr-data.xlsx"],
    "selected_section_ids": ["section-uuid-1", "section-uuid-2"],
    "last_regeneration": "2024-01-25T12:45:00Z",
    "created_at": "2024-01-25T12:00:00Z",
    "updated_at": "2024-01-25T12:45:00Z"
  }
}

Status Codes

  • 200 OK: Document retrieved successfully
  • 401 Unauthorized: Missing or invalid Bearer token
  • 403 Forbidden: Document belongs to a different organization
  • 404 Not Found: Document not found
  • 500 Internal Server Error: Database error
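
One way a client might turn these status codes into readable errors is sketched below. The exception class `DocumentRequestError` and the helper `extract_document` are hypothetical names; the messages are copied from the list above.

```python
# Meanings copied from the status code list above.
STATUS_MEANINGS = {
    401: "Missing or invalid Bearer token",
    403: "Document belongs to a different organization",
    404: "Document not found",
    500: "Database error",
}


class DocumentRequestError(Exception):
    """Raised when the API answers with a non-200 status code."""

    def __init__(self, status_code):
        self.status_code = status_code
        meaning = STATUS_MEANINGS.get(status_code, "Unexpected response")
        super().__init__(f"{status_code}: {meaning}")


def extract_document(status_code, body):
    """Return the `document` object from a 200 response, or raise."""
    if status_code != 200:
        raise DocumentRequestError(status_code)
    return body["document"]
```

With the `requests` library this would be called as `extract_document(response.status_code, response.json())`, assuming error responses also carry a JSON body.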

Create Local Document

Upload a Word document directly from the Office Add-in and create a minimal document record. Unlike /generate, this does not process the document through the generation pipeline — it simply stores the document for use with chat functionality.
POST /api/v1/documents/local

Request

Content-Type: multipart/form-data
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `file` | file | Yes | The `.docx` file to upload |

Only .docx files are accepted.
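
Since other file types are rejected with 400 Bad Request, a client can check the extension before uploading. The sketch below compares the extension case-insensitively, which is an assumption: the server's exact rule is not documented here, so an upper-case `.DOCX` may still be rejected. The helper names are hypothetical.

```python
from pathlib import Path


def is_docx(path):
    """Return True when the file name ends in .docx (case-insensitive)."""
    return Path(path).suffix.lower() == ".docx"


def require_docx(path):
    """Return the path as a Path object, or raise if it is not a .docx file."""
    if not is_docx(path):
        raise ValueError(f"{path!r} is not a .docx file")
    return Path(path)
```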

Request Example

curl -X POST "https://api.artosai.com/api/v1/documents/local" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@/path/to/my-document.docx"

Python Example

import requests

url = "https://api.artosai.com/api/v1/documents/local"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

with open("/path/to/my-document.docx", "rb") as f:
    response = requests.post(
        url,
        headers=headers,
        files={"file": ("my-document.docx", f, "application/vnd.openxmlformats-officedocument.wordprocessingml.document")}
    )

result = response.json()
print(f"Document ID: {result['document_id']}")
print(f"S3 Key: {result['s3_key']}")

Response (201 Created)

{
  "document_id": "c3d4e5f6-a7b8-9012-cdef-123456789012",
  "document_name": "my-document.docx",
  "s3_key": "org-uuid-789/local-documents/c3d4e5f6-a7b8-9012-cdef-123456789012/my-document.docx",
  "status": "Ready"
}

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| `document_id` | string | UUID of the created document |
| `document_name` | string | Name of the uploaded file |
| `s3_key` | string | S3 key where the document is stored |
| `status` | string | Always `"Ready"` for local documents |

Status Codes

  • 201 Created: Document uploaded and record created successfully
  • 400 Bad Request: Invalid file type (only .docx accepted)
  • 401 Unauthorized: Authentication failed
  • 500 Internal Server Error: S3 upload or database error

Get Sections for Document

Retrieve a flat list of every section in a document, with each section's identifier and metadata. The Sources panel uses this list to pre-populate its section dropdown when a document loads.
POST /get-sections-for-document

Request Body

{
  "document_id": "doc_9f3c1e72ab84"
}

Request Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `document_id` | string | Yes | Unique identifier of the document whose section list is being retrieved. Must reference an existing document accessible by the authenticated user. |

Request Example

curl -X POST "https://api.artosai.com/get-sections-for-document" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "doc_9f3c1e72ab84"
  }'

Python Example

import requests

url = "https://api.artosai.com/get-sections-for-document"
headers = {
    "Authorization": "Bearer YOUR_TOKEN",
    "Content-Type": "application/json"
}

payload = {
    "document_id": "doc_9f3c1e72ab84"
}

response = requests.post(url, headers=headers, json=payload)
sections = response.json()["sections"]
for section in sections:
    print(f"{section['section_order']}: {section['section_title']}")

Response

{
  "sections": [
    {
      "section_id": "1.1 Title of Study:",
      "section_title": "1.1 Title of Study:",
      "section_order": 0
    },
    {
      "section_id": "1.2 Study Objectives:",
      "section_title": "1.2 Study Objectives:",
      "section_order": 1
    },
    {
      "section_id": "1.3 Study Design:",
      "section_title": "1.3 Study Design:",
      "section_order": 2
    },
    {
      "section_id": "1.4 Study Population:",
      "section_title": "1.4 Study Population:",
      "section_order": 3
    },
    {
      "section_id": "1.5 Study Duration:",
      "section_title": "1.5 Study Duration:",
      "section_order": 4
    }
  ]
}

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| `sections` | array | Array of section objects, sorted by document order |
| `sections[].section_id` | string | Unique identifier of the section (the section title). Used as `section_id` in the get-sources-for-selected-text endpoint. |
| `sections[].section_title` | string | Human-readable display name of the section, rendered as an option in the Sources panel selection dropdown. |
| `sections[].section_order` | integer | Zero-based index representing the section's position within the document, used to render dropdown options in document order. |

Status Codes

  • 200 OK: Successfully retrieved section list
  • 400 Bad Request: Authentication failed or invalid Bearer token
  • 404 Not Found: Document MRT not found for the given document ID
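
To illustrate how a client might turn this response into dropdown entries, the sketch below orders sections by `section_order` and returns value and label pairs. The response is described as already sorted, so the explicit sort is only a defensive step; the function name `dropdown_options` is hypothetical.

```python
def dropdown_options(sections):
    """Return (section_id, section_title) pairs in document order."""
    ordered = sorted(sections, key=lambda section: section["section_order"])
    return [(section["section_id"], section["section_title"]) for section in ordered]
```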

Complete Workflow Example

#!/bin/bash

TOKEN="your_bearer_token"
API="https://api.artosai.com"

# Step 1: Upload source documents
echo "Uploading source documents..."
curl -X POST "$API/api/v1/files/upload" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file_name=protocol.pdf" \
  -F "file_content=@protocol.pdf" \
  -F "container=documents"

# Step 2: Request document generation
echo "Requesting document generation..."
RESPONSE=$(curl -s -X POST "$API/api/v1/documents/generate" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_type": "CSR",
    "file_paths": ["org-id/documents/protocol.pdf"],
    "connector_data_id": "project-2024",
    "workspace_name": "Q1 CSR",
    "template_id": "tmpl-uuid-123",
    "output_name": "CSR_Final.docx"
  }')

DOC_ID=$(echo "$RESPONSE" | jq -r '.task_id')
echo "Document ID: $DOC_ID"

# Step 3: Poll status until Ready
echo "Waiting for generation to complete..."
while true; do
  STATUS=$(curl -s -X GET "$API/api/v1/documents/status/$DOC_ID" \
    -H "Authorization: Bearer $TOKEN" | jq -r '.status')

  echo "Status: $STATUS"

  if [ "$STATUS" = "Ready" ]; then
    echo "Document generation complete!"
    break
  elif [ "$STATUS" = "Failed" ]; then
    echo "Document generation failed"
    exit 1
  fi

  sleep 5
done

# Step 4: Retrieve document
echo "Retrieving document..."
curl -s -X GET "$API/api/v1/documents/$DOC_ID" \
  -H "Authorization: Bearer $TOKEN" | jq '.document'