# Documents API

> Generate and retrieve documents using templates

The Documents API generates documents asynchronously from source materials using structured templates. Generation runs as a background Celery task, so the generate request returns immediately with a document ID that you can use to track status.

## List Documents

Retrieve all documents accessible to the authenticated user across all their document sets.

```bash theme={null}
GET /api/v1/documents/
```

Access rules:

* **Internal / Owner** roles: all documents within their organization
* **All other roles**: only documents in document sets they belong to

### Request Example

```bash theme={null}
curl -X GET "https://api.artosai.com/api/v1/documents/" \
  -H "Authorization: Bearer YOUR_TOKEN"
```

### Response

```json theme={null}
{
  "documents": [
    {
      "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "document_name": "CSR_Final.docx",
      "created_at": "2024-01-25 12:00:00.000000",
      "updated_at": "2024-01-25 12:45:00.000000",
      "workspace_id": "ws-uuid-123",
      "template": {
        "template_id": "tmpl-uuid-456",
        "template_name": "CSR Template"
      }
    },
    {
      "document_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
      "document_name": "Protocol_v2.docx",
      "created_at": "2024-01-20 09:30:00.000000",
      "updated_at": "2024-01-20 11:00:00.000000",
      "workspace_id": "ws-uuid-789",
      "template": null
    }
  ]
}
```

### Response Fields

| Field                                | Type   | Description                             |
| ------------------------------------ | ------ | --------------------------------------- |
| `documents`                          | array  | List of document items                  |
| `documents[].document_id`            | string | Document UUID                           |
| `documents[].document_name`          | string | Document file name                      |
| `documents[].created_at`             | string | Creation timestamp                      |
| `documents[].updated_at`             | string | Last updated timestamp                  |
| `documents[].workspace_id`           | string | Document set ID the document belongs to |
| `documents[].template`               | object | Template metadata (null if no template) |
| `documents[].template.template_id`   | string | Template UUID                           |
| `documents[].template.template_name` | string | Template name                           |
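
Because `template` can be `null`, client code should guard before reading the nested template fields. The following is a minimal sketch that assumes the response shape shown above; `group_by_workspace` is a hypothetical helper name, not part of the API.

```python
from collections import defaultdict


def group_by_workspace(payload):
    """Group (document_name, template_name) pairs by workspace_id."""
    grouped = defaultdict(list)
    for doc in payload.get("documents", []):
        template = doc.get("template") or {}  # template may be null
        grouped[doc["workspace_id"]].append(
            (doc["document_name"], template.get("template_name"))
        )
    return dict(grouped)


# Trimmed copy of the sample response above.
sample = {
    "documents": [
        {"document_name": "CSR_Final.docx", "workspace_id": "ws-uuid-123",
         "template": {"template_id": "tmpl-uuid-456", "template_name": "CSR Template"}},
        {"document_name": "Protocol_v2.docx", "workspace_id": "ws-uuid-789",
         "template": None},
    ]
}
print(group_by_workspace(sample))
```

Documents without a template get `None` in place of the template name.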

### Status Codes

* **200 OK**: Documents retrieved successfully
* **401 Unauthorized**: Missing or invalid Bearer token
* **500 Internal Server Error**: Database error

***

## Generate Document

Request document generation from source documents using a template. The request is queued as a background Celery task and the endpoint returns `202 Accepted` immediately. A placeholder document record is created synchronously before queuing, so the returned `task_id` can be used to poll status right away.

```bash theme={null}
POST /api/v1/documents/generate
```

### Request Body

Three fields can be sent under either of two names; both spellings are accepted.

```json theme={null}
{
  "document_type": "CSR",
  "file_paths": ["org-id/documents/protocol.pdf", "org-id/documents/data.xlsx"],
  "connector_data_id": "project-2024-001",
  "workspace_name": "Q1 CSR Documents",
  "template_id": "tmpl-uuid-123",
  "output_name": "CSR_Final.docx",
  "selected_section_ids": ["section-uuid-1", "section-uuid-2"],
  "generic_mrt_outline_full": {},
  "document_instructions": "Follow company style guide",
  "style_guide_id": "sg-uuid-456"
}
```

The following field names are interchangeable:

| Alias (also accepted) | Internal field name |
| --------------------- | ------------------- |
| `connector_data_id`   | `document_set_key`  |
| `template_id`         | `generic_mrt_id`    |
| `workspace_name`      | `document_set_name` |
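
The server accepts either spelling, so no client-side conversion is required. If you want a single spelling in logs or tests, the mapping in the table above can be applied locally. This is a sketch only: `to_internal_names` is a hypothetical helper, and because this page does not say how the server resolves a request that sends both spellings with different values, the helper rejects that case.

```python
# Alias -> internal field name, copied from the table above.
ALIASES = {
    "connector_data_id": "document_set_key",
    "template_id": "generic_mrt_id",
    "workspace_name": "document_set_name",
}


def to_internal_names(payload):
    """Return a copy of payload with every alias replaced by its internal name."""
    normalized = {}
    for key, value in payload.items():
        internal = ALIASES.get(key, key)
        if internal in normalized and normalized[internal] != value:
            raise ValueError(f"conflicting values for {internal!r}")
        normalized[internal] = value
    return normalized


print(to_internal_names({"document_type": "CSR", "template_id": "tmpl-uuid-123"}))
```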

### Request Parameters

For aliased fields, the parameter name and the internal field name are equivalent; send either one.

| Parameter                  | Internal name       | Type   | Required | Description |
| -------------------------- | ------------------- | ------ | -------- | ----------- |
| `document_type`            | -                   | string | Yes      | Type of document (e.g., `"CSR"`, `"Protocol"`, `"IND"`) |
| `file_paths`               | -                   | array  | Yes      | S3 file paths for source documents |
| `connector_data_id`        | `document_set_key`  | string | Yes      | Unique key scoping the document set for search |
| `workspace_name`           | `document_set_name` | string | No       | Human-readable name for the document set (defaults to `""`) |
| `template_id`              | `generic_mrt_id`    | string | Yes      | Template ID to use |
| `output_name`              | -                   | string | No       | Name for the generated output file |
| `selected_section_ids`     | -                   | array  | No       | Specific section IDs (strings or `{section_id, user_instructions}` objects) to include |
| `generic_mrt_outline_full` | -                   | object | No       | Full pre-built outline structure to use directly |
| `document_instructions`    | -                   | string | No       | Document-level instructions for generation |
| `style_guide_id`           | -                   | string | No       | Style guide ID to apply to the generated document |

### Request Example

```bash theme={null}
curl -X POST "https://api.artosai.com/api/v1/documents/generate" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_type": "CSR",
    "file_paths": ["org-id/documents/protocol.pdf", "org-id/documents/csr-data.xlsx"],
    "connector_data_id": "project-2024-001",
    "workspace_name": "Q1 CSR Documents",
    "template_id": "tmpl-uuid-123",
    "output_name": "CSR_Final.docx"
  }'
# The internal names "document_set_key", "document_set_name", and "generic_mrt_id" are accepted too.
```

### Python Example

```python theme={null}
import requests

url = "https://api.artosai.com/api/v1/documents/generate"
headers = {
    "Authorization": "Bearer YOUR_TOKEN",
    "Content-Type": "application/json"
}

payload = {
    "document_type": "CSR",
    "file_paths": ["org-id/documents/protocol.pdf"],
    # Each of the next three fields also accepts the name shown in its comment:
    "connector_data_id": "project-2024-001",   # or "document_set_key"
    "workspace_name": "Q1 CSR Documents",       # or "document_set_name"
    "template_id": "tmpl-uuid-123",             # or "generic_mrt_id"
    "output_name": "CSR_Final.docx"
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
result = response.json()
document_id = result["task_id"]  # task_id is the document UUID
print(f"Document ID: {document_id}")
```

### Response (202 Accepted)

```json theme={null}
{
  "message": "Request to generate document has been accepted and is being processed in the background.",
  "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
```

### Response Fields

| Field     | Type   | Description                                                           |
| --------- | ------ | --------------------------------------------------------------------- |
| `message` | string | Status message                                                        |
| `task_id` | string | The document UUID — use this to poll status and retrieve the document |

### Status Codes

* **202 Accepted**: Request accepted for background processing
* **400 Bad Request**: Missing required parameters or document set not found
* **401 Unauthorized**: Missing or invalid Bearer token
* **500 Internal Server Error**: Database operation failed
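
A `400 Bad Request` for missing parameters can often be caught before the request is sent. The sketch below checks the four required fields from the Request Parameters table, accepting either spelling of the aliased ones. `missing_required` is a hypothetical helper, and treating empty strings and empty lists as missing is an assumption about server behavior that this page does not confirm.

```python
# Each entry lists the accepted spellings for one required field.
REQUIRED = [
    ("document_type",),
    ("file_paths",),
    ("connector_data_id", "document_set_key"),
    ("template_id", "generic_mrt_id"),
]


def missing_required(payload):
    """Return required fields (named by their first spelling) absent from payload."""
    return [
        names[0]
        for names in REQUIRED
        if not any(payload.get(name) for name in names)
    ]


print(missing_required({"document_type": "CSR", "file_paths": ["org-id/documents/protocol.pdf"]}))
```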

### Idempotency

If a document with the same `output_name` already exists for your organization, the existing document ID is returned immediately (no duplicate is created).
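
One consequence is that resubmitting with an unchanged `output_name` returns the existing document instead of starting a new run. If a fresh generation is what you want, sending a distinct name is one way to get it. The sketch below is an illustration only: `versioned_output_name` is a hypothetical helper, and the timestamp suffix is just one possible naming convention.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def versioned_output_name(output_name, now=None):
    """Insert a UTC timestamp between the file stem and its extension."""
    now = now or datetime.now(timezone.utc)
    path = PurePosixPath(output_name)
    return f"{path.stem}_{now:%Y%m%dT%H%M%SZ}{path.suffix}"


fixed = datetime(2024, 1, 25, 12, 0, 0, tzinfo=timezone.utc)
print(versioned_output_name("CSR_Final.docx", fixed))
```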

### Document Generation Pipeline

The background task performs the following steps:

1. **Extract** — Extract and classify content from source documents
2. **Ingest** — Ingest documents using classification results
3. **Create Outline** — Generate an outline from the template
4. **Orchestrate** — Execute document outline rule orchestration
5. **Generate** — Produce the final DOCX document

***

## Get Document Status

Poll the current status of a document being generated.

```bash theme={null}
GET /api/v1/documents/status/{document_id}
```

### Path Parameters

| Parameter     | Type   | Required | Description                                                   |
| ------------- | ------ | -------- | ------------------------------------------------------------- |
| `document_id` | string | Yes      | UUID of the document (returned as `task_id` from `/generate`) |

### Request Example

```bash theme={null}
curl -X GET "https://api.artosai.com/api/v1/documents/status/a1b2c3d4-e5f6-7890-abcd-ef1234567890" \
  -H "Authorization: Bearer YOUR_TOKEN"
```

### Response

```json theme={null}
{
  "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "Generating",
  "progress": null,
  "error": null
}
```

### Status Values

| Status       | Description                                         |
| ------------ | --------------------------------------------------- |
| `Pending`    | Document accepted but not yet picked up by a worker |
| `Ingesting`  | Source documents are being ingested                 |
| `Generating` | Document content is being generated                 |
| `Ready`      | Document generation completed successfully          |
| `Failed`     | Document generation encountered an error            |

### Response Fields

| Field      | Type    | Description                                              |
| ---------- | ------- | -------------------------------------------------------- |
| `task_id`  | string  | Document UUID                                            |
| `status`   | string  | Current processing status                                |
| `progress` | integer | Progress percentage (reserved — currently always `null`) |
| `error`    | string  | Error message if status is `"Failed"`, otherwise `null`  |

### Status Codes

* **200 OK**: Status retrieved successfully
* **404 Not Found**: Document not found
* **500 Internal Server Error**: Database error

### Polling Workflow

```bash theme={null}
#!/bin/bash
TOKEN="your_bearer_token"
API="https://api.artosai.com"
DOC_ID="a1b2c3d4-e5f6-7890-abcd-ef1234567890"

while true; do
  RESPONSE=$(curl -s -X GET "$API/api/v1/documents/status/$DOC_ID" \
    -H "Authorization: Bearer $TOKEN")
  STATUS=$(echo "$RESPONSE" | jq -r '.status')

  echo "Status: $STATUS"

  if [ "$STATUS" = "Ready" ]; then
    echo "Document generation complete!"
    break
  elif [ "$STATUS" = "Failed" ]; then
    echo "Error: $(echo "$RESPONSE" | jq -r '.error')"
    exit 1
  fi

  sleep 5
done
```
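
The same loop can be written in Python with an upper bound on the total wait, which the shell version lacks. This is a sketch under stated assumptions: `wait_for_document` and its parameters are hypothetical, the 30-minute default timeout is arbitrary, and the HTTP call is passed in as `fetch_status` so the loop can be exercised without network access.

```python
import time


def wait_for_document(fetch_status, document_id, interval=5.0, timeout=1800.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until status is Ready; raise on Failed or when the timeout expires.

    fetch_status(document_id) must return the parsed JSON body of
    GET /api/v1/documents/status/{document_id}.
    """
    deadline = clock() + timeout
    while True:
        body = fetch_status(document_id)
        status = body["status"]
        if status == "Ready":
            return body
        if status == "Failed":
            raise RuntimeError(f"generation failed: {body.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(interval)


# Demonstration with canned responses instead of real HTTP calls.
responses = iter([
    {"task_id": "x", "status": "Pending", "progress": None, "error": None},
    {"task_id": "x", "status": "Generating", "progress": None, "error": None},
    {"task_id": "x", "status": "Ready", "progress": None, "error": None},
])
result = wait_for_document(lambda _id: next(responses), "x", sleep=lambda _s: None)
print(result["status"])
```

In real use, `fetch_status` would wrap a `requests.get` call to the status endpoint with the Bearer token and return `response.json()`.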

***

## Get Single Document

Retrieve a completed document by ID. Returns all document metadata and sections.

```bash theme={null}
GET /api/v1/documents/{document_id}
```

### Path Parameters

| Parameter     | Type   | Required | Description          |
| ------------- | ------ | -------- | -------------------- |
| `document_id` | string | Yes      | UUID of the document |

### Request Example

```bash theme={null}
curl -X GET "https://api.artosai.com/api/v1/documents/a1b2c3d4-e5f6-7890-abcd-ef1234567890" \
  -H "Authorization: Bearer YOUR_TOKEN"
```

### Response

```json theme={null}
{
  "document": {
    "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "document_name": "CSR_Final.docx",
    "document_set": "Q1 CSR Documents",
    "document_set_name": "Q1 CSR Documents",
    "document_type": "CSR",
    "product_name": "",
    "status": "Ready",
    "version": 0,
    "template_id": "tmpl-uuid-123",
    "template_nickname": "CSR",
    "organization_id": "org-uuid-789",
    "user": "user@example.com",
    "all_sources": ["org-id/documents/protocol.pdf", "org-id/documents/csr-data.xlsx"],
    "selected_section_ids": ["section-uuid-1", "section-uuid-2"],
    "last_regeneration": "2024-01-25T12:45:00Z",
    "created_at": "2024-01-25T12:00:00Z",
    "updated_at": "2024-01-25T12:45:00Z"
  }
}
```
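
The sample responses on this page show two timestamp styles: the list endpoint uses `2024-01-25 12:00:00.000000`, while this endpoint uses `2024-01-25T12:00:00Z`. A client that reads both may want a single parser. The sketch below is a hypothetical helper, not part of the API. It returns a naive `datetime` for the first style, because that sample carries no timezone designator and this page does not state which zone applies.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse either timestamp style shown in this page's sample responses."""
    if value.endswith("Z"):
        # "Z" means UTC; older Python versions only accept the explicit offset.
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


print(parse_timestamp("2024-01-25 12:00:00.000000"))
print(parse_timestamp("2024-01-25T12:45:00Z"))
```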

### Status Codes

* **200 OK**: Document retrieved successfully
* **401 Unauthorized**: Missing or invalid Bearer token
* **403 Forbidden**: Document belongs to a different organization
* **404 Not Found**: Document not found
* **500 Internal Server Error**: Database error

***

## Create Local Document

Upload a Word document directly from the Office Add-in and create a minimal document record. Unlike `/generate`, this does **not** process the document through the generation pipeline — it simply stores the document for use with chat functionality.

```bash theme={null}
POST /api/v1/documents/local
```

### Request

**Content-Type**: `multipart/form-data`

| Parameter | Type | Required | Description                |
| --------- | ---- | -------- | -------------------------- |
| `file`    | file | Yes      | The `.docx` file to upload |

Only `.docx` files are accepted.
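
A client can check the extension before uploading to avoid a round trip that ends in `400 Bad Request`. This is a minimal sketch: `is_docx` is a hypothetical helper, it inspects only the file name and not the file contents, and the case-insensitive comparison is an assumption, since this page does not say whether the server accepts `.DOCX`.

```python
from pathlib import Path


def is_docx(filename):
    """Return True when the file name ends in .docx (compared case-insensitively)."""
    return Path(filename).suffix.lower() == ".docx"


for name in ["my-document.docx", "notes.doc", "report.docx.pdf"]:
    print(name, is_docx(name))
```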

### Request Example

```bash theme={null}
curl -X POST "https://api.artosai.com/api/v1/documents/local" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@/path/to/my-document.docx"
```

### Python Example

```python theme={null}
import requests

url = "https://api.artosai.com/api/v1/documents/local"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

with open("/path/to/my-document.docx", "rb") as f:
    response = requests.post(
        url,
        headers=headers,
        files={"file": ("my-document.docx", f, "application/vnd.openxmlformats-officedocument.wordprocessingml.document")}
    )

result = response.json()
print(f"Document ID: {result['document_id']}")
print(f"S3 Key: {result['s3_key']}")
```

### Response (201 Created)

```json theme={null}
{
  "document_id": "c3d4e5f6-a7b8-9012-cdef-123456789012",
  "document_name": "my-document.docx",
  "s3_key": "org-uuid-789/local-documents/c3d4e5f6-a7b8-9012-cdef-123456789012/my-document.docx",
  "status": "Ready"
}
```

### Response Fields

| Field           | Type   | Description                          |
| --------------- | ------ | ------------------------------------ |
| `document_id`   | string | UUID of the created document         |
| `document_name` | string | Name of the uploaded file            |
| `s3_key`        | string | S3 key where the document is stored  |
| `status`        | string | Always `"Ready"` for local documents |

### Status Codes

* **201 Created**: Document uploaded and record created successfully
* **400 Bad Request**: Invalid file type (only `.docx` accepted)
* **401 Unauthorized**: Authentication failed
* **500 Internal Server Error**: S3 upload or database error

***

## Get Sections for Document

Retrieve a flat list of every section in a document, with each section's identifier, title, and position. The Sources panel uses this list to pre-populate its section dropdown when a document loads.

```bash theme={null}
POST /get-sections-for-document
```

### Request Body

```json theme={null}
{
  "document_id": "doc_9f3c1e72ab84"
}
```

### Request Parameters

| Parameter     | Type   | Required | Description                                                                                                                                        |
| ------------- | ------ | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `document_id` | string | Yes      | Unique identifier of the document whose section list is being retrieved. Must reference an existing document accessible by the authenticated user. |

### Request Example

```bash theme={null}
curl -X POST "https://api.artosai.com/get-sections-for-document" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "doc_9f3c1e72ab84"
  }'
```

### Python Example

```python theme={null}
import requests

url = "https://api.artosai.com/get-sections-for-document"
headers = {
    "Authorization": "Bearer YOUR_TOKEN",
    "Content-Type": "application/json"
}

payload = {
    "document_id": "doc_9f3c1e72ab84"
}

response = requests.post(url, headers=headers, json=payload)
sections = response.json()["sections"]
for section in sections:
    print(f"{section['section_order']}: {section['section_title']}")
```

### Response

```json theme={null}
{
  "sections": [
    {
      "section_id": "1.1 Title of Study:",
      "section_title": "1.1 Title of Study:",
      "section_order": 0
    },
    {
      "section_id": "1.2 Study Objectives:",
      "section_title": "1.2 Study Objectives:",
      "section_order": 1
    },
    {
      "section_id": "1.3 Study Design:",
      "section_title": "1.3 Study Design:",
      "section_order": 2
    },
    {
      "section_id": "1.4 Study Population:",
      "section_title": "1.4 Study Population:",
      "section_order": 3
    },
    {
      "section_id": "1.5 Study Duration:",
      "section_title": "1.5 Study Duration:",
      "section_order": 4
    }
  ]
}
```

### Response Fields

| Field                      | Type    | Description                                                                                                                  |
| -------------------------- | ------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `sections`                 | array   | Array of section objects, sorted by document order                                                                           |
| `sections[].section_id`    | string  | Unique identifier of the section (the section title). Used as `section_id` in the `get-sources-for-selected-text` endpoint.  |
| `sections[].section_title` | string  | Human-readable display name of the section, rendered as an option in the Sources panel selection dropdown.                   |
| `sections[].section_order` | integer | Zero-based index representing the section's position within the document, used to render dropdown options in document order. |
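
To show how these fields fit together, the sketch below turns a response into ordered dropdown options. `dropdown_options` is a hypothetical helper. The response is described as already sorted, so the explicit sort on `section_order` is a defensive step rather than a requirement.

```python
def dropdown_options(payload):
    """Return (section_id, section_title) pairs in document order."""
    ordered = sorted(payload["sections"], key=lambda s: s["section_order"])
    return [(s["section_id"], s["section_title"]) for s in ordered]


# Two sections from the sample response, deliberately out of order.
sample = {
    "sections": [
        {"section_id": "1.2 Study Objectives:",
         "section_title": "1.2 Study Objectives:", "section_order": 1},
        {"section_id": "1.1 Title of Study:",
         "section_title": "1.1 Title of Study:", "section_order": 0},
    ]
}
print(dropdown_options(sample))
```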

### Status Codes

* **200 OK**: Successfully retrieved section list
* **400 Bad Request**: Authentication failed or invalid Bearer token
* **404 Not Found**: Document MRT not found for the given document ID

***

## Complete Workflow Example

```bash theme={null}
#!/bin/bash

TOKEN="your_bearer_token"
API="https://api.artosai.com"

# Step 1: Upload source documents
echo "Uploading source documents..."
curl -X POST "$API/api/v1/files/upload" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file_name=protocol.pdf" \
  -F "file_content=@protocol.pdf" \
  -F "container=documents"

# Step 2: Request document generation
echo "Requesting document generation..."
RESPONSE=$(curl -s -X POST "$API/api/v1/documents/generate" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_type": "CSR",
    "file_paths": ["org-id/documents/protocol.pdf"],
    "connector_data_id": "project-2024",
    "workspace_name": "Q1 CSR",
    "template_id": "tmpl-uuid-123",
    "output_name": "CSR_Final.docx"
  }')

DOC_ID=$(echo "$RESPONSE" | jq -r '.task_id')
echo "Document ID: $DOC_ID"

# Step 3: Poll status until Ready
echo "Waiting for generation to complete..."
while true; do
  STATUS=$(curl -s -X GET "$API/api/v1/documents/status/$DOC_ID" \
    -H "Authorization: Bearer $TOKEN" | jq -r '.status')

  echo "Status: $STATUS"

  if [ "$STATUS" = "Ready" ]; then
    echo "Document generation complete!"
    break
  elif [ "$STATUS" = "Failed" ]; then
    echo "Document generation failed"
    exit 1
  fi

  sleep 5
done

# Step 4: Retrieve document
echo "Retrieving document..."
curl -X GET "$API/api/v1/documents/$DOC_ID" \
  -H "Authorization: Bearer $TOKEN" | jq '.document'
```
