Documents Management
List All Documents
Retrieve a list of all processed documents accessible to your user account.
Endpoint

| Parameter | Type | Required | Description |
|---|---|---|---|
| page | Integer | No | Page number for pagination (default: 1) |
| limit | Integer | No | Number of documents per page (default: 20, max: 100) |
| file_type | String | No | Filter by file type (docx, html, rtf, csv) |
| document_name | String | No | Filter by document name (partial match) |
- Admin users: See all documents in the organization
- Regular users: Only see documents assigned to them
- Note: Only documents with processing_status: "completed" are returned
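As a minimal sketch of how the query parameters above might be assembled, the helper below checks page and limit against the documented defaults and maximum before building the request URL. The base URL and path are placeholders, because the endpoint itself is not shown in this section, and buildListDocumentsUrl with its option names is illustrative rather than part of the API.

```javascript
// Hypothetical endpoint: substitute the real list-documents URL.
const LIST_ENDPOINT = "https://api.example.com/documents";

// File types named in the parameter table above.
const LISTED_FILE_TYPES = ["docx", "html", "rtf", "csv"];

function buildListDocumentsUrl(options = {}, endpoint = LIST_ENDPOINT) {
  const { page = 1, limit = 20, fileType, documentName } = options;

  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be a positive integer");
  }
  if (!Number.isInteger(limit) || limit < 1 || limit > 100) {
    throw new RangeError("limit must be an integer between 1 and 100");
  }
  if (fileType !== undefined && !LISTED_FILE_TYPES.includes(fileType)) {
    throw new RangeError(`file_type must be one of: ${LISTED_FILE_TYPES.join(", ")}`);
  }

  const url = new URL(endpoint);
  url.searchParams.set("page", String(page));
  url.searchParams.set("limit", String(limit));
  if (fileType !== undefined) url.searchParams.set("file_type", fileType);
  if (documentName) url.searchParams.set("document_name", documentName);
  return url.toString();
}
```

Rejecting an out-of-range limit on the client avoids spending a rate-limited request on a call that would presumably fail or be clamped by the server.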
Get Specific Document
Retrieve detailed information about a specific document by filename.
Endpoint

| Parameter | Type | Required | Description |
|---|---|---|---|
| filename | String | Yes | Full filename including any group/folder structure |
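Because the filename can include folder structure, it generally needs escaping before it is placed in a URL. The sketch below assumes the filename travels as part of the URL path with "/" as the folder separator; that is an assumption, since the endpoint is not shown here, and encodeDocumentFilename is a hypothetical helper name. If the filename is sent as a query parameter instead, URLSearchParams would handle the escaping.

```javascript
// Escape each folder segment separately so the "/" separators survive
// while spaces and reserved characters inside a segment are encoded.
function encodeDocumentFilename(filename) {
  if (typeof filename !== "string" || filename.length === 0) {
    throw new TypeError("filename is required");
  }
  return filename
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
}
```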
Document Deletion
Only admin users can delete documents from the system.
Endpoint
Content Retrieval
The retrieval system uses hybrid search (vector embeddings + keyword matching with reranking) to find relevant passages from your ingested documents. This functionality is specifically designed to support document generation workflows.
Important Limitation
This is not a general-purpose search API. The retrieval functionality is optimized for finding content that will be used in AI document generation. For comprehensive document search needs, consider dedicated search solutions.
Retrieve Relevant Passages
Endpoint

| Parameter | Type | Required | Description |
|---|---|---|---|
| q | String | Yes | Search query or question |
| sourceUrls | String | No | Comma-separated list of S3 URLs to search within |
| topK | Integer | No | Number of results to return (default: 5, recommended: 10-25) |
| search_type | String | No | Search method: "hybrid" (default), "vector", "keyword" |
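The parameters above could be assembled as in the following sketch. It assumes they are sent as query-string parameters and uses a placeholder URL, because the endpoint is not shown in this section; buildRetrievalUrl and its option names are hypothetical.

```javascript
// Hypothetical endpoint: substitute the real retrieval URL.
const RETRIEVAL_ENDPOINT = "https://api.example.com/retrieval";

// Search methods named in the parameter table above.
const SEARCH_TYPES = ["hybrid", "vector", "keyword"];

function buildRetrievalUrl(
  { q, sourceUrls = [], topK = 5, searchType = "hybrid" } = {},
  endpoint = RETRIEVAL_ENDPOINT
) {
  if (typeof q !== "string" || q.trim() === "") {
    throw new TypeError("q is required and must be a non-empty string");
  }
  if (!Number.isInteger(topK) || topK < 1) {
    throw new RangeError("topK must be a positive integer");
  }
  if (!SEARCH_TYPES.includes(searchType)) {
    throw new RangeError(`search_type must be one of: ${SEARCH_TYPES.join(", ")}`);
  }

  const url = new URL(endpoint);
  url.searchParams.set("q", q.trim());
  // sourceUrls is documented as a single comma-separated string.
  if (sourceUrls.length > 0) url.searchParams.set("sourceUrls", sourceUrls.join(","));
  url.searchParams.set("topK", String(topK));
  url.searchParams.set("search_type", searchType);
  return url.toString();
}
```

Validating q locally mirrors the INVALID_QUERY condition (malformed or empty query) without spending a rate-limited request on it.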
Search Types
Hybrid Search (Default)
Combines vector embeddings and keyword matching with intelligent reranking for optimal results.
Vector Search Only
Uses semantic similarity through vector embeddings to find conceptually related content.
Keyword Search Only
Traditional text-based search using keyword matching.
Filtering by Source Documents
Search within specific documents by providing their S3 URLs:
Success Response (200 OK)
Response Fields
| Field | Description |
|---|---|
| url | S3 URL of the source document |
| snippet | Relevant text passage (length not configurable) |
| score | Relevance score (0.0 to 1.0, higher = more relevant) |
| document_title | Human-readable document title |
| document_type | Document classification |
| file_synopsis | Brief summary of document content |
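One way to consume these fields in a generation workflow is to drop low-scoring passages and group the rest by source document. The sketch below assumes the results have already been parsed into an array of objects carrying the fields listed above; the surrounding response envelope is not shown in this section, and both groupPassagesByDocument and the 0.5 score cutoff are illustrative choices.

```javascript
// Group passages by source document, keeping only those at or above minScore.
// Passages within each group are ordered from most to least relevant.
function groupPassagesByDocument(results, minScore = 0.5) {
  const groups = new Map();
  for (const result of results) {
    if (result.score < minScore) continue;
    if (!groups.has(result.url)) {
      groups.set(result.url, {
        url: result.url,
        title: result.document_title,
        passages: [],
      });
    }
    groups.get(result.url).passages.push({
      snippet: result.snippet,
      score: result.score,
    });
  }
  for (const group of groups.values()) {
    group.passages.sort((a, b) => b.score - a.score);
  }
  return [...groups.values()];
}
```

An empty input array yields an empty array, so the same code path also covers the case where no relevant content is found.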
Empty Results Response
When no relevant content is found:
Rate Limiting
Both the document and retrieval endpoints implement rate limiting due to underlying LLM processing requirements.
Rate Limits by User Level
Admin Users
- Document listing: 100 requests per minute
- Document retrieval: 100 requests per minute
- Content search: 50 requests per minute
Regular Users
- Document listing: 50 requests per minute
- Document retrieval: 50 requests per minute
- Content search: 20 requests per minute
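A client can stay under these limits by spacing its requests evenly. The sketch below converts each per-minute limit into a minimum delay between calls. It assumes the second, lower set of limits applies to regular users, and the table keys and minIntervalMs are hypothetical names. How the server actually counts its window is not described here, so even spacing is a conservative choice rather than a guarantee.

```javascript
// Requests per minute, copied from the limits listed above.
const REQUESTS_PER_MINUTE = {
  admin: { listing: 100, retrieval: 100, search: 50 },
  regular: { listing: 50, retrieval: 50, search: 20 }, // assumed to be the regular-user limits
};

// Minimum spacing between consecutive requests, in milliseconds.
function minIntervalMs(userLevel, operation) {
  const limit = REQUESTS_PER_MINUTE[userLevel]?.[operation];
  if (!limit) {
    throw new RangeError(`no known limit for ${userLevel}/${operation}`);
  }
  return Math.ceil(60000 / limit);
}
```

For example, an admin running content searches would wait at least 1200 ms between calls, and a regular user at least 3000 ms.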
Rate Limit Headers
Rate Limit Exceeded Response (429)
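When a 429 is returned, a client would typically wait and retry. The sketch below uses exponential backoff and, when available, the standard HTTP Retry-After header expressed in seconds. Whether this API sends Retry-After is an assumption, since the header names are not shown in this section; retryDelayMs and fetchWithRetry are hypothetical helpers.

```javascript
// Delay before the next attempt: honor a numeric Retry-After hint if present,
// otherwise back off exponentially (1s, 2s, 4s, ...) up to maxMs.
function retryDelayMs(attempt, retryAfter, baseMs = 1000, maxMs = 60000) {
  if (retryAfter != null && retryAfter !== "") {
    const seconds = Number(retryAfter);
    // Retry-After may also be an HTTP date; that form falls through to backoff.
    if (Number.isFinite(seconds) && seconds >= 0) {
      return Math.min(seconds * 1000, maxMs);
    }
  }
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retry on 429 only; any other response (or the final attempt) is returned as-is.
async function fetchWithRetry(url, options = {}, maxAttempts = 4, fetchImpl = fetch) {
  for (let attempt = 0; ; attempt++) {
    const response = await fetchImpl(url, options);
    if (response.status !== 429 || attempt >= maxAttempts - 1) return response;
    const delay = retryDelayMs(attempt, response.headers.get("Retry-After"));
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```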
Document Access & Download
Accessing Full Document Content
To access the complete content of a document, use the S3 URL provided in the document metadata:
JavaScript Example
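The following is a minimal sketch of such a download, not the API's official sample. It assumes the URL from the metadata is directly fetchable over HTTPS (for example a presigned link); a bare s3:// URI cannot be fetched this way and would need an S3 client instead. resolveDownloadUrl and downloadDocument are hypothetical helper names.

```javascript
// Accept only HTTPS URLs; other schemes (such as s3://) cannot be fetched directly.
function resolveDownloadUrl(documentUrl) {
  const parsed = new URL(documentUrl);
  if (parsed.protocol !== "https:") {
    throw new Error(`cannot fetch a ${parsed.protocol} URL directly; an HTTPS URL is assumed`);
  }
  return parsed.toString();
}

// Download the document as raw bytes, since formats like docx are binary.
async function downloadDocument(documentUrl, fetchImpl = fetch) {
  const response = await fetchImpl(resolveDownloadUrl(documentUrl));
  if (!response.ok) {
    throw new Error(`download failed with HTTP ${response.status}`);
  }
  return response.arrayBuffer();
}
```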
Error Handling
Common Error Codes
| Error Code | HTTP Status | Description |
|---|---|---|
| DOCUMENT_NOT_FOUND | 404 | Requested document does not exist |
| ACCESS_DENIED | 403 | User lacks permission to access document |
| INVALID_QUERY | 400 | Search query is malformed or empty |
| INVALID_SOURCE_URL | 400 | One or more source URLs are invalid |
| RATE_LIMIT_EXCEEDED | 429 | Too many requests within time window |
| SEARCH_TIMEOUT | 408 | Search query took too long to process |
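A client might use this table to decide whether a failed call is worth retrying. In the sketch below, treating RATE_LIMIT_EXCEEDED and SEARCH_TIMEOUT as retryable is a judgment call rather than documented behavior, and classifyApiError is a hypothetical helper. It takes the error code as a plain string, so it does not depend on the error response format.

```javascript
// Error codes and HTTP statuses from the table above.
const ERROR_STATUS = new Map([
  ["DOCUMENT_NOT_FOUND", 404],
  ["ACCESS_DENIED", 403],
  ["INVALID_QUERY", 400],
  ["INVALID_SOURCE_URL", 400],
  ["RATE_LIMIT_EXCEEDED", 429],
  ["SEARCH_TIMEOUT", 408],
]);

// Assumption: only transient conditions are worth retrying.
const RETRYABLE_CODES = new Set(["RATE_LIMIT_EXCEEDED", "SEARCH_TIMEOUT"]);

function classifyApiError(code) {
  const known = ERROR_STATUS.has(code);
  return {
    code,
    known,
    status: known ? ERROR_STATUS.get(code) : null,
    retryable: RETRYABLE_CODES.has(code),
  };
}
```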
Error Response Format
Best Practices
Efficient Document Management
- Use pagination for large document collections
- Filter by file type when looking for specific document formats
- Monitor processing status before attempting retrieval
- Cache document lists when possible to reduce API calls
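Caching document lists can be as simple as a small in-memory store with a time-to-live. The sketch below is one possible shape, with createTtlCache as a hypothetical helper. The clock is injectable so expiry can be checked without waiting, and the right TTL depends on how often your documents change.

```javascript
// Minimal in-memory cache whose entries expire ttlMs after being stored.
function createTtlCache(ttlMs, now = Date.now) {
  const entries = new Map();
  return {
    get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (now() - entry.storedAt >= ttlMs) {
        entries.delete(key); // expired: drop it so the caller refetches
        return undefined;
      }
      return entry.value;
    },
    set(key, value) {
      entries.set(key, { value, storedAt: now() });
    },
  };
}
```

Keying the cache by the full request URL, including page and filter parameters, keeps different pages and filters from overwriting each other.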
Effective Content Retrieval
- Be specific in queries - more specific queries yield better results
- Use appropriate topK values - recommended range is 10-25
- Choose the right search type:
  - Hybrid: Best for most use cases
  - Vector: Better for conceptual/semantic queries
  - Keyword: Better for exact term matching
- Limit source URLs when searching specific documents
- Implement proper error handling for empty results