Search API

The Search API provides hybrid search capabilities combining vector similarity (pgvector) and full-text search (tsvector) across MRT example chunks.

Hybrid Search Test

Search across MRT example chunks using hybrid search combining vector similarity and full-text search.
POST /api/v1/search/hybrid-test

Request Body

{
  "document_set_key": "project-2024-001",
  "query": "safety findings and adverse events",
  "limit": 20,
  "vector_k": 50,
  "document_filters": ["Protocol*", "Safety*"],
  "only_text_search": false
}

Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| document_set_key | string | Yes | - | Connector data ID for scoping search |
| query | string | Yes | - | Search query text (non-empty) |
| limit | integer | No | 20 | Max results (1-100) |
| vector_k | integer | No | 50 | Vector candidates (10-200) |
| document_filters | array | No | - | Document names to filter (wildcard support) |
| only_text_search | boolean | No | false | Skip vector search if true |
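
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name `validate_search_payload` is hypothetical, and treating a whitespace-only query as empty is an assumption about how the server interprets "non-empty".

```python
from typing import Any


def validate_search_payload(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a hybrid-test request body."""
    errors: list[str] = []

    key = payload.get("document_set_key")
    if not isinstance(key, str) or not key:
        errors.append("document_set_key is required")

    # Assumption: a whitespace-only query counts as empty.
    query = payload.get("query")
    if not isinstance(query, str) or not query.strip():
        errors.append("query must be a non-empty string")

    # Ranges come from the parameter table; omitted keys fall back to defaults.
    for name, low, high in (("limit", 1, 100), ("vector_k", 10, 200)):
        value = payload.get(name)
        if value is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{name} must be an integer")
        elif not low <= value <= high:
            errors.append(f"{name} must be between {low} and {high}")

    filters = payload.get("document_filters")
    if filters is not None:
        if not isinstance(filters, list) or not all(isinstance(f, str) for f in filters):
            errors.append("document_filters must be a list of strings")

    return errors
```

An empty list means the body satisfies the documented constraints; the server may still apply checks that are not listed here.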

Request Example

curl -X POST "https://api.artosai.com/api/v1/search/hybrid-test" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_set_key": "project-2024-001",
    "query": "safety findings",
    "limit": 20,
    "document_filters": ["Protocol*"]
  }'

Python Example

import requests

url = "https://api.artosai.com/api/v1/search/hybrid-test"
headers = {
    "Authorization": "Bearer YOUR_TOKEN",
    "Content-Type": "application/json"
}

payload = {
    "document_set_key": "project-2024-001",
    "query": "safety findings",
    "limit": 20,
    "vector_k": 50,
    "document_filters": ["Protocol*"]
}

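# Send the request; the 30-second timeout is an arbitrary client-side choice
response = requests.post(url, headers=headers, json=payload, timeout=30)
response.raise_for_status()  # raise on 400/401/500 instead of parsing an error body
results = response.json()

print(f"Found {results['total_results']} results")
for chunk in results["chunks"]:
    print(f"- {chunk['content']} (score: {chunk['score']})")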

Response

{
  "query": "safety findings",
  "document_set_key": "project-2024-001",
  "total_results": 2,
  "search_type": "hybrid",
  "chunks": [
    {
      "id": "chunk-uuid-1",
      "content": "Safety findings from the study include...",
      "score": 0.95,
      "rank": 1,
      "document_name": "Protocol_v2.pdf",
      "document_type": "CSR",
      "section_name": "Safety Analysis",
      "page_number": 42
    },
    {
      "id": "chunk-uuid-2",
      "content": "Adverse events were documented as follows...",
      "score": 0.89,
      "rank": 2,
      "document_name": "Safety_Report.pdf",
      "document_type": "CSR",
      "section_name": "Adverse Events",
      "page_number": 15
    }
  ]
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| query | string | Search query executed |
| document_set_key | string | Document set key used |
| total_results | integer | Number of results |
| search_type | string | "hybrid" or "text" |
| chunks | array | Result chunks |
| chunks[].id | string | Chunk identifier |
| chunks[].content | string | Chunk text content |
| chunks[].score | number | Relevance score (0-1) |
| chunks[].rank | integer | Result ranking |
| chunks[].document_name | string | Source document name |
| chunks[].document_type | string | Document type |
| chunks[].section_name | string | Section within document |
| chunks[].page_number | integer | Page number (if available) |
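
For typed access to these fields, the response body can be mapped onto small data classes. This is a sketch only: the class and function names are hypothetical, and it assumes that `page_number` may be missing or null when a page is not available, which the table suggests but does not spell out.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Chunk:
    id: str
    content: str
    score: float
    rank: int
    document_name: str
    document_type: str
    section_name: str
    page_number: Optional[int]  # documented as "if available"


@dataclass(frozen=True)
class SearchResponse:
    query: str
    document_set_key: str
    total_results: int
    search_type: str
    chunks: list[Chunk]


def parse_search_response(body: dict[str, Any]) -> SearchResponse:
    """Convert a decoded JSON response body into typed objects."""
    chunks = [
        Chunk(
            id=c["id"],
            content=c["content"],
            score=float(c["score"]),
            rank=int(c["rank"]),
            document_name=c["document_name"],
            document_type=c["document_type"],
            section_name=c["section_name"],
            # Assumption: the key may be absent or null.
            page_number=c.get("page_number"),
        )
        for c in body.get("chunks", [])
    ]
    return SearchResponse(
        query=body["query"],
        document_set_key=body["document_set_key"],
        total_results=int(body["total_results"]),
        search_type=body["search_type"],
        chunks=chunks,
    )
```

With the `requests` example, this would be called as `parse_search_response(response.json())`.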

Status Codes

  • 200 OK: Search completed successfully
  • 400 Bad Request: Invalid request parameters (empty query, invalid limits)
  • 401 Unauthorized: Missing or invalid Bearer token
  • 500 Internal Server Error: Search operation failed
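
A client can turn these codes into a small set of outcomes before deciding what to do next. The sketch below is illustrative: the `SearchOutcome` enum and `classify_search_status` function are hypothetical names, and any code not listed above is simply reported as unexpected.

```python
from enum import Enum


class SearchOutcome(Enum):
    OK = "ok"
    FIX_REQUEST = "fix_request"          # 400: correct the request body
    FIX_CREDENTIALS = "fix_credentials"  # 401: supply a valid Bearer token
    SERVER_ERROR = "server_error"        # 500: the search operation failed
    UNEXPECTED = "unexpected"            # any code not documented above


# Status codes documented for POST /api/v1/search/hybrid-test.
_DOCUMENTED = {
    200: SearchOutcome.OK,
    400: SearchOutcome.FIX_REQUEST,
    401: SearchOutcome.FIX_CREDENTIALS,
    500: SearchOutcome.SERVER_ERROR,
}


def classify_search_status(status_code: int) -> SearchOutcome:
    """Map an HTTP status code to one of the documented outcomes."""
    return _DOCUMENTED.get(status_code, SearchOutcome.UNEXPECTED)
```

For example, `classify_search_status(response.status_code)` can be checked before calling `response.json()`.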

Search Types

The API supports two search modes.

Hybrid Search (default):
  • Combines vector similarity (semantic search) and full-text search
  • Best for finding contextually relevant content
  • Slower but more accurate
Text-Only Search:
  • Uses only full-text search (tsvector)
  • Faster, but with less semantic understanding
  • Set only_text_search: true to use
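
The two modes differ only in the request body. The helper below is a minimal sketch with a hypothetical name, `build_search_body`. It leaves out `vector_k` in text-only mode on the assumption that the parameter has no effect once vector search is skipped; the source does not confirm how the server treats it.

```python
from typing import Any


def build_search_body(
    document_set_key: str,
    query: str,
    *,
    text_only: bool = False,
    limit: int = 20,
    vector_k: int = 50,
) -> dict[str, Any]:
    """Build a hybrid-test request body for hybrid or text-only search."""
    body: dict[str, Any] = {
        "document_set_key": document_set_key,
        "query": query,
        "limit": limit,
        "only_text_search": text_only,
    }
    if not text_only:
        # Assumption: vector_k only matters when vector search runs.
        body["vector_k"] = vector_k
    return body
```

Passing `text_only=True` yields a body with `"only_text_search": true` once it is serialized to JSON.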

Document Filtering

Filter results by document name using wildcard patterns:
# All documents starting with "Protocol"
"document_filters": ["Protocol*"]

# Multiple filters (OR logic)
"document_filters": ["Protocol*", "Safety*", "Report*"]

# Exact match
"document_filters": ["protocol.pdf"]

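To preview which document names a set of filters would select, the matching can be approximated locally. This sketch uses Python's standard `fnmatch` module and rests on two assumptions that the source does not state: that `*` behaves like a shell glob, and that matching is case-sensitive. Note that `fnmatch` also gives special meaning to `?` and `[...]`, which the API may not support.

```python
from fnmatch import fnmatchcase


def matches_any_filter(document_name: str, filters: list[str]) -> bool:
    """Return True if the name matches at least one filter (OR logic)."""
    # fnmatchcase compares case-sensitively on every platform.
    return any(fnmatchcase(document_name, pattern) for pattern in filters)


def preview_filter(document_names: list[str], filters: list[str]) -> list[str]:
    """Return the names that the filters would keep, in their original order."""
    return [name for name in document_names if matches_any_filter(name, filters)]
```

Calling `preview_filter(names, ["Protocol*", "Safety*"])` keeps every name that starts with either prefix, mirroring the OR logic described above.
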
Relevance Scoring

Scores range from 0 to 1:
  • 0.9+: Highly relevant
  • 0.7-0.9: Relevant
  • 0.5-0.7: Somewhat relevant
  • <0.5: Low relevance
Results are sorted by score (highest first).
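
The bands above can be applied to `chunks[].score` with a small helper. This is a sketch with a hypothetical function name; because the listed ranges share their endpoints, it assumes each lower bound is inclusive, so 0.9 counts as highly relevant and 0.7 as relevant.

```python
def relevance_label(score: float) -> str:
    """Map a relevance score in [0, 1] to the band described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    # Assumption: lower bounds are inclusive.
    if score >= 0.9:
        return "highly relevant"
    if score >= 0.7:
        return "relevant"
    if score >= 0.5:
        return "somewhat relevant"
    return "low relevance"
```

With the sample response, the chunk scored 0.95 is labeled highly relevant and the chunk scored 0.89 is labeled relevant.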

Search Status

Returns the search service status and configuration. No authentication is required.
GET /api/v1/search/status

Request Example

curl -X GET "https://api.artosai.com/api/v1/search/status"

Response

{
  "status": "healthy",
  "search_index": "mrt_example_chunks",
  "search_types": ["hybrid", "text"],
  "features": {
    "vector_search": true,
    "text_search": true,
    "document_filtering": true,
    "wildcard_filters": true
  },
  "defaults": {
    "limit": 20,
    "vector_k": 50
  }
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| status | string | Service status |
| search_index | string | Index name |
| search_types | array | Available search types |
| features | object | Enabled features |
| defaults | object | Default parameters |

Status Codes

  • 200 OK: Service is healthy
  • 503 Service Unavailable: Search service is down
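
A client might consult the status endpoint before issuing searches. The function below is a sketch with a hypothetical name, `search_is_usable`. It assumes that "healthy" is the only status value indicating a working service, since that is the only value shown in the sample response.

```python
from typing import Any


def search_is_usable(
    status_code: int, body: dict[str, Any], need_vector: bool = True
) -> bool:
    """Decide from a /search/status response whether searches should be sent."""
    if status_code != 200:
        return False  # 503 means the search service is down
    # Assumption: "healthy" is the only value that signals a working service.
    if body.get("status") != "healthy":
        return False
    features = body.get("features", {})
    if not features.get("text_search", False):
        return False
    # Hybrid search also needs vector search; text-only search does not.
    if need_vector and not features.get("vector_search", False):
        return False
    return True
```

Pass `need_vector=False` when only text-only search is planned.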

Example Workflows

Hybrid Search Within Safety Documents

curl -X POST "https://api.artosai.com/api/v1/search/hybrid-test" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_set_key": "project-2024",
    "query": "adverse events and safety concerns",
    "limit": 10,
    "document_filters": ["Safety*"]
  }'

Text-Only Search

curl -X POST "https://api.artosai.com/api/v1/search/hybrid-test" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_set_key": "project-2024",
    "query": "methodology",
    "limit": 50,
    "only_text_search": true
  }'

Search Across Specific Documents

curl -X POST "https://api.artosai.com/api/v1/search/hybrid-test" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_set_key": "project-2024",
    "query": "statistical analysis",
    "document_filters": ["Protocol_v2.pdf", "Protocol_v1.pdf"]
  }'