POST /ingest/document
Upload a document for ingestion and content extraction
curl --request POST \
  --url https://api.artosai.com/ingest/document \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form file='@example-file' \
  --form 'fileType=<string>' \
  --form 'extractionFormat=<string>' \
  --form 'csvDelimiter=<string>' \
  --form htmlIncludeStyles=true \
  --form preserveFormatting=true \
  --form extractImages=true \
  --form extractTables=true \
  --form 'connectorDataId=<string>'
{
  "jobId": "<string>",
  "status": "<string>",
  "created_at": "2023-11-07T05:31:56Z",
  "estimated_completion": "2023-11-07T05:31:56Z"
}
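
The curl call above could be reproduced with Python's standard library roughly as follows. This is a sketch, not an official client: `encode_multipart` and `build_request` are hypothetical helper names, the `application/octet-stream` content type on the file part is an assumption, and the request is only built here, not sent.

```python
import uuid
import urllib.request
from typing import Dict, Optional, Tuple

API_URL = "https://api.artosai.com/ingest/document"  # URL from the curl example


def encode_multipart(fields: Dict[str, str], filename: str, content: bytes,
                     boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode text fields plus one part named "file" as multipart/form-data.

    Returns the body bytes and the matching Content-Type header value.
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        text_part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(text_part.encode("utf-8"))
    # Assumption: a generic binary content type is acceptable for the file part.
    # The filename is not escaped, so it must not contain double quotes.
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_head.encode("utf-8"))
    parts.append(content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(token: str, filename: str, content: bytes,
                  fields: Dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) the POST request for one document."""
    body, content_type = encode_multipart(fields, filename, content)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it. The boundary is generated per request and echoed in the `Content-Type` header, which is the part curl handles automatically when `--form` is used.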

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
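
As a small illustration, the header could be built like this in Python. The helper name `auth_headers` is hypothetical, and rejecting an empty token is a client-side choice rather than documented API behavior.

```python
from typing import Dict


def auth_headers(token: str) -> Dict[str, str]:
    """Return the Authorization header for a bearer token."""
    token = token.strip()
    if not token:
        # Fail early instead of sending a request that cannot authenticate.
        raise ValueError("token must not be empty")
    return {"Authorization": f"Bearer {token}"}
```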

Body

multipart/form-data
file (file, required)

Document file to upload and process.

fileType (string)

Override automatic file type detection (e.g. docx, pdf).

extractionFormat (string)

Desired output format for extracted content (plain, html, markdown, csv, json, xml, base64).

csvDelimiter (string)

CSV delimiter to use when extractionFormat is csv.

htmlIncludeStyles (boolean)

Include CSS styling in HTML extraction.

preserveFormatting (boolean)

Maintain original document formatting.

extractImages (boolean)

Extract and encode images separately.

extractTables (boolean)

Extract tables as structured data.

connectorDataId (string)

Identifier used to batch uploads for connectors.
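
The optional fields above can be assembled and checked on the client before upload. The following is a sketch under stated assumptions: `build_fields` is a hypothetical helper, booleans are serialized as lowercase `true`/`false` as in the curl example, and both checks are client-side choices. How the API itself treats an unlisted format or a stray `csvDelimiter` is not stated in this reference.

```python
from typing import Any, Dict

# Formats listed for extractionFormat in this reference.
EXTRACTION_FORMATS = {"plain", "html", "markdown", "csv", "json", "xml", "base64"}


def build_fields(**options: Any) -> Dict[str, str]:
    """Turn keyword options into string form fields, skipping unset ones."""
    fmt = options.get("extractionFormat")
    if fmt is not None and fmt not in EXTRACTION_FORMATS:
        raise ValueError(f"unsupported extractionFormat: {fmt!r}")
    if options.get("csvDelimiter") is not None and fmt != "csv":
        # The delimiter is documented only for csv extraction.
        raise ValueError("csvDelimiter only applies when extractionFormat is csv")

    fields: Dict[str, str] = {}
    for name, value in options.items():
        if value is None:
            continue  # leave unset options out of the form entirely
        if isinstance(value, bool):
            fields[name] = "true" if value else "false"
        else:
            fields[name] = str(value)
    return fields
```

For example, `build_fields(extractionFormat="csv", csvDelimiter=";", extractTables=True)` yields three string fields ready to be sent as form parts alongside `file`.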

Response

200 - application/json

Ingestion job created successfully

jobId (string)
status (string)
created_at (string<date-time>)
estimated_completion (string<date-time>)
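
A response of this shape could be parsed into typed values along these lines. `IngestJob` and `parse_ingest_response` are hypothetical names, and the sketch assumes all four fields are always present, which this reference does not confirm.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class IngestJob:
    job_id: str
    status: str
    created_at: datetime
    estimated_completion: datetime


def _parse_time(value: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_ingest_response(raw: str) -> IngestJob:
    """Parse the JSON body of a 200 response into an IngestJob."""
    data = json.loads(raw)
    return IngestJob(
        job_id=data["jobId"],
        status=data["status"],
        created_at=_parse_time(data["created_at"]),
        estimated_completion=_parse_time(data["estimated_completion"]),
    )
```

Note that the field names mix camelCase (`jobId`) and snake_case (`created_at`), so the keys are read exactly as the response schema spells them.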