Upload a document for ingestion and content extraction

curl --request POST \
  --url https://api.artosai.com/ingest/document \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form file='@example-file' \
  --form 'fileType=<string>' \
  --form 'extractionFormat=<string>' \
  --form 'csvDelimiter=<string>' \
  --form htmlIncludeStyles=true \
  --form preserveFormatting=true \
  --form extractImages=true \
  --form extractTables=true \
  --form 'connectorDataId=<string>'

{
  "jobId": "<string>",
  "status": "<string>",
  "created_at": "2023-11-07T05:31:56Z",
  "estimated_completion": "2023-11-07T05:31:56Z"
}

POST /ingest/document

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
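
As a minimal sketch of the header format described above, the helper below builds the Authorization header from a token. The function name build_auth_headers is hypothetical and not part of the API; the empty-token check is an added convenience, not documented behavior.

```python
def build_auth_headers(token: str) -> dict[str, str]:
    """Return the bearer Authorization header for the given token."""
    token = token.strip()
    if not token:
        # Fail early rather than send a header the server would reject.
        raise ValueError("token must be a non-empty string")
    return {"Authorization": f"Bearer {token}"}
```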

Body

multipart/form-data

file (required)

Document file to upload and process.

fileType (string)

Override automatic file type detection (e.g. docx, pdf).

extractionFormat (string)

Desired output format for extracted content (plain, html, markdown, csv, json, xml, base64).

csvDelimiter (string)

CSV delimiter to use when extractionFormat is csv.

htmlIncludeStyles (boolean)

Include CSS styling in HTML extraction.

preserveFormatting (boolean)

Maintain original document formatting.

extractImages (boolean)

Extract and encode images separately.

extractTables (boolean)

Extract tables as structured data.

connectorDataId (string)

Identifier used to batch uploads for connectors.
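
The body parameters above could be assembled into a multipart/form-data request roughly as follows, using only the Python standard library. This is a sketch under stated assumptions: the helper names (form_fields, encode_multipart, upload_document) are hypothetical, booleans are sent as lowercase true/false to match the curl sample, and the client-side check on extractionFormat simply mirrors the values listed in the description (how the server treats other values is not documented here).

```python
import json
import urllib.request
import uuid
from pathlib import Path

# Formats listed in the extractionFormat description.
EXTRACTION_FORMATS = {"plain", "html", "markdown", "csv", "json", "xml", "base64"}


def form_fields(**options) -> dict[str, str]:
    """Convert option values to form-field strings, skipping unset (None) options."""
    fmt = options.get("extractionFormat")
    if fmt is not None and fmt not in EXTRACTION_FORMATS:
        raise ValueError(f"unsupported extractionFormat: {fmt!r}")
    fields = {}
    for name, value in options.items():
        if value is None:
            continue
        # Assumption: booleans travel as lowercase true/false, as in the curl sample.
        fields[name] = str(value).lower() if isinstance(value, bool) else str(value)
    return fields


def encode_multipart(fields, filename, file_bytes, boundary=None):
    """Build a multipart/form-data body; returns (content_type, body_bytes)."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # The document itself goes in the required "file" part.
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)


def upload_document(path, token, **options):
    """POST the file to /ingest/document and return the decoded JSON response."""
    file_path = Path(path)
    content_type, body = encode_multipart(
        form_fields(**options), file_path.name, file_path.read_bytes()
    )
    request = urllib.request.Request(
        "https://api.artosai.com/ingest/document",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call might look like upload_document("report.pdf", token, extractionFormat="markdown", extractTables=True), where report.pdf is a placeholder file name.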

Response

200 - application/json

Ingestion job created successfully

jobId (string)

status (string)

created_at (string<date-time>)

estimated_completion (string<date-time>)
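
To illustrate the response shape, the sketch below parses the JSON body into a typed object and converts the two date-time fields. The IngestionJob class and parse_job function are hypothetical names for illustration; only the four field names come from the response schema above.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class IngestionJob:
    job_id: str
    status: str
    created_at: datetime
    estimated_completion: datetime


def _parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_job(payload: str) -> IngestionJob:
    """Decode a 200 response body into an IngestionJob."""
    data = json.loads(payload)
    return IngestionJob(
        job_id=data["jobId"],
        status=data["status"],
        created_at=_parse_timestamp(data["created_at"]),
        estimated_completion=_parse_timestamp(data["estimated_completion"]),
    )
```

The returned job_id is the value to keep for any later status check on the job.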
