Create Document From Url

POST

https://api.ragie.ai/documents/url

Ingest a document from a publicly accessible URL. On ingest, the document goes through a series of steps before it is ready for retrieval. Each step is reflected in the document's status, which can be one of [pending, partitioning, partitioned, refined, chunked, indexed, summary_indexed, keyword_indexed, ready, failed]. The document is available for retrieval once it is in the ready state. The summary index step can take a few seconds, so you can optionally use the document for retrieval as soon as it is in the indexed state; however, the summary is only available once the state has changed to summary_indexed or ready. PDF files over 2000 pages are not supported. Other static document files over 10000 pages are not supported.
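
The status rules above can be sketched as a pair of small checks. This is a minimal illustration, not part of the API: the helper names are hypothetical, and it assumes the statuses are listed in processing order, so that anything at or after indexed (other than failed) can be used for retrieval.

```python
# Status names are copied from the description above.
# Assumption: the list is in pipeline order, with "failed" as a terminal error state.
PIPELINE_STATUSES = [
    "pending", "partitioning", "partitioned", "refined", "chunked",
    "indexed", "summary_indexed", "keyword_indexed", "ready", "failed",
]


def can_retrieve(status):
    """Return True if the document can be used for retrieval (hypothetical helper)."""
    if status == "failed" or status not in PIPELINE_STATUSES:
        return False
    return PIPELINE_STATUSES.index(status) >= PIPELINE_STATUSES.index("indexed")


def has_summary(status):
    """Return True only for the two states the description names for summaries."""
    return status in {"summary_indexed", "ready"}
```

A polling loop would typically call can_retrieve on each fetched status and stop as soon as it sees ready or failed.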

Body Params

name

string

metadata

object

Metadata for the document. Keys must be strings. Values may be strings, numbers, booleans, or lists of strings. Numbers may be integers or floating point and will be converted to 64-bit floating point. A total of 1000 values is allowed, and each item in a list counts toward that total. The following keys are reserved for internal use: document_id, document_type, document_source, document_name, document_uploaded_at, start_time, end_time, chunk_content_type.
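
A client-side check for these metadata rules might look like the sketch below. The function names are hypothetical, and the server remains the authority on what is accepted; this only mirrors the constraints stated above. It assumes an empty list contributes zero values to the total.

```python
# Reserved keys and the value limit are copied from the description above.
RESERVED_KEYS = {
    "document_id", "document_type", "document_source", "document_name",
    "document_uploaded_at", "start_time", "end_time", "chunk_content_type",
}
MAX_VALUES = 1000


def count_values(metadata):
    """Count values, with each list item counting individually."""
    return sum(len(v) if isinstance(v, list) else 1 for v in metadata.values())


def validate_metadata(metadata):
    """Return a list of human-readable problems; an empty list means no issue found."""
    errors = []
    for key, value in metadata.items():
        if not isinstance(key, str):
            errors.append(f"key {key!r} is not a string")
            continue
        if key in RESERVED_KEYS:
            errors.append(f"key {key!r} is reserved")
        if isinstance(value, list):
            if not all(isinstance(item, str) for item in value):
                errors.append(f"list under {key!r} must contain only strings")
        elif not isinstance(value, (str, int, float, bool)):
            errors.append(f"value under {key!r} has an unsupported type")
    if count_values(metadata) > MAX_VALUES:
        errors.append(f"more than {MAX_VALUES} total values")
    return errors
```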

mode

string | MediaModeParam

Defaults to fast

Partition strategy for the document.
Different strategies exist for textual, audio and video file types and you can set the strategy you want for
each file type, or just for textual types.

For textual documents the options are 'hi_res' or 'fast'.
When set to 'hi_res', images and tables will be extracted from the document.
'fast' will only extract text.
'fast' may be up to 20x faster than 'hi_res'.
hi_res is only applicable for Word documents, PDFs, Images, and PowerPoints.
Images will always be processed in hi_res.
If hi_res is set for an unsupported document type, it will be processed and billed in fast mode.

For audio files, the options are true or false: true to process audio, false to skip it.

For video files, the options are 'audio_only', 'video_only', 'audio_video'.
'audio_only' will extract just the audio part of the video.
'video_only' will similarly just extract the video part, ignoring audio.
'audio_video' will extract both audio and video.

To process all media types at the highest quality, use 'all'.

When you specify audio or video strategies, the format must be a JSON object. In this case,
textual documents are denoted by the key "static". If you omit a key, that document type won't be processed.
See examples below.

Examples

Textual documents only
"fast"

Video documents only
{
"video": "audio_video"
}

Specify multiple document types
{
"static": "hi_res",
"audio": true,
"video": "video_only"
}

Specify only textual and audio document types
{
"static": "fast",
"audio": true
}

Highest quality processing for all media types
"all"

Agentic OCR
"agentic_ocr"
Agentic OCR is in early access. The agentic_ocr mode extracts content using vision models, which can be more accurate, especially for visually complex documents. If you are interested in accessing this feature, please contact us at [email protected].
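
The shapes shown in the examples above can be checked before sending a request. The sketch below is an illustration under stated assumptions: the function name is hypothetical, "agentic_ocr" and "all" are accepted only as plain strings because that is the only form the examples show, and an empty object is rejected because it would process nothing.

```python
# Option values are copied from the mode description above.
STATIC_MODES = {"hi_res", "fast"}
VIDEO_MODES = {"audio_only", "video_only", "audio_video"}
STRING_MODES = STATIC_MODES | {"all", "agentic_ocr"}


def is_valid_mode(mode):
    """Return True if mode matches one of the documented shapes (hypothetical helper)."""
    if isinstance(mode, str):
        return mode in STRING_MODES
    if not isinstance(mode, dict) or not mode:
        return False
    for key, value in mode.items():
        if key == "static":
            ok = isinstance(value, str) and value in STATIC_MODES
        elif key == "audio":
            ok = isinstance(value, bool)
        elif key == "video":
            ok = isinstance(value, str) and value in VIDEO_MODES
        else:
            ok = False  # unknown media type key
        if not ok:
            return False
    return True
```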

external_id

An optional identifier for the document. A common value might be an id in an external system or the URL where the source file may be found.

partition

string

An optional partition identifier. Documents can be scoped to a partition. Partitions must be lowercase alphanumeric and may only include the special characters _ and -. A partition is created any time a document is created.
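
The partition naming rule can be expressed as a regular expression. This is a minimal sketch with a hypothetical helper name; it encodes only the character rule stated above and assumes no length limit, since none is documented here.

```python
import re

# Lowercase letters, digits, underscore, and hyphen only, per the description above.
PARTITION_RE = re.compile(r"[a-z0-9_-]+")


def is_valid_partition(name):
    """Return True if name uses only the allowed partition characters."""
    return isinstance(name, str) and PARTITION_RE.fullmatch(name) is not None
```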

workflow

string

enum

An optional stage at which to stop processing the document. If set to "parse", processing stops once elements have been extracted. Setting it to "index" or leaving it blank runs the full pipeline.

Allowed: parse, index

url

uri

required

length between 1 and 2083

URL of the file to download. Must be publicly accessible and use the HTTP or HTTPS scheme.
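
The length and scheme constraints on url can be checked locally, as in the sketch below. The helper name is hypothetical, and the check cannot tell whether the URL is actually publicly accessible; only the download attempt on the server side can confirm that.

```python
from urllib.parse import urlsplit


def is_acceptable_url(url):
    """Check the documented length (1 to 2083) and HTTP/HTTPS scheme constraints."""
    if not isinstance(url, str) or not (1 <= len(url) <= 2083):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError on malformed input such as a broken IPv6 host.
        return False
    # urlsplit lowercases the scheme, so "HTTPS://..." is accepted as well.
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```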

Responses

201 Successful Response

400 Bad Request

401 Unauthorized

402 Payment Required

422 Validation Error

429 Too Many Requests

500 Internal Server Error
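
Putting the pieces together, a request to this endpoint could be assembled as sketched below using only the Python standard library. Several details are assumptions not stated on this page: that the body is sent as JSON, that authentication uses an Authorization header with a Bearer token, and the helper names themselves. The endpoint URL, the parameter names, and the response codes are taken from the text above.

```python
import json
import urllib.request

ENDPOINT = "https://api.ragie.ai/documents/url"

# Response codes and labels as listed above.
RESPONSES = {
    201: "Successful Response",
    400: "Bad Request",
    401: "Unauthorized",
    402: "Payment Required",
    422: "Validation Error",
    429: "Too Many Requests",
    500: "Internal Server Error",
}


def build_request(api_key, url, **optional):
    """Build (but do not send) a POST request; parameters set to None are omitted."""
    body = {"url": url}
    body.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",    # assumed body encoding
        },
        method="POST",
    )


def describe_status(status_code):
    """Map a status code to its documented label."""
    return RESPONSES.get(status_code, "Unexpected status")
```

Passing the result to urllib.request.urlopen would send it. Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so the error's code attribute is what describe_status would receive in those cases.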