API Walkthrough

In this guide we walk through the specific API features for audio and video.

Create Document

The mode parameter accepts either a string or an object; the string form is kept for backwards compatibility. In the mode object, static can be hi_res or fast, audio is a boolean, and video can be audio_only, video_only, or audio_video. The audio_video mode processes both the audio and video tracks of the uploaded video file, while the other two options process only their respective tracks.
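The shapes described above can be written down as TypeScript types. This is a minimal sketch for illustration only: the type names and the normalizeMode helper are hypothetical and are not taken from the SDK. It assumes the legacy string form corresponds to the static field of the object form, as the static examples in this guide suggest.

```typescript
// Hypothetical types mirroring the mode shapes described in this guide.
type StaticMode = "hi_res" | "fast";
type VideoMode = "audio_only" | "video_only" | "audio_video";

interface ModeObject {
  static?: StaticMode;
  audio?: boolean;
  video?: VideoMode;
}

// The legacy string form is still accepted for backwards compatibility.
type Mode = StaticMode | ModeObject;

// Illustrative helper (not part of the SDK): convert either form
// into the object form so later code handles a single shape.
function normalizeMode(mode: Mode): ModeObject {
  return typeof mode === "string" ? { static: mode } : mode;
}

// Usage: both calls yield an object with a static field.
const legacy = normalizeMode("hi_res");
const modern = normalizeMode({ static: "hi_res", audio: true });
```

Typing the parameter this way lets the compiler reject values such as video: "both" before a request is sent.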

Static - Backwards Compatibility

Uploading a static file is backwards compatible with the mode constants.

const file = new File([blob], "presentation.pdf");

const result = await ragie.documents.create({
  file: file,
  mode: "hi_res", // or "fast"
  metadata: {
    category: "business",
  },
});

Static - Object

const file = new File([blob], "presentation.pdf");

const result = await ragie.documents.create({
  file: file,
  mode: {
    static: "hi_res", // or "fast"
  },
  metadata: {
    category: "business",
  },
});

Audio

const file = new File([blob], "presentation.mp3");

const result = await ragie.documents.create({
  file: file,
  mode: {
    audio: true,
  },
  metadata: {
    category: "business",
  },
});

Video

const file = new File([blob], "presentation.mp4");

const result = await ragie.documents.create({
  file: file,
  mode: {
    video: "audio_video", // or "audio_only", "video_only"
  },
  metadata: {
    category: "business",
  },
});

Setting Multiple Modes

You can set multiple modes at once; the applicable mode will be used for the uploaded file.

const file = new File([blob], "presentation.mp4");

const result = await ragie.documents.create({
  file: file,
  mode: {
    static: "hi_res",
    audio: true,
    video: "audio_video",
  },
  metadata: {
    category: "business",
  },
});

Response

The response from the Create Document API is the same for any file type.

{
  "status": "partitioning",
  "id": "id123",
  "created_at": "2025-05-15T19:58:03.483172Z",
  "updated_at": "2025-05-15T19:58:03.586856Z",
  "name": "presentation.mp4",
  "metadata": {},
  "partition": "default",
  "chunk_count": null,
  "external_id": "",
  "page_count": null
}

Get Document Chunks

The Get Document Chunks endpoint returns additional information for audio and video files. The request schema is the same as for static files. In the response, each chunk's metadata field includes start_time and end_time. For audio and video files, the links field contains links to the chunk, its text, a stream URL, and a download URL.

Example response:


{
  "pagination": {
    "next_cursor": null,
    "total_count": 10
  },
  "chunks": [
    {
      "id": "id123",
      "index": 0,
      "text": "The text of the chunk.",
      "metadata": {
        "end_time": 57.68,
        "start_time": 2.44
      },
      "links": {
        "self": {
          "href": "https://api.ragie.ai/documents/docId/chunks/chunkId",
          "type": "application/json"
        },
        "self_text": {
          "href": "https://api.ragie.ai/documents/docId/chunks/chunkId/content?media_type=text/plain-text",
          "type": "text/plain-text"
        },
        "document": {
          "href": "https://api.ragie.ai/documents/d5c36cb0-0ec4-46bd-aacc-626c252598e4",
          "type": "application/json"
        },
        "document_text": {
          "href": "https://api.ragie.ai/documents/docId/content?media_type=text/plain-text",
          "type": "text/plain-text"
        },
        "self_audio_stream": {
          "href": "https://api.ragie.ai/documents/docId/chunks/chunkId/content?media_type=audio/mpeg",
          "type": "audio/mpeg"
        },
        "self_audio_download": {
          "href": "https://api.ragie.ai/documents/docId/chunks/chunkId/content?media_type=audio/mpeg&download=true",
          "type": "audio/mpeg"
        },
        "document_audio_stream": {
          "href": "https://api.ragie.ai/documents/docId/content?media_type=audio/mpeg",
          "type": "audio/mpeg"
        },
        "document_audio_download": {
          "href": "https://api.ragie.ai/documents/docId/content?media_type=audio/mpeg&download=true",
          "type": "audio/mpeg"
        }
      }
    },
    {...

For a video document, the response would be the same, except the links object would contain self, self_text, self_video_stream, self_video_download, document_video_stream, and document_video_download.
The end_time and start_time values represent the times when the chunk ends and starts, respectively, in the entire audio or video file.
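
As an illustration of working with these values, the sketch below computes a chunk's duration and formats a timestamp for display. It assumes start_time and end_time are expressed in seconds, which this guide does not state explicitly; the ChunkTiming interface and both helper names are hypothetical.

```typescript
// Hypothetical shape of the timing fields in a chunk's metadata.
interface ChunkTiming {
  start_time: number;
  end_time: number;
}

// Length of the chunk, in the same unit as the timestamps (assumed seconds).
function chunkDuration(timing: ChunkTiming): number {
  return timing.end_time - timing.start_time;
}

// Format a time offset (assumed seconds) as m:ss, dropping fractions.
function formatTimestamp(seconds: number): string {
  const total = Math.floor(seconds);
  const minutes = Math.floor(total / 60);
  const rest = total % 60;
  return minutes + ":" + (rest < 10 ? "0" : "") + rest;
}

// Usage with the metadata values from the example response.
const timing: ChunkTiming = { start_time: 2.44, end_time: 57.68 };
const label = formatTimestamp(timing.start_time) + " - " + formatTimestamp(timing.end_time);
console.log(label, chunkDuration(timing).toFixed(2));
```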

Get Document Chunk

The Get Document Chunk endpoint returns more granular data for each chunk of an audio or video file. In addition to the metadata and links fields, the response includes a modality_data field that contains word-level timestamps.

Example response:


{
  "id": "id123",
  "index": 0,
  "text": "The text of the chunk.",
  "metadata": {
    "end_time": 57.68,
    "start_time": 2.44
  },
  "links": {
    "self": {
      "href": "https://api.ragie.ai/documents/docId/chunks/chunkId",
      "type": "application/json"
    },
    "self_text": {
      "href": "https://api.ragie.ai/documents/docId/chunks/chunkId/content?media_type=text/plain-text",
      "type": "text/plain-text"
    },
    "document": {
      "href": "https://api.ragie.ai/documents/docId",
      "type": "application/json"
    },
    "document_text": {
      "href": "https://api.ragie.ai/documents/docId/content?media_type=text/plain-text",
      "type": "text/plain-text"
    },
    "self_audio_stream": {
      "href": "https://api.ragie.ai/documents/docId/chunks/chunkId/content?media_type=audio/mpeg",
      "type": "audio/mpeg"
    },
    "self_audio_download": {
      "href": "https://api.ragie.ai/documents/docId/chunks/chunkId/content?media_type=audio/mpeg&download=true",
      "type": "audio/mpeg"
    },
    "document_audio_stream": {
      "href": "https://api.ragie.ai/documents/docId/content?media_type=audio/mpeg",
      "type": "audio/mpeg"
    },
    "document_audio_download": {
      "href": "https://api.ragie.ai/documents/docId/content?media_type=audio/mpeg&download=true",
      "type": "audio/mpeg"
    }
  },
  "modality_data": {
    "type": "audio",
    "word_timestamps": [
      {
        "start_time": 2.44,
        "end_time": 3.06,
        "word": " President",
        "probability": 0.75537109375
      },
      {
        "start_time": 3.06,
        "end_time": 3.38,
        "word": " Pitzer,",
        "probability": 0.67431640625
      },
...

For static files, the modality_data field will be returned as null.
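
One possible use of the word-level timestamps is to find the word being spoken at a given playback position, for example to highlight a transcript. The sketch below is illustrative: the interface names and the wordAt helper are hypothetical, and the field names follow the example response above. It returns undefined when modality_data is null, as it is for static files.

```typescript
// Hypothetical types following the field names in the example response.
interface WordTimestamp {
  start_time: number;
  end_time: number;
  word: string;
  probability: number;
}

interface ModalityData {
  type: string;
  word_timestamps: WordTimestamp[];
}

// Return the word whose [start_time, end_time) interval contains `time`,
// or undefined if there is none or if modality_data is null.
function wordAt(data: ModalityData | null, time: number): WordTimestamp | undefined {
  if (data === null) return undefined;
  for (let i = 0; i < data.word_timestamps.length; i++) {
    const w = data.word_timestamps[i];
    if (time >= w.start_time && time < w.end_time) return w;
  }
  return undefined;
}

// Usage with the two words shown in the example response.
const sample: ModalityData = {
  type: "audio",
  word_timestamps: [
    { start_time: 2.44, end_time: 3.06, word: " President", probability: 0.75537109375 },
    { start_time: 3.06, end_time: 3.38, word: " Pitzer,", probability: 0.67431640625 },
  ],
};
const current = wordAt(sample, 3.1);
console.log(current ? current.word : "(no word at this time)");
```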

Get Document Chunk Content

The Get Document Chunk Content endpoint returns the content of a document chunk in the requested format. This can be used to stream media of the content for audio and video documents.

Request

The media_type parameter specifies the desired MIME type of the returned content. If the requested media_type is not supported for the chunk's document type, the request returns an error.

Example request:

const response = await client.documents.getChunkContent({
  documentId: "docId",
  chunkId: "chunkId",
  mediaType: "audio/mpeg",
});

Response

The response is returned in the MIME type specified in the request. If the download parameter is false, chunks from audio and video files are returned as a stream of raw data. If download is true, the content is returned as a named file for download. If media_type is set to application/json, this endpoint behaves the same as Get Document Chunk.
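
The media_type and download parameters map onto the query string seen in the links of the example responses. The sketch below rebuilds such a URL by hand, assuming the path pattern and base URL shown in those links. The chunkContentUrl helper is hypothetical; in practice, the href values returned in a chunk's links field are the safer source, and the request still needs whatever authentication your client normally sends.

```typescript
// Illustrative helper (not part of the SDK): build a chunk content URL
// following the pattern visible in the example links.
// Assumes mediaType is a plain MIME type such as "audio/mpeg".
function chunkContentUrl(
  documentId: string,
  chunkId: string,
  mediaType: string,
  download: boolean = false
): string {
  const base = "https://api.ragie.ai";
  let url =
    base +
    "/documents/" + encodeURIComponent(documentId) +
    "/chunks/" + encodeURIComponent(chunkId) +
    "/content?media_type=" + mediaType;
  // download=true asks for a named file instead of a raw stream.
  if (download) url += "&download=true";
  return url;
}

// Usage: a streaming URL and a download URL for the same chunk.
const streamHref = chunkContentUrl("docId", "chunkId", "audio/mpeg");
const downloadHref = chunkContentUrl("docId", "chunkId", "audio/mpeg", true);
```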

Retrievals

The Retrieve endpoint works across all modalities, including audio and video.
A retrieval can return three types of chunks: text, audio, and video. The retrieval request schema remains the same for all modalities. For audio and video, the response contains extra information, as shown in the example below.

Example response:

{
  "scored_chunks": [
    {
      "text": "This is the text of a chunk",
      "score": 0.19090909090909092,
      "id": "chunkId",
      "index": 2,
      "metadata": {
        "end_time": 167.5,
        "start_time": 115.37
      },
      "document_id": "docId",
      "document_name": "presentation.mp3",
      "document_metadata": {},
      "links": {
        "self": {
          "href": "https://api.ragie.ai/documents/docId/chunks/chunkId",
          "type": "application/json"
        },
        "self_text": {
          "href": "https://api.ragie.ai/documents/docId/chunks/chunkId/content?media_type=text/plain-text",
          "type": "text/plain-text"
        },
        "self_audio_stream": {
          "href": "https://api.ragie.ai/documents/docId/chunks/chunkId/content?media_type=audio/mpeg",
          "type": "audio/mpeg"
        },
        "self_audio_download": {
          "href": "https://api.ragie.ai/documents/docId/chunks/chunkId/content?media_type=audio/mpeg&download=true",
          "type": "audio/mpeg"
        }
      }
    },
...

Each entry in the scored_chunks array includes a metadata field that contains the end_time and start_time for that chunk. Note that the document's own metadata is in the document_metadata field.
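
Because a single retrieval can mix text, audio, and video chunks, it can help to classify each chunk before rendering it. The sketch below infers the modality from which stream link is present. This is an assumption drawn from the example responses in this guide, not a documented contract, and the type and function names are hypothetical.

```typescript
// Hypothetical types covering only the fields used here.
interface Link {
  href: string;
  type: string;
}

interface ScoredChunk {
  id: string;
  text: string;
  score: number;
  links: { [name: string]: Link | undefined };
}

type Modality = "text" | "audio" | "video";

// Assumption: video chunks carry self_video_stream, audio chunks carry
// self_audio_stream, and chunks with neither link are treated as text.
function chunkModality(chunk: ScoredChunk): Modality {
  if (chunk.links["self_video_stream"]) return "video";
  if (chunk.links["self_audio_stream"]) return "audio";
  return "text";
}

// Usage: a chunk shaped like the audio example above.
const audioChunk: ScoredChunk = {
  id: "chunkId",
  text: "This is the text of a chunk",
  score: 0.19,
  links: {
    self_audio_stream: {
      href: "https://api.ragie.ai/documents/docId/chunks/chunkId/content?media_type=audio/mpeg",
      type: "audio/mpeg",
    },
  },
};
console.log(chunkModality(audioChunk));
```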


Understanding a Chunk's Links

Various links may be provided with chunks. Here is a breakdown of the links provided:


| Content | Chunk link | Chunk | Document link | Document |
| --- | --- | --- | --- | --- |
| Text | self_text | The plain text of the chunk. | document_text | The plain text of the document. |
| Audio (download) | self_audio_download | Download the audio of the chunk. | document_audio_download | Download the audio of the document. |
| Audio (stream) | self_audio_stream | Stream the audio of the chunk. | document_audio_stream | Stream the audio of the document. |
| Video (download) | self_video_download | Download the video of the chunk. | document_video_download | Download the video of the document. |
| Video (stream) | self_video_stream | Stream the video of the chunk. | document_video_stream | Stream the video of the document. |

Stream the video of a specific chunk:

const streamUrl = chunk.links.self_video_stream.href;

Download the entire video document of which the chunk is a member:

const downloadUrl = chunk.links.document_video_download.href;