Elements API
The Elements API lets you retrieve the elements extracted from documents. For example, if you upload a PDF with a lot of tables, you might want the content of just those tables, skipping headers, images, and so on.
Some elements are only available in agentic mode, which is currently in beta. Want to try it out? Contact us at [email protected]
Structure
An Element looks like this:
{
"id": "9d0799a2-d891-4c4f-ba84-e507b631023d",
"created_at": "2026-02-14T00:41:50.150050Z",
"index": 6,
"metadata": {
"languages": [
"eng"
],
"filetype": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
},
"type": "Table",
"text": "<table>\n <thead>\n <tr>\n <th>\n animal...",
"markdown": "\n\n| animal | behavior | friendliness | utility |...",
"location": null,
"data": {
"type": "Table",
"content": "<table>....",
"description": "The table consists of information about different animals and their characteristics...",
"header_range": null
}
}
Here are the general properties. See Element Types & Data Content for more details.
| Property | Description | Type |
|---|---|---|
| id | Unique ID of the element. | string |
| created_at | When the element was created. | string |
| index | The relative location of the element in the document when viewed top to bottom. A smaller number means earlier, 0 being the first. | number |
| metadata | Additional information about the element. | object |
| type | The element type. | string |
| text | Textual representation of the element. | string |
| markdown | The markdown representation of the element, if appropriate. | string |
| location | Where in the document the element was found. This is context sensitive. | object or null |
| data | Element-specific data. | object |
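As a minimal sketch of the use case from the introduction (keeping only tables), the following Python filters a list of element dictionaries by their type property and orders them by index. The elements list is hand-written sample data in the shape shown above; how the elements are fetched from the API is not covered here, and elements_of_type is a hypothetical helper name.

```python
from typing import Any


def elements_of_type(elements: list[dict[str, Any]], element_type: str) -> list[dict[str, Any]]:
    """Return the elements whose `type` matches, in document order (`index`)."""
    matches = [e for e in elements if e.get("type") == element_type]
    return sorted(matches, key=lambda e: e["index"])


# Hand-written sample data, not real API output.
elements = [
    {"id": "c", "index": 6, "type": "Table", "text": "<table>...</table>"},
    {"id": "a", "index": 0, "type": "Title", "text": "Animal report"},
    {"id": "b", "index": 2, "type": "Table", "text": "<table>...</table>"},
]

tables = elements_of_type(elements, "Table")
print([t["id"] for t in tables])
```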
Location
This identifies where an element is located within the document.
Bounding box
Locations in PDFs are bounding boxes of where the element was found in the page.
{
"location_type": "bounding_box",
"left": 0.0965,
"top": 0.048,
"width": 0.154,
"height": 0.0339,
"page_number": 1
}
Left, top, width, and height are all normalized values between 0 and 1. For example, a left of 0 is the left-most edge of the page and 1 is the right-most.
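To draw or crop a bounding box, the normalized values usually need to be scaled to pixels. The sketch below assumes that top is measured from the top edge of the page and that width and height are fractions of the page width and height; neither is stated above, so check them against your own output. The function name and the 1000 x 2000 pixel page size are made up for illustration.

```python
def bounding_box_to_pixels(location: dict, page_width_px: float, page_height_px: float) -> tuple:
    """Scale a normalized bounding box to (left, top, right, bottom) in pixels."""
    left = location["left"] * page_width_px
    top = location["top"] * page_height_px          # assumption: 0 is the top edge
    right = left + location["width"] * page_width_px
    bottom = top + location["height"] * page_height_px
    return (left, top, right, bottom)


location = {
    "location_type": "bounding_box",
    "left": 0.0965,
    "top": 0.048,
    "width": 0.154,
    "height": 0.0339,
    "page_number": 1,
}

# Assumed size of the rendered page image.
box = bounding_box_to_pixels(location, 1000, 2000)
print([round(value, 1) for value in box])
```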
Character index
Inside a markdown file, locations are indicated by character ranges.
{
"location_type": "character_index",
"start_char_index": 23,
"end_char_index": 1037
}
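A character range can be used to pull the matching span out of the source text. This sketch assumes that start_char_index is inclusive and end_char_index is exclusive, as in Python slices; the documentation above does not say which convention applies, so verify it before relying on exact boundaries. The sample text and indices are made up.

```python
def slice_by_character_index(document_text: str, location: dict) -> str:
    """Return the text covered by a character_index location."""
    # Assumption: start is inclusive and end is exclusive.
    return document_text[location["start_char_index"]:location["end_char_index"]]


document_text = "# Animals\n\nCats are independent.\n"
location = {
    "location_type": "character_index",
    "start_char_index": 11,
    "end_char_index": 32,
}

snippet = slice_by_character_index(document_text, location)
print(snippet)
```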
Spreadsheet
Locations inside a spreadsheet tell you what cell or cell range elements were found in.
{
"location_type": "spreadsheet",
"range": "B3:B3",
"sheet_name": "Sheet1",
"sheet_index": 0
}
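To index into spreadsheet data, the range string can be converted to zero-based row and column numbers. This is a sketch that assumes plain A1 notation, such as B3:B3, with no $ markers or sheet prefix; parse_range and its helpers are hypothetical names.

```python
import re

_CELL = re.compile(r"^([A-Z]+)([0-9]+)$")


def column_letters_to_index(letters: str) -> int:
    """Convert column letters to a zero-based index: A -> 0, Z -> 25, AA -> 26."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def parse_cell(cell: str) -> tuple:
    """Convert a cell such as 'B3' to a zero-based (row, column) pair."""
    match = _CELL.match(cell.strip().upper())
    if match is None:
        raise ValueError(f"not a plain A1 cell reference: {cell!r}")
    letters, row = match.groups()
    return (int(row) - 1, column_letters_to_index(letters))


def parse_range(cell_range: str) -> tuple:
    """Convert 'B3:B3' (or a single cell) to ((row, col), (row, col))."""
    start, _, end = cell_range.partition(":")
    first = parse_cell(start)
    last = parse_cell(end) if end else first
    return (first, last)


location = {"location_type": "spreadsheet", "range": "B3:B3", "sheet_name": "Sheet1", "sheet_index": 0}
print(parse_range(location["range"]))
```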
Duration
Inside audio and video segments, you find duration locations. These tell you where in the file the content was found.
{
"location_type": "duration",
"start_time": 0,
"end_time": 9.96,
"duration": 9.96
}
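For display, the times can be turned into timestamps. The sketch below assumes that start_time, end_time, and duration are expressed in seconds, which the example values suggest but the text above does not state; format_timestamp is a hypothetical helper.

```python
def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


location = {"location_type": "duration", "start_time": 0, "end_time": 9.96, "duration": 9.96}

# Assumption: the values are seconds from the start of the file.
label = f"{format_timestamp(location['start_time'])} to {format_timestamp(location['end_time'])}"
print(label)
```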
Document Elements
The following element types may appear when parsing a document.
Some element types are specialized and only appear for certain file types. For example, AudioTranscriptionSegment and VideoSegment are audio/video-specific and are not detected in PDFs.
| Element Type | Description | Typical Content | Supported Mode |
|---|---|---|---|
| Address | Postal address text. | Mailing address | fast, hi_res, agentic_ocr |
| AudioTranscriptionSegment | Segment of transcribed audio. | Segment text and per-word timing | audio/video |
| Author | Author name or byline. | Person or organization name | agentic_ocr |
| Barcode | Barcode and any nearby associated text. | Barcode-associated text | agentic_ocr |
| Bibliography | Bibliography or references section. | Citation list | agentic_ocr |
| Button | Button-like UI element. | Button label | agentic_ocr |
| CalendarDate | Date text. | Date string | agentic_ocr |
| Caption | Generic caption from unstructured extraction. | Caption text | agentic_ocr |
| Code | Code block or code snippet. | Code content and language | fast, hi_res, agentic_ocr |
| Comment | Comment or annotation text. | Comment text | agentic_ocr |
| DefinitionList | Definition-style list. | Markdown definitions | agentic_ocr |
| EmailAddress | Email address text. | Email value | fast, hi_res, agentic_ocr |
| Figure | Charts, screenshots, diagrams, photos, or visuals. | OCR text and visual description | agentic_ocr |
| FigureCaption | Caption associated with a figure. | Figure caption text | fast, hi_res, agentic_ocr |
| Footer | Footer content, near the bottom of a page | Disclaimers, company addresses, version numbers | hi_res, agentic_ocr |
| Footnote | Footnote content, typically near the bottom of a page. | Footnote text | fast, agentic_ocr |
| FormField | Interactive form field. | Label, value, options, input type | agentic_ocr |
| Formula | Mathematical expression. | Plain text formula and optional LaTeX | fast, hi_res, agentic_ocr |
| Header | Recurring page-level content at the top margin. | Page number | fast, hi_res, agentic_ocr |
| Image | Unstructured image element. | OCR text and visual description | fast, hi_res |
| Json | JSON content. | JSON text | fast, hi_res, agentic_ocr |
| KeyValue | Static labeled attribute/value pair. | Key and value | agentic_ocr |
| ListItem | Single list item from extraction. | One bullet or numbered item | fast, hi_res |
| Logo | Company or entity logo. | OCR text and logo description | agentic_ocr |
| NarrativeText | Narrative prose from extraction. | Paragraph text | fast, hi_res |
| OrderedList | Numbered list. | Markdown list with numbers | agentic_ocr |
| PageBreak | Explicit page break marker. | No content | fast, hi_res |
| QrCode | QR code value. | Encoded value | agentic_ocr |
| Quote | Quoted text. | Quotation content | agentic_ocr |
| SectionHeader | Structural heading within the body of the document. | Section title | agentic_ocr |
| Signature | Signature area, signed or unsigned. | Signature text, signer metadata | agentic_ocr |
| Stamp | Ink stamp, seal, or official mark. | Stamp text and description | agentic_ocr |
| SubHeader | Sub-heading within a section. | Short heading text | agentic_ocr |
| Table | Tabular content. | HTML, markdown of table plus summary | fast, hi_res, agentic_ocr |
| TableCaption | Caption associated with a table. | Table caption text | agentic_ocr |
| TableOfContents | Table of contents section. | Section listing with page numbers | agentic_ocr |
| Text | General body text. | Paragraph text | agentic_ocr |
| Time | Time text. | Time string | agentic_ocr |
| Title | Main document title. Usually prominent and near the start of the document. | Title text | fast, hi_res, agentic_ocr |
| UncategorizedText | Text that could not be classified more specifically. | Raw text | fast, hi_res |
| UnorderedList | Bulleted list. | Markdown list with bullets | agentic_ocr |
| Video | Video or embedded video placeholder. | A description of the video visual | agentic_ocr |
| VideoSegment | Segment of video content. | Segment text and per-word timing | audio/video |
| Watermark | Watermark overlay on the page. | Watermark text | agentic_ocr |
Element Types
These element-specific fields go into the data key of the response.
Address
Represents a postal address.
| Field | Type | Description |
|---|---|---|
| type | "Address" | Element type |
| content | string | Address text |
Example
{
"type": "Address",
"content": "123 Main Street\nSpokane Valley, WA 99206"
}
AudioTranscriptionSegment
Represents a segment of transcribed audio with per-word timing metadata.
| Field | Type | Description |
|---|---|---|
| type | "AudioTranscriptionSegment" | Element type |
| content | string or null | Segment text |
| modality_data | array | Per-word timing and probability data |
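One possible use of the per-word data is flagging words the transcription was less sure about. This is a sketch only: it assumes each modality_data entry has word and probability keys and that probability is a confidence between 0 and 1. The threshold of 0.98 and the helper name are arbitrary, and the sample segment is hand-written.

```python
def low_confidence_words(segment_data: dict, threshold: float = 0.98) -> list:
    """Return the words whose probability is strictly below the threshold."""
    words = segment_data.get("modality_data") or []
    return [w["word"] for w in words if w["probability"] < threshold]


# Hand-written sample in the shape of the fields listed above.
segment = {
    "type": "AudioTranscriptionSegment",
    "content": "Welcome to the quarterly review.",
    "modality_data": [
        {"word": "Welcome", "probability": 0.99, "start": 0.0, "end": 0.42},
        {"word": "to", "probability": 0.98, "start": 0.43, "end": 0.50},
        {"word": "the", "probability": 0.99, "start": 0.51, "end": 0.58},
        {"word": "quarterly", "probability": 0.97, "start": 0.59, "end": 1.10},
        {"word": "review.", "probability": 0.98, "start": 1.11, "end": 1.55},
    ],
}

flagged = low_confidence_words(segment)
print(flagged)
```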
Example
{
"type": "AudioTranscriptionSegment",
"content": "Welcome to the quarterly review.",
"modality_data": [
{ "word": "Welcome", "probability": 0.99, "start": 0.0, "end": 0.42 },
{ "word": "to", "probability": 0.98, "start": 0.43, "end": 0.50 },
{ "word": "the", "probability": 0.99, "start": 0.51, "end": 0.58 },
{ "word": "quarterly", "probability": 0.97, "start": 0.59, "end": 1.10 },
{ "word": "review.", "probability": 0.98, "start": 1.11, "end": 1.55 }
]
}
Author
Represents an author or byline.
| Field | Type | Description |
|---|---|---|
| type | "Author" | Element type |
| content | string | Author name |
Example
{
"type": "Author",
"content": "Jane Smith"
}
Barcode
Represents a barcode and nearby associated text.
| Field | Type | Description |
|---|---|---|
| type | "Barcode" | Element type |
| content | string | Nearby associated text |
Example
{
"type": "Barcode",
"content": "Tracking ID: 1Z999AA10123456784"
}
Bibliography
Represents bibliography or references content.
| Field | Type | Description |
|---|---|---|
| type | "Bibliography" | Element type |
| content | string | Markdown-formatted bibliography text |
Example
{
"type": "Bibliography",
"content": "1. Smith, J. *Security Systems*. 2025.\n2. Doe, A. *Infrastructure at Scale*. 2024."
}
Button
Represents a button-like user interface element.
| Field | Type | Description |
|---|---|---|
| type | "Button" | Element type |
| content | string | Button label |
Example
{
"type": "Button",
"content": "Submit"
}
CalendarDate
Represents a date string extracted from the document.
| Field | Type | Description |
|---|---|---|
| type | "CalendarDate" | Element type |
| content | string | Date text as it appears |
Example
{
"type": "CalendarDate",
"content": "March 1, 2026"
}
Caption
| Field | Type | Description |
|---|---|---|
| type | "Caption" | Element type |
| content | string | Caption text |
Example
{
"type": "Caption",
"content": "Figure 2. Quarterly revenue by region."
}
Code
Represents a code snippet or code block.
| Field | Type | Description |
|---|---|---|
| type | "Code" | Element type |
| content | string | Code content |
| language | string | Detected or assigned language |
Example
{
"type": "Code",
"content": "def hello():\n return \"world\"",
"language": "python"
}
Comment
Represents comment or annotation text.
| Field | Type | Description |
|---|---|---|
| type | "Comment" | Element type |
| content | string | Comment content |
Example
{
"type": "Comment",
"content": "Reviewer note: verify the totals against the signed copy."
}
DefinitionList
Represents a definition-style list.
| Field | Type | Description |
|---|---|---|
| type | "DefinitionList" | Element type |
| content | string | Markdown-formatted definition list |
Example
{
"type": "DefinitionList",
"content": "Parse\n: Extract structured document elements\n\nIndex\n: Create retrievable chunks and embeddings"
}
EmailAddress
Represents an email address.
| Field | Type | Description |
|---|---|---|
| type | "EmailAddress" | Element type |
| content | string | Email address |
Example
{
"type": "EmailAddress",
"content": "[email protected]"
}
Figure
Represents a chart, screenshot, photograph, or other visual. It includes OCR text and a descriptive interpretation of the image.
| Field | Type | Description |
|---|---|---|
| type | "Figure" | Element type |
| content | string | OCR text visible in the figure |
| description | string | Description of what the figure depicts |
| base64_data | string or null | Optional post-processing image payload |
Example
{
"type": "Figure",
"content": "Revenue by Quarter\nQ1 Q2 Q3 Q4",
"description": "A bar chart comparing quarterly revenue, with steady growth from Q1 through Q4.",
"base64_data": null
}
Footnote
Represents footnote content.
| Field | Type | Description |
|---|---|---|
| type | "Footnote" | Element type |
| content | string | Footnote text |
Example
{
"type": "Footnote",
"content": "1. Includes adjusted figures for the prior reporting period."
}
FormField
Represents an interactive form field, including text inputs, checkboxes, grouped choices, date fields, and similar controls.
| Field | Type | Description |
|---|---|---|
| type | "FormField" | Element type |
| input_type | string | Input type such as text, checkbox, radio-group, or date |
| content | string | ASCII representation of the field including label and value |
| label | string | Primary field label |
| value | string or null | Filled value for simple fields |
| options | array or null | Available options for grouped controls |
| selected_values | array or null | Selected labels for grouped controls |
| help_text | string or null | Optional help text |
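Because simple fields use value while grouped controls use selected_values, code that reads form answers has to check both. The following sketch normalizes either case to a list of strings; form_field_answers is a hypothetical helper, and the two sample fields are hand-written in the shape of the table above.

```python
def form_field_answers(data: dict) -> list:
    """Return the filled value(s) of a FormField as a list of strings."""
    selected = data.get("selected_values")
    if selected is not None:          # grouped control, e.g. a checkbox group
        return list(selected)
    value = data.get("value")
    if value is not None:             # simple field, e.g. a text input
        return [value]
    return []                         # nothing filled in


text_field = {
    "type": "FormField",
    "input_type": "text",
    "label": "Name",
    "value": "Jane Smith",
    "options": None,
    "selected_values": None,
}
checkbox_group = {
    "type": "FormField",
    "input_type": "checkbox-group",
    "label": "Preferred Contact",
    "value": None,
    "options": [{"label": "Email"}, {"label": "Phone"}, {"label": "SMS"}],
    "selected_values": ["Email", "SMS"],
}

for field in (text_field, checkbox_group):
    print(field["label"], form_field_answers(field))
```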
Example: text field
{
"type": "FormField",
"input_type": "text",
"content": "Name: Jane Smith",
"label": "Name",
"value": "Jane Smith",
"options": null,
"selected_values": null,
"help_text": null
}
Example: checkbox group
{
"type": "FormField",
"input_type": "checkbox-group",
"content": "Preferred Contact: [x] Email [ ] Phone [x] SMS",
"label": "Preferred Contact",
"value": null,
"options": [
{ "label": "Email" },
{ "label": "Phone" },
{ "label": "SMS" }
],
"selected_values": ["Email", "SMS"],
"help_text": "Select all that apply."
}
Formula
Represents a mathematical expression.
| Field | Type | Description |
|---|---|---|
| type | "Formula" | Element type |
| content | string | Plain-text visual form of the formula |
| latex | string or null | LaTeX form without delimiters |
Example
{
"type": "Formula",
"content": "x = (-b ± √(b² - 4ac)) / 2a",
"latex": "x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a}"
}
Header
Represents recurring top-of-page metadata such as page numbers or dates.
| Field | Type | Description |
|---|---|---|
| type | "Header" | Element type |
| content | string | Header text |
Example
{
"type": "Header",
"content": "Confidential • Page 2"
}
Image
| Field | Type | Description |
|---|---|---|
| type | "Image" | Element type |
| content | string | OCR text visible in the image |
| description | string | Description of the image |
Example
{
"type": "Image",
"content": "Warning\nHigh Voltage",
"description": "A safety sign with a yellow triangle and a black lightning bolt symbol."
}
Json
Represents JSON text extracted as its own semantic element.
| Field | Type | Description |
|---|---|---|
| type | "Json" | Element type |
| content | string | JSON text |
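Since content is a string, it has to be decoded before it can be used as structured data. A minimal sketch, assuming extracted text is not guaranteed to be valid JSON (for example, if it was truncated in the source document), so decoding errors are caught rather than raised; parse_json_element is a hypothetical helper.

```python
import json


def parse_json_element(data: dict):
    """Decode the JSON text of a Json element, or return None if it is invalid."""
    try:
        return json.loads(data["content"])
    except json.JSONDecodeError:
        return None


# Hand-written samples: one valid payload and one truncated payload.
valid = {"type": "Json", "content": '{"status": "ok", "count": 3}'}
broken = {"type": "Json", "content": '{"status": "ok", "count":'}

print(parse_json_element(valid))
print(parse_json_element(broken))
```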
Example
{
"type": "Json",
"content": "{\"status\": \"ok\", \"count\": 3}"
}
KeyValue
Represents a static labeled key/value pair.
| Field | Type | Description |
|---|---|---|
| type | "KeyValue" | Element type |
| key | string | Attribute label |
| value | string | Value text |
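KeyValue elements lend themselves to being collected into a lookup table. The sketch below gathers them from a list of elements, reading the element-specific fields from each element's data key. Values are kept in lists because nothing above says a key appears only once per document; the helper name and sample elements are made up.

```python
def key_values_to_dict(elements: list) -> dict:
    """Collect KeyValue elements into {key: [values...]}, preserving order."""
    result = {}
    for element in elements:
        data = element.get("data") or {}
        if data.get("type") == "KeyValue":
            result.setdefault(data["key"], []).append(data["value"])
    return result


# Hand-written sample elements, reduced to the fields used here.
elements = [
    {"index": 0, "data": {"type": "Title", "content": "Invoice"}},
    {"index": 1, "data": {"type": "KeyValue", "key": "Invoice #", "value": "INV-10482"}},
    {"index": 2, "data": {"type": "KeyValue", "key": "Contact", "value": "Jane Smith"}},
    {"index": 3, "data": {"type": "KeyValue", "key": "Contact", "value": "Accounts Payable"}},
]

pairs = key_values_to_dict(elements)
print(pairs)
```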
Example
{
"type": "KeyValue",
"key": "Invoice #",
"value": "INV-10482"
}
Logo
Represents a company or organization logo.
| Field | Type | Description |
|---|---|---|
| type | "Logo" | Element type |
| content | string | OCR text visible in the logo |
| description | string | Description of the logo |
| base64_data | string or null | Optional post-processing image payload |
Example
{
"type": "Logo",
"content": "Ragie",
"description": "A simple wordmark logo with the company name in bold sans-serif type.",
"base64_data": null
}
NarrativeText
| Field | Type | Description |
|---|---|---|
| type | "NarrativeText" | Element type |
| content | string | Narrative text |
Example
{
"type": "NarrativeText",
"content": "The project entered a new phase following the completion of the migration."
}
OrderedList
Represents a numbered list. The extracted content is formatted in markdown and includes numbering.
| Field | Type | Description |
|---|---|---|
| type | "OrderedList" | Element type |
| content | string | Markdown-formatted numbered list |
Example
{
"type": "OrderedList",
"content": "1. Open the dashboard\n2. Select the document\n3. Click Reprocess"
}
PageBreak
| Field | Type | Description |
|---|---|---|
| type | "PageBreak" | Element type |
Example
{
"type": "PageBreak"
}
QrCode
Represents the value encoded by a QR code.
| Field | Type | Description |
|---|---|---|
| type | "QrCode" | Element type |
| content | string | Encoded QR code value |
Example
{
"type": "QrCode",
"content": "https://example.com/verify?id=abc123"
}
Quote
Represents quoted text.
| Field | Type | Description |
|---|---|---|
| type | "Quote" | Element type |
| content | string | Quoted text |
Example
{
"type": "Quote",
"content": "Security is not a product, but a process."
}
SectionHeader
Represents a section heading within the main body of the document.
| Field | Type | Description |
|---|---|---|
| type | "SectionHeader" | Element type |
| content | string | Section heading text |
Example
{
"type": "SectionHeader",
"content": "3. Risk Factors"
}
Signature
Represents a signature field or signature area, whether signed or unsigned.
| Field | Type | Description |
|---|---|---|
| type | "Signature" | Element type |
| content | string | Best-effort textual representation of the signature area |
| description | string | Description of the signature region |
| label | string | Printed signature label |
| is_signed | boolean | Whether a signature is present |
| signer_name | string or null | Signer name if legible |
| date | string or null | Signature date if present |
Example
{
"type": "Signature",
"content": "Authorized By: Jane Smith\nSigned\n03/01/2026",
"description": "A signature line containing a handwritten signature and a handwritten date.",
"label": "Authorized By",
"is_signed": true,
"signer_name": "Jane Smith",
"date": "03/01/2026"
}
Stamp
Represents an official stamp, seal, or ink mark.
| Field | Type | Description |
|---|---|---|
| type | "Stamp" | Element type |
| content | string | Text visible in the stamp |
| description | string | Description of the stamp |
Example
{
"type": "Stamp",
"content": "PAID",
"description": "A rectangular red ink stamp with the word PAID centered inside."
}
SubHeader
Represents a subordinate heading within a section.
| Field | Type | Description |
|---|---|---|
| type | "SubHeader" | Element type |
| content | string | Sub-header text |
Example
{
"type": "SubHeader",
"content": "Regional Performance"
}
Table
Represents tabular content. Table content is returned as valid HTML for the table structure.
| Field | Type | Description |
|---|---|---|
| type | "Table" | Element type |
| content | string | HTML <table> representation |
| description | string | Brief summary of what the table shows |
| header_range | string or null | Optional normalized header range |
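Since content is HTML, it can be turned into rows of cell text with Python's standard html.parser module. The sketch below assumes a simple table with no nested tables and ignores colspan and rowspan; TableRows and table_rows are hypothetical names, and the sample HTML is hand-written.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None outside <tr>
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html: str) -> list:
    """Parse a simple HTML table into a list of rows of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


html = (
    "<table><thead><tr><th>Quarter</th><th>Revenue</th></tr></thead>"
    "<tbody><tr><td>Q1</td><td>$2.1M</td></tr><tr><td>Q2</td><td>$2.5M</td></tr></tbody></table>"
)
rows = table_rows(html)
print(rows)
```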
Example
{
"type": "Table",
"content": "<table><thead><tr><th>Quarter</th><th>Revenue</th></tr></thead><tbody><tr><td>Q1</td><td>$2.1M</td></tr><tr><td>Q2</td><td>$2.5M</td></tr></tbody></table>",
"description": "A two-column revenue table showing quarterly revenue for Q1 and Q2.",
"header_range": "0-0"
}
TableOfContents
Represents a table of contents.
| Field | Type | Description |
|---|---|---|
| type | "TableOfContents" | Element type |
| content | string | Markdown-formatted table of contents |
Example
{
"type": "TableOfContents",
"content": "1. Introduction ........ 1\n2. Findings ............ 4\n3. Appendix ............ 12"
}
Text
Represents general body text that is not better classified as a more specific text type.
| Field | Type | Description |
|---|---|---|
| type | "Text" | Element type |
| content | string | Extracted text |
| handwritten | boolean | Whether the text appears handwritten |
Example
{
"type": "Text",
"content": "Payment is due within 30 calendar days of receipt.",
"handwritten": false
}
Time
Represents a time string extracted from the document.
| Field | Type | Description |
|---|---|---|
| type | "Time" | Element type |
| content | string | Time text as it appears |
Example
{
"type": "Time",
"content": "2:30 PM"
}
Title
Represents the main title of the document.
| Field | Type | Description |
|---|---|---|
| type | "Title" | Element type |
| content | string | Title text |
Example
{
"type": "Title",
"content": "2025 Annual Security Review"
}
UncategorizedText
| Field | Type | Description |
|---|---|---|
| type | "UncategorizedText" | Element type |
| content | string | Text that could not be classified more specifically |
Example
{
"type": "UncategorizedText",
"content": "Reference block A-17"
}
UnorderedList
Represents a bulleted list. The extracted content is formatted in markdown and includes bullet markers.
| Field | Type | Description |
|---|---|---|
| type | "UnorderedList" | Element type |
| content | string | Markdown-formatted list content |
Example
{
"type": "UnorderedList",
"content": "- Review logs\n- Rotate credentials\n- Re-run validation"
}
Video
Represents a video or a video placeholder.
| Field | Type | Description |
|---|---|---|
| type | "Video" | Element type |
| content | string | Description of the video or placeholder |
Example
{
"type": "Video",
"content": "Embedded training video thumbnail with a play button overlay."
}
Media-Specific Elements
These element types are used in audio and video workflows rather than PDF parsing.
VideoSegment
Represents a segment in video workflows.
| Field | Type | Description |
|---|---|---|
| type | "VideoSegment" | Element type |
| content | string or null | Segment content |
Example
{
"type": "VideoSegment",
"content": "Slide changes from agenda to architecture overview."
}
Watermark
Represents watermark text overlaid on the page.
| Field | Type | Description |
|---|---|---|
| type | "Watermark" | Element type |
| content | string | Watermark text |
Example
{
"type": "Watermark",
"content": "DRAFT"
}