The Elements API lets you retrieve the elements extracted from your documents. For example, if you upload a PDF that contains many tables, you might want only the table content and skip the headers, images, and other elements.

📘
Some element types are only available in agentic mode, which is currently in beta. Want to try it out? Contact us at [email protected]

Structure

An element looks like this:

{
  "id": "9d0799a2-d891-4c4f-ba84-e507b631023d",
  "created_at": "2026-02-14T00:41:50.150050Z",
  "index": 6,
  "metadata": {
    "languages": ["eng"],
    "filetype": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
  },
  "type": "Table",
  "text": "<table>\n <thead>\n  <tr>\n   <th>\n    animal...",
  "markdown": "\n\n| animal | behavior | friendliness | utility |...",
  "location": null,
  "data": {
    "type": "Table",
    "content": "<table>....",
    "description": "The table consists of information about different animals and their characteristics...",
    "header_range": null
  }
}

The general properties are described below. See Element Types for the element-specific fields returned in `data`.

Property	Description	Type
id	Unique ID of the element	string
created_at	Timestamp when the element was created, in ISO 8601 format	string
index	Position of the element in the document, read top to bottom. Lower values appear earlier; the first element has index 0.	number
metadata	Additional information about the element, such as detected languages and file type	object
type	The element type	string
text	Textual representation of the element	string
markdown	Markdown representation of the element, where applicable	string
location	Where in the document the element was found. Its shape depends on the file type; see Location.	object or null
data	Element-specific data; see Element Types	object
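The sketch below shows one way to read these general properties from a single element, assuming the element has already been fetched and decoded from JSON (how it is fetched is not covered here). The payload is an abbreviated copy of the example above, and `summarize_element` is a hypothetical helper, not part of any SDK.

```python
import json

# Abbreviated copy of the example element above (long strings shortened).
raw = """
{
  "id": "9d0799a2-d891-4c4f-ba84-e507b631023d",
  "created_at": "2026-02-14T00:41:50.150050Z",
  "index": 6,
  "metadata": {"languages": ["eng"]},
  "type": "Table",
  "text": "<table>...</table>",
  "markdown": "| animal | behavior | friendliness | utility |",
  "location": null,
  "data": {
    "type": "Table",
    "content": "<table>...</table>",
    "description": "Information about different animals and their characteristics",
    "header_range": null
  }
}
"""


def summarize_element(element: dict) -> str:
    """Hypothetical helper: build a one-line summary from the general properties."""
    data = element.get("data") or {}
    # Prefer the element-specific description in `data`; fall back to `text`.
    detail = data.get("description") or element.get("text", "")
    location = element.get("location")
    # JSON null becomes None, so elements without a location are handled here.
    where = location["location_type"] if location else "no location"
    return f'#{element["index"]} {element["type"]} ({where}): {detail}'


element = json.loads(raw)
print(summarize_element(element))
```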

Location

The `location` field identifies where an element was found in the source document. Its structure depends on the file type and is indicated by `location_type`.

Bounding box

In PDFs, the location is a bounding box describing where the element appears on the page.

{
	"location_type": "bounding_box",
	"left": 0.0965,
	"top": 0.048,
	"width": 0.154,
	"height": 0.0339,
	"page_number": 1
}

`left`, `top`, `width`, and `height` are normalized to the page dimensions and range from 0 to 1. For example, a `left` of 0 is the left edge of the page and 1 is the right edge.
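Because the values are normalized, you need the page size to draw or crop a region. The sketch below scales a `bounding_box` location to absolute coordinates. It assumes the origin is the top-left corner with `top` measured downward, and it uses a US Letter page in PDF points (612 × 792) purely as an example size; `to_absolute` and `Box` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x0: float
    y0: float
    x1: float
    y1: float


def to_absolute(location: dict, page_width: float, page_height: float) -> Box:
    """Scale a normalized bounding_box location to absolute page coordinates.

    Assumes a top-left origin with `top` measured down from the top edge.
    """
    if location.get("location_type") != "bounding_box":
        raise ValueError("expected a bounding_box location")
    x0 = location["left"] * page_width
    y0 = location["top"] * page_height
    return Box(
        x0=x0,
        y0=y0,
        x1=x0 + location["width"] * page_width,
        y1=y0 + location["height"] * page_height,
    )


location = {
    "location_type": "bounding_box",
    "left": 0.0965,
    "top": 0.048,
    "width": 0.154,
    "height": 0.0339,
    "page_number": 1,
}
# 612 x 792 points is US Letter; substitute the real page size of your PDF.
box = to_absolute(location, page_width=612, page_height=792)
print(box)
```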

Character index

In markdown files, the location is a character range within the file.

{
	"location_type": "character_index",
	"start_char_index": 23,
	"end_char_index": 1037
}
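A character range can be used to pull the element's source text back out of the original file. This sketch assumes `end_char_index` is exclusive, matching Python slice semantics; that convention is an assumption, so check it against real output before relying on it. `source_text` is a hypothetical helper.

```python
def source_text(markdown: str, location: dict) -> str:
    """Return the slice of the original markdown covered by a character_index location.

    Assumes end_char_index is exclusive (like a Python slice); verify against real data.
    """
    if location.get("location_type") != "character_index":
        raise ValueError("expected a character_index location")
    return markdown[location["start_char_index"]:location["end_char_index"]]


markdown = "# Report\n\nRevenue grew in Q4."
location = {"location_type": "character_index", "start_char_index": 10, "end_char_index": 29}
print(source_text(markdown, location))
```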

Spreadsheet

In spreadsheets, the location identifies the sheet and the cell or cell range where the element was found.

{
	"location_type": "spreadsheet",
	"range": "B3:B3",
	"sheet_name": "Sheet1",
	"sheet_index": 0
}
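The `range` value in the example uses A1-style notation (`B3:B3`). If you need numeric row and column positions, for instance to look the cells up with a spreadsheet library, a small parser like the one below works for plain ranges. It assumes ranges are always `LETTERS` + `DIGITS` (no `$` anchors or sheet prefixes), and the 0-based output is a choice made for this sketch.

```python
from __future__ import annotations

import re


def column_to_index(letters: str) -> int:
    """Convert spreadsheet column letters to a 0-based index (A -> 0, Z -> 25, AA -> 26)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def parse_range(a1: str) -> tuple[tuple[int, int], tuple[int, int]]:
    """Parse an A1 range such as "B3:B3" into 0-based (row, col) start and end cells."""
    cells = []
    for ref in a1.split(":"):
        match = re.fullmatch(r"([A-Za-z]+)(\d+)", ref)
        if match is None:
            raise ValueError(f"unsupported cell reference: {ref!r}")
        letters, row = match.groups()
        cells.append((int(row) - 1, column_to_index(letters)))
    # A single reference such as "B3" is treated as a one-cell range.
    return cells[0], cells[-1]


print(parse_range("B3:B3"))
```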

Duration

Audio and video segments use duration locations, which give the time span within the file where the content was found.

{
	"location_type": "duration",
	"start_time": 0,
	"end_time": 9.96,
	"duration": 9.96
}
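To show a segment's position to a user, for example as a seek link, you can format the start and end times as timestamps. This sketch assumes the times are in seconds, which the example (`"end_time": 9.96`) suggests but does not state; `format_seconds` and `describe_duration` are hypothetical helpers.

```python
def format_seconds(seconds: float) -> str:
    """Format a number of seconds (assumed unit) as H:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours}:{minutes:02d}:{secs:02d}.{ms:03d}"


def describe_duration(location: dict) -> str:
    """Render a duration location as "start -> end"."""
    if location.get("location_type") != "duration":
        raise ValueError("expected a duration location")
    return f'{format_seconds(location["start_time"])} -> {format_seconds(location["end_time"])}'


location = {"location_type": "duration", "start_time": 0, "end_time": 9.96, "duration": 9.96}
print(describe_duration(location))
```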

Document Elements

The following element types may appear when parsing a document.

📘
Some element types are specialized and only appear for certain file types. For example, AudioTranscriptionSegment and VideoSegment are audio/video-specific and are not detected in PDFs.

Element Type	Description	Typical Content	Supported Mode
`Address`	Postal address text.	Mailing address	fast, hi_res, agentic_ocr
`AudioTranscriptionSegment`	Segment of transcribed audio.	Segment text and per-word timing	audio/video
`Author`	Author name or byline.	Person or organization name	agentic_ocr
`Barcode`	Barcode and any nearby associated text.	Barcode-associated text	agentic_ocr
`Bibliography`	Bibliography or references section.	Citation list	agentic_ocr
`Button`	Button-like UI element.	Button label	agentic_ocr
`CalendarDate`	Date text.	Date string	agentic_ocr
`Caption`	Generic caption from unstructured extraction.	Caption text	agentic_ocr
`Code`	Code block or code snippet.	Code content and language	fast, hi_res, agentic_ocr
`Comment`	Comment or annotation text.	Comment text	agentic_ocr
`DefinitionList`	Definition-style list.	Markdown definitions	agentic_ocr
`EmailAddress`	Email address text.	Email value	fast, hi_res, agentic_ocr
`Figure`	Charts, screenshots, diagrams, photos, or visuals.	OCR text and visual description	agentic_ocr
`FigureCaption`	Caption associated with a figure.	Figure caption text	fast, hi_res, agentic_ocr
`Footer`	Footer content, typically near the bottom of a page.	Disclaimers, company addresses, version numbers	hi_res, agentic_ocr
`Footnote`	Footnote content, typically near the bottom of a page.	Footnote text	fast, agentic_ocr
`FormField`	Interactive form field.	Label, value, options, input type	agentic_ocr
`Formula`	Mathematical expression.	Plain text formula and optional LaTeX	fast, hi_res, agentic_ocr
`Header`	Recurring page-level content at the top margin.	Page number	fast, hi_res, agentic_ocr
`Image`	Unstructured image element.	OCR text and visual description	fast, hi_res
`Json`	JSON content.	JSON text	fast, hi_res, agentic_ocr
`KeyValue`	Static labeled attribute/value pair.	Key and value	agentic_ocr
`ListItem`	Single list item from extraction.	One bullet or numbered item	fast, hi_res
`Logo`	Company or entity logo.	OCR text and logo description	agentic_ocr
`NarrativeText`	Narrative prose from extraction.	Paragraph text	fast, hi_res
`OrderedList`	Numbered list.	Markdown list with numbers	agentic_ocr
`PageBreak`	Explicit page break marker.	No content	fast, hi_res
`QrCode`	QR code value.	Encoded value	agentic_ocr
`Quote`	Quoted text.	Quotation content	agentic_ocr
`SectionHeader`	Structural heading within the body of the document.	Section title	agentic_ocr
`Signature`	Signature area, signed or unsigned.	Signature text, signer metadata	agentic_ocr
`Stamp`	Ink stamp, seal, or official mark.	Stamp text and description	agentic_ocr
`SubHeader`	Sub-heading within a section.	Short heading text	agentic_ocr
`Table`	Tabular content.	HTML, markdown of table plus summary	fast, hi_res, agentic_ocr
`TableCaption`	Caption associated with a table.	Table caption text	agentic_ocr
`TableOfContents`	Table of contents section.	Section listing with page numbers	agentic_ocr
`Text`	General body text.	Paragraph text	agentic_ocr
`Time`	Time text.	Time string	agentic_ocr
`Title`	Main document title. Usually prominent and near the start of the document.	Title text	fast, hi_res, agentic_ocr
`UncategorizedText`	Text that could not be classified more specifically.	Raw text	fast, hi_res
`UnorderedList`	Bulleted list.	Markdown list with bullets	agentic_ocr
`Video`	Video or embedded video placeholder.	A description of the video visual	agentic_ocr
`VideoSegment`	Segment of video content.	Segment text and per-word timing	audio/video
`Watermark`	Watermark overlay on the page.	Watermark text	agentic_ocr
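Going back to the use case in the introduction (keeping only the tables from a PDF), the sketch below filters a list of elements by their top-level `type` and returns them in document order using `index`. It assumes you already have the elements as a list of decoded JSON objects; the sample data is hypothetical and heavily abbreviated, and `elements_of_type` is not an SDK function.

```python
from __future__ import annotations


def elements_of_type(elements: list[dict], *types: str) -> list[dict]:
    """Keep only elements whose top-level "type" is one of `types`, sorted by index."""
    wanted = set(types)
    return sorted((e for e in elements if e["type"] in wanted), key=lambda e: e["index"])


# Hypothetical, abbreviated elements from one document.
elements = [
    {"index": 2, "type": "Table", "text": "<table>...</table>"},
    {"index": 0, "type": "Title", "text": "2025 Annual Security Review"},
    {"index": 1, "type": "Header", "text": "Confidential • Page 2"},
    {"index": 3, "type": "Figure", "text": "Revenue by Quarter"},
]

tables = elements_of_type(elements, "Table")
print([e["index"] for e in tables])
```

Passing several types, such as `elements_of_type(elements, "Title", "SectionHeader", "SubHeader")`, is one way to build a rough outline of a document.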

Element Types

Element-specific fields are returned in the `data` object of each element.

Address

Represents a postal address.

Field	Type	Description
`type`	`"Address"`	Element type
`content`	string	Address text

Example

{
  "type": "Address",
  "content": "123 Main Street\nSpokane Valley, WA 99206"
}

AudioTranscriptionSegment

Represents a segment of transcribed audio with per-word timing metadata.

Field	Type	Description
`type`	`"AudioTranscriptionSegment"`	Element type
`content`	string or null	Segment text
`modality_data`	array	Per-word timing and probability data

Example

{
  "type": "AudioTranscriptionSegment",
  "content": "Welcome to the quarterly review.",
  "modality_data": [
    { "word": "Welcome", "probability": 0.99, "start": 0.0, "end": 0.42 },
    { "word": "to", "probability": 0.98, "start": 0.43, "end": 0.50 },
    { "word": "the", "probability": 0.99, "start": 0.51, "end": 0.58 },
    { "word": "quarterly", "probability": 0.97, "start": 0.59, "end": 1.10 },
    { "word": "review.", "probability": 0.98, "start": 1.11, "end": 1.55 }
  ]
}
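The per-word `probability` values make it possible to flag words that may need review. The sketch below returns words below a threshold together with their start times, using the example segment above. The 0.98 threshold is arbitrary and chosen only for illustration; `low_confidence_words` is a hypothetical helper.

```python
from __future__ import annotations

segment = {
    "type": "AudioTranscriptionSegment",
    "content": "Welcome to the quarterly review.",
    "modality_data": [
        {"word": "Welcome", "probability": 0.99, "start": 0.0, "end": 0.42},
        {"word": "to", "probability": 0.98, "start": 0.43, "end": 0.50},
        {"word": "the", "probability": 0.99, "start": 0.51, "end": 0.58},
        {"word": "quarterly", "probability": 0.97, "start": 0.59, "end": 1.10},
        {"word": "review.", "probability": 0.98, "start": 1.11, "end": 1.55},
    ],
}


def low_confidence_words(segment: dict, threshold: float = 0.98) -> list[tuple[str, float]]:
    """Return (word, start_time) pairs whose probability is below `threshold`."""
    words = segment.get("modality_data") or []
    return [(w["word"], w["start"]) for w in words if w["probability"] < threshold]


print(low_confidence_words(segment))
```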

Author

Represents an author or byline.

Field	Type	Description
`type`	`"Author"`	Element type
`content`	string	Author name

Example

{
  "type": "Author",
  "content": "Jane Smith"
}

Barcode

Represents a barcode and nearby associated text.

Field	Type	Description
`type`	`"Barcode"`	Element type
`content`	string	Nearby associated text

Example

{
  "type": "Barcode",
  "content": "Tracking ID: 1Z999AA10123456784"
}

Bibliography

Represents bibliography or references content.

Field	Type	Description
`type`	`"Bibliography"`	Element type
`content`	string	Markdown-formatted bibliography text

Example

{
  "type": "Bibliography",
  "content": "1. Smith, J. *Security Systems*. 2025.\n2. Doe, A. *Infrastructure at Scale*. 2024."
}

Button

Represents a button-like user interface element.

Field	Type	Description
`type`	`"Button"`	Element type
`content`	string	Button label

Example

{
  "type": "Button",
  "content": "Submit"
}

CalendarDate

Represents a date string extracted from the document.

Field	Type	Description
`type`	`"CalendarDate"`	Element type
`content`	string	Date text as it appears

Example

{
  "type": "CalendarDate",
  "content": "March 1, 2026"
}

Caption

Represents a generic caption from unstructured extraction.

Field	Type	Description
`type`	`"Caption"`	Element type
`content`	string	Caption text

Example

{
  "type": "Caption",
  "content": "Figure 2. Quarterly revenue by region."
}

Code

Represents a code snippet or code block.

Field	Type	Description
`type`	`"Code"`	Element type
`content`	string	Code content
`language`	string	Detected or assigned language

Example

{
  "type": "Code",
  "content": "def hello():\n    return \"world\"",
  "language": "python"
}

Comment

Represents comment or annotation text.

Field	Type	Description
`type`	`"Comment"`	Element type
`content`	string	Comment content

Example

{
  "type": "Comment",
  "content": "Reviewer note: verify the totals against the signed copy."
}

DefinitionList

Represents a definition-style list.

Field	Type	Description
`type`	`"DefinitionList"`	Element type
`content`	string	Markdown-formatted definition list

Example

{
  "type": "DefinitionList",
  "content": "Parse\n: Extract structured document elements\n\nIndex\n: Create retrievable chunks and embeddings"
}

EmailAddress

Represents an email address.

Field	Type	Description
`type`	`"EmailAddress"`	Element type
`content`	string	Email address

Example

{
  "type": "EmailAddress",
  "content": "[email protected]"
}

Figure

Represents a chart, screenshot, photograph, or other visual. It includes OCR text and a descriptive interpretation of the image.

Field	Type	Description
`type`	`"Figure"`	Element type
`content`	string	OCR text visible in the figure
`description`	string	Description of what the figure depicts
`base64_data`	string or null	Optional post-processing image payload

Example

{
  "type": "Figure",
  "content": "Revenue by Quarter\nQ1 Q2 Q3 Q4",
  "description": "A bar chart comparing quarterly revenue, with steady growth from Q1 through Q4.",
  "base64_data": null
}

Footnote

Represents footnote content.

Field	Type	Description
`type`	`"Footnote"`	Element type
`content`	string	Footnote text

Example

{
  "type": "Footnote",
  "content": "1. Includes adjusted figures for the prior reporting period."
}

FormField

Represents an interactive form field, including text inputs, checkboxes, grouped choices, date fields, and similar controls.

Field	Type	Description
`type`	`"FormField"`	Element type
`input_type`	string	Input type such as `text`, `checkbox`, `checkbox-group`, `radio-group`, or `date`
`content`	string	ASCII representation of the field including label and value
`label`	string	Primary field label
`value`	string or null	Filled value for simple fields
`options`	array or null	Available options for grouped controls
`selected_values`	array or null	Selected labels for grouped controls
`help_text`	string or null	Optional help text

Example: text field

{
  "type": "FormField",
  "input_type": "text",
  "content": "Name: Jane Smith",
  "label": "Name",
  "value": "Jane Smith",
  "options": null,
  "selected_values": null,
  "help_text": null
}

Example: checkbox group

{
  "type": "FormField",
  "input_type": "checkbox-group",
  "content": "Preferred Contact: [x] Email [ ] Phone [x] SMS",
  "label": "Preferred Contact",
  "value": null,
  "options": [
    { "label": "Email" },
    { "label": "Phone" },
    { "label": "SMS" }
  ],
  "selected_values": ["Email", "SMS"],
  "help_text": "Select all that apply."
}
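Simple fields carry their answer in `value`, while grouped controls carry it in `selected_values`. The sketch below collapses the `data` objects of several FormField elements into one label-to-answer mapping, using abbreviated copies of the two examples above. It assumes `selected_values` is `null` for simple fields, as in the examples; the helper names are hypothetical.

```python
from __future__ import annotations


def form_field_value(field: dict):
    """Return a list for grouped controls (selected_values present), else the simple value."""
    if field.get("selected_values") is not None:
        return list(field["selected_values"])
    return field.get("value")


def form_to_dict(fields: list[dict]) -> dict:
    """Map each FormField label to its answer; a repeated label keeps the last value seen."""
    return {f["label"]: form_field_value(f) for f in fields if f.get("type") == "FormField"}


# Abbreviated `data` objects from the two examples above.
fields = [
    {"type": "FormField", "input_type": "text", "label": "Name",
     "value": "Jane Smith", "selected_values": None},
    {"type": "FormField", "input_type": "checkbox-group", "label": "Preferred Contact",
     "value": None, "selected_values": ["Email", "SMS"]},
]
print(form_to_dict(fields))
```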

Formula

Represents a mathematical expression.

Field	Type	Description
`type`	`"Formula"`	Element type
`content`	string	Plain-text visual form of the formula
`latex`	string or null	LaTeX form without delimiters

Example

{
  "type": "Formula",
  "content": "x = (-b ± √(b² - 4ac)) / 2a",
  "latex": "x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a}"
}
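Because `latex` is returned without delimiters, you add them yourself when rendering. The sketch below wraps the formula in `$$...$$` (display) or `$...$` (inline), delimiters that many Markdown renderers with math support recognize, and falls back to the plain-text `content` when `latex` is missing. `formula_to_markdown` is a hypothetical helper.

```python
def formula_to_markdown(data: dict, display: bool = True) -> str:
    """Render a Formula element's data for Markdown with math support."""
    latex = data.get("latex")
    if not latex:
        # No LaTeX available: use the plain-text visual form instead.
        return data["content"]
    # `latex` comes without delimiters, so add them here.
    return f"$${latex}$$" if display else f"${latex}$"


formula = {
    "type": "Formula",
    "content": "x = (-b ± √(b² - 4ac)) / 2a",
    "latex": r"x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}",
}
print(formula_to_markdown(formula))
```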

Header

Represents recurring top-of-page metadata such as page numbers or dates.

Field	Type	Description
`type`	`"Header"`	Element type
`content`	string	Header text

Example

{
  "type": "Header",
  "content": "Confidential • Page 2"
}

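Image

Represents an image element from unstructured extraction, with OCR text and a description of the image.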

Field	Type	Description
`type`	`"Image"`	Element type
`content`	string	OCR text visible in the image
`description`	string	Description of the image

Example

{
  "type": "Image",
  "content": "Warning\nHigh Voltage",
  "description": "A safety sign with a yellow triangle and a black lightning bolt symbol."
}

Json

Represents JSON text extracted as its own semantic element.

Field	Type	Description
`type`	`"Json"`	Element type
`content`	string	JSON text

Example

{
  "type": "Json",
  "content": "{\"status\": \"ok\", \"count\": 3}"
}

KeyValue

Represents a static labeled key/value pair.

Field	Type	Description
`type`	`"KeyValue"`	Element type
`key`	string	Attribute label
`value`	string	Value text

Example

{
  "type": "KeyValue",
  "key": "Invoice #",
  "value": "INV-10482"
}

Logo

Represents a company or organization logo.

Field	Type	Description
`type`	`"Logo"`	Element type
`content`	string	OCR text visible in the logo
`description`	string	Description of the logo
`base64_data`	string or null	Optional post-processing image payload

Example

{
  "type": "Logo",
  "content": "Ragie",
  "description": "A simple wordmark logo with the company name in bold sans-serif type.",
  "base64_data": null
}

NarrativeText

Represents narrative prose, such as paragraph text, from unstructured extraction.

Field	Type	Description
`type`	`"NarrativeText"`	Element type
`content`	string	Narrative text

Example

{
  "type": "NarrativeText",
  "content": "The project entered a new phase following the completion of the migration."
}

OrderedList

Represents a numbered list. The extracted content is formatted in markdown and includes numbering.

Field	Type	Description
`type`	`"OrderedList"`	Element type
`content`	string	Markdown-formatted numbered list

Example

{
  "type": "OrderedList",
  "content": "1. Open the dashboard\n2. Select the document\n3. Click Reprocess"
}

PageBreak

Represents an explicit page break marker. It carries no content.

Field	Type	Description
`type`	`"PageBreak"`	Element type

Example

{
  "type": "PageBreak"
}

QrCode

Represents the value encoded by a QR code.

Field	Type	Description
`type`	`"QrCode"`	Element type
`content`	string	Encoded QR code value

Example

{
  "type": "QrCode",
  "content": "https://example.com/verify?id=abc123"
}

Quote

Represents quoted text.

Field	Type	Description
`type`	`"Quote"`	Element type
`content`	string	Quoted text

Example

{
  "type": "Quote",
  "content": "Security is not a product, but a process."
}

SectionHeader

Represents a section heading within the main body of the document.

Field	Type	Description
`type`	`"SectionHeader"`	Element type
`content`	string	Section heading text

Example

{
  "type": "SectionHeader",
  "content": "3. Risk Factors"
}

Signature

Represents a signature field or signature area, whether signed or unsigned.

Field	Type	Description
`type`	`"Signature"`	Element type
`content`	string	Best-effort textual representation of the signature area
`description`	string	Description of the signature region
`label`	string	Printed signature label
`is_signed`	boolean	Whether a signature is present
`signer_name`	string or null	Signer name if legible
`date`	string or null	Signature date if present

Example

{
  "type": "Signature",
  "content": "Authorized By: Jane Smith\nSigned\n03/01/2026",
  "description": "A signature line containing a handwritten signature and a handwritten date.",
  "label": "Authorized By",
  "is_signed": true,
  "signer_name": "Jane Smith",
  "date": "03/01/2026"
}
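A common check on contracts and forms is whether every signature area has actually been signed. The sketch below returns the labels of unsigned Signature elements in document order. The sample elements are hypothetical and abbreviated, and `unsigned_signatures` is not an SDK function.

```python
from __future__ import annotations


def unsigned_signatures(elements: list[dict]) -> list[str]:
    """Labels of Signature elements whose is_signed is false, in document order."""
    signatures = sorted(
        (e for e in elements if e["type"] == "Signature"),
        key=lambda e: e["index"],
    )
    return [e["data"]["label"] for e in signatures if not e["data"]["is_signed"]]


# Hypothetical, abbreviated elements from one document.
elements = [
    {"index": 9, "type": "Signature",
     "data": {"type": "Signature", "label": "Witness", "is_signed": False}},
    {"index": 4, "type": "Signature",
     "data": {"type": "Signature", "label": "Authorized By", "is_signed": True}},
    {"index": 7, "type": "Text",
     "data": {"type": "Text", "content": "Payment is due within 30 calendar days of receipt."}},
]
print(unsigned_signatures(elements))
```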

Stamp

Represents an official stamp, seal, or ink mark.

Field	Type	Description
`type`	`"Stamp"`	Element type
`content`	string	Text visible in the stamp
`description`	string	Description of the stamp

Example

{
  "type": "Stamp",
  "content": "PAID",
  "description": "A rectangular red ink stamp with the word PAID centered inside."
}

SubHeader

Represents a subordinate heading within a section.

Field	Type	Description
`type`	`"SubHeader"`	Element type
`content`	string	Sub-header text

Example

{
  "type": "SubHeader",
  "content": "Regional Performance"
}

Table

Represents tabular content. The `content` field contains the table structure as HTML.

Field	Type	Description
`type`	`"Table"`	Element type
`content`	string	HTML `<table>` representation
`description`	string	Brief summary of what the table shows
`header_range`	string or null	Optional normalized header range

Example

{
  "type": "Table",
  "content": "<table><thead><tr><th>Quarter</th><th>Revenue</th></tr></thead><tbody><tr><td>Q1</td><td>$2.1M</td></tr><tr><td>Q2</td><td>$2.5M</td></tr></tbody></table>",
  "description": "A two-column revenue table showing quarterly revenue for Q1 and Q2.",
  "header_range": "0-0"
}
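Since `content` is HTML, you can turn it into rows of cell text with Python's standard `html.parser`. This is a minimal sketch: it ignores `colspan`/`rowspan`, does not interpret `header_range`, and the class and function names are hypothetical. It also tolerates the whitespace-heavy HTML shown in the element example at the top of this page.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join text fragments (including text inside nested tags) and trim whitespace.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_rows(html: str) -> list[list[str]]:
    """Parse a Table element's HTML content into a list of rows of cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


html = (
    "<table><thead><tr><th>Quarter</th><th>Revenue</th></tr></thead>"
    "<tbody><tr><td>Q1</td><td>$2.1M</td></tr><tr><td>Q2</td><td>$2.5M</td></tr></tbody></table>"
)
print(table_rows(html))
```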

TableOfContents

Represents a table of contents section.

Field	Type	Description
`type`	`"TableOfContents"`	Element type
`content`	string	Markdown-formatted table of contents

Text

Represents general body text that is not better classified as a more specific text type.

Field	Type	Description
`type`	`"Text"`	Element type
`content`	string	Extracted text
`handwritten`	boolean	Whether the text appears handwritten

Example

{
  "type": "Text",
  "content": "Payment is due within 30 calendar days of receipt.",
  "handwritten": false
}

Time

Represents a time string extracted from the document.

Field	Type	Description
`type`	`"Time"`	Element type
`content`	string	Time text as it appears

Example

{
  "type": "Time",
  "content": "2:30 PM"
}

Title

Represents the main title of the document.

Field	Type	Description
`type`	`"Title"`	Element type
`content`	string	Title text

Example

{
  "type": "Title",
  "content": "2025 Annual Security Review"
}

UncategorizedText

Represents text that could not be classified more specifically.

Field	Type	Description
`type`	`"UncategorizedText"`	Element type
`content`	string	Text that could not be classified more specifically

Example

{
  "type": "UncategorizedText",
  "content": "Reference block A-17"
}

UnorderedList

Represents a bulleted list. The extracted content is formatted in markdown and includes bullet markers.

Field	Type	Description
`type`	`"UnorderedList"`	Element type
`content`	string	Markdown-formatted list content

Example

{
  "type": "UnorderedList",
  "content": "- Review logs\n- Rotate credentials\n- Re-run validation"
}

Video

Represents a video or a video placeholder.

Field	Type	Description
`type`	`"Video"`	Element type
`content`	string	Description of the video or placeholder

Example

{
  "type": "Video",
  "content": "Embedded training video thumbnail with a play button overlay."
}

Watermark

Represents watermark text overlaid on the page.

Field	Type	Description
`type`	`"Watermark"`	Element type
`content`	string	Watermark text

Example

{
  "type": "Watermark",
  "content": "DRAFT"
}

Media-Specific Elements

These element types appear in audio and video workflows rather than in document parsing.

VideoSegment

Represents a segment of video content.

Field	Type	Description
`type`	`"VideoSegment"`	Element type
`content`	string or null	Segment content

Example

{
  "type": "VideoSegment",
  "content": "Slide changes from agenda to architecture overview."
}
