Partitions
In Ragie, documents may be logically separated into partitions. Retrievals accept an optional partition parameter that, when present, scopes the retrieval to documents in the given partition. Partitions support a number of use cases, such as segregating user data in multi-tenant SaaS applications or defining distinct knowledge bases for use in different contexts. Partitions are an optional feature; if you do not use them, all documents exist in a “default” partition for your tenant.
A partition is specified as a string of lowercase alphanumeric characters, which may also include the _ and - special characters.
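As a rough illustration of the naming rule above, the following sketch checks a candidate partition name on the client side before it is sent anywhere. The helper name is_valid_partition is hypothetical, and the requirement of at least one character is an assumption, since the text above states no length limits.

```python
import re

# Assumed pattern: one or more lowercase letters, digits, underscores, or hyphens.
# The minimum length of one is an assumption; no length limits are stated above.
PARTITION_PATTERN = re.compile(r"[a-z0-9_-]+")


def is_valid_partition(name: str) -> bool:
    """Return True if name uses only the characters described above."""
    return PARTITION_PATTERN.fullmatch(name) is not None


print(is_valid_partition("tenant_42-eu"))  # lowercase, digits, _ and - only
print(is_valid_partition("Tenant 42"))     # contains an uppercase letter and a space
```

Validating locally like this only catches obviously malformed names early; the service remains the authority on what it accepts.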
Partitions may also be used to improve retrieval results. Ragie uses a hybrid approach when performing retrievals that includes searching a keyword index, and these keyword indexes are also separated by partition. A relevant detail here is that keyword importance (or weight) is partially determined by how frequently the keyword appears across the set of documents (inverse document frequency): the more documents contain a keyword, the lower its weight. For example, legal jargon in a set of law documents would be relatively less important than the same jargon appearing in customer service documents, since the legal terms would presumably appear far more frequently in the law documents. Partitions are a useful construct for separating documents by domain to improve the quality of the keyword portion of Ragie’s hybrid search.
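To make the inverse document frequency effect concrete, here is a small sketch that computes a term's weight separately for two document sets, as if each were its own partition. The formula is a common smoothed textbook variant, not necessarily what Ragie uses internally, and the documents are invented toy data.

```python
import math


def idf(term: str, documents: list[set[str]]) -> float:
    """Smoothed inverse document frequency: log((N + 1) / (df + 1)) + 1."""
    n = len(documents)
    df = sum(1 for doc in documents if term in doc)
    return math.log((n + 1) / (df + 1)) + 1.0


# Toy "partitions": each document is reduced to its set of keywords.
legal_docs = [
    {"plaintiff", "tort", "statute"},
    {"tort", "liability", "statute"},
    {"tort", "appeal", "plaintiff"},
]
support_docs = [
    {"refund", "shipping", "order"},
    {"order", "tort", "refund"},
    {"password", "login", "order"},
]

idf_legal = idf("tort", legal_docs)      # the term appears in every legal document
idf_support = idf("tort", support_docs)  # the term appears in one support document

print(f"legal partition:   {idf_legal:.3f}")
print(f"support partition: {idf_support:.3f}")
```

In this toy data, "tort" carries more weight among the support documents than among the legal ones, which is the behavior the paragraph above describes. Mixing both sets into a single index would produce one blended weight for the term instead.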
Working with Partitions
Creating partitions
A partition is created automatically when a document is created in it.
Creating documents
Documents can be created in a partition by providing an optional partition string when creating them.
Retrievals
The optional partition parameter can be provided when doing retrievals. When present, the retrieval is scoped to that partition. Fine-grained scoping via metadata filters may be combined with partition scoping.
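The sketch below shows one way an application might assemble a retrieval request that combines partition scoping with a metadata filter. It only builds the request body and sends nothing. The function name, the "query" and "filter" field names, and the shape of the filter are assumptions made for illustration; only the partition parameter comes from the text above, so check the API reference for the actual request schema.

```python
from typing import Any, Optional


def build_retrieval_request(
    query: str,
    partition: Optional[str] = None,
    metadata_filter: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a hypothetical retrieval request body (field names are assumed)."""
    body: dict[str, Any] = {"query": query}
    # Only include optional fields when the caller supplied them, so an
    # omitted partition stays omitted rather than being sent as null.
    if partition is not None:
        body["partition"] = partition
    if metadata_filter is not None:
        body["filter"] = metadata_filter
    return body


scoped = build_retrieval_request(
    "vacation policy",
    partition="hr",
    metadata_filter={"year": 2024},  # hypothetical metadata field
)
unscoped = build_retrieval_request("vacation policy")

print(scoped)
print(unscoped)
```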
Partition-specific Entity Extraction Instructions
Entity Extraction Instructions can be created with a partition parameter. When present, the instruction will only be run on documents in the given partition. If the partition parameter is omitted, the instruction will be run on all documents.
Common use cases
Multi-tenant SaaS
Multi-tenant applications will generally want to isolate their users’ data to prevent data leakage between users. Apps may want to use a USER_ID as their partition key or potentially an ORG_ID if the app is more multiuser in nature. Isolating user and organization data is as simple as providing the desired partition key when managing documents and doing retrievals. More fine-grained retrieval scoping via metadata filters is still possible and can be combined with partitions.
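One way to derive a partition key from an application's own identifiers is sketched below. The helper partition_for_tenant and the prefix convention are hypothetical, and the character check assumes the naming rule for partitions (lowercase alphanumeric plus _ and -).

```python
import re

# Assumed character rule for partition names: lowercase alphanumeric, _ and -.
_ALLOWED = re.compile(r"[a-z0-9_-]+")


def partition_for_tenant(tenant_id: str, prefix: str = "org") -> str:
    """Derive a partition key such as 'org_acme-7' from an application ID."""
    # Lowercasing means IDs that differ only by case map to the same key;
    # use this only if your IDs are case-insensitive.
    key = f"{prefix}_{tenant_id}".lower()
    if _ALLOWED.fullmatch(key) is None:
        raise ValueError(f"cannot derive a valid partition key from {tenant_id!r}")
    return key


print(partition_for_tenant("ACME-7"))
print(partition_for_tenant("42", prefix="user"))
```

Deriving the key in a single helper and using it for both document creation and retrieval keeps the two code paths from drifting apart, which is what would otherwise cause data to leak between tenants or go missing from results.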
Isolated Knowledge Bases
If an organization has multiple distinct domains of knowledge that it wants to use as sources for its generative AI applications, creating partitions for those domains will improve the quality of the keyword component of Ragie’s hybrid search approach. Depending on the use case, creating a distinct partition for each function, such as customer support, legal, or HR, may be the ideal approach. This is not a one-size-fits-all recommendation, and it may affect how you structure your retrievals. If you have any questions, we’re always happy to discuss the particulars of your use case and help you design the best approach.