Retrievals - Recency Bias
Background
In our Retrieval-Augmented Generation (RAG) system, we combine semantic (cosine similarity) and keyword (BM25) retrieval methods to generate a set of candidate chunks. Traditionally, the two score sets are combined via Reciprocal Rank Fusion (RRF) to produce a final ranked list. This approach is time-agnostic: it does not consider when a document was created or last updated.
Recency bias introduces time-aware weighting. More recent documents are given a higher priority, ensuring that fresh content surfaces more readily.
Time Buckets
To implement recency bias, we divide documents into time-based buckets:
- 1 hour
- 24 hours
- 1 week
- 4 weeks
- Everything else (older than 4 weeks)
Bucket Weights
We assign decreasing weights to each bucket, reflecting the diminishing emphasis on older documents:
- 1 hour: 1.0
- 24 hours: 0.9
- 1 week: 0.8
- 4 weeks: 0.7
- Everything else: 0.6
These weights reflect that documents within the last hour are considered most valuable for recency, while older documents, although potentially still relevant, receive slightly lower emphasis.
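For reference, the buckets and weights above can be expressed as a simple lookup table. The sketch below is illustrative only; the names and structure are ours, not a production schema:

```python
from datetime import timedelta

# Recency buckets ordered from newest to oldest. Each entry maps a bucket
# name to (maximum document age, recency weight). None marks the catch-all
# bucket for everything older than 4 weeks.
RECENCY_BUCKETS = [
    ("1 hour", timedelta(hours=1), 1.0),
    ("24 hours", timedelta(hours=24), 0.9),
    ("1 week", timedelta(weeks=1), 0.8),
    ("4 weeks", timedelta(weeks=4), 0.7),
    ("everything else", None, 0.6),
]

def bucket_for_age(age: timedelta) -> tuple[str, float]:
    """Return the (bucket name, weight) for a document of the given age."""
    for name, max_age, weight in RECENCY_BUCKETS:
        if max_age is None or age <= max_age:
            return name, weight
    raise AssertionError("unreachable: the last bucket is a catch-all")

# A document updated 3 hours ago is past the 1-hour cutoff but inside 24 hours.
print(bucket_for_age(timedelta(hours=3)))  # ('24 hours', 0.9)
```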
Step 1: Initial Retrieval and Weighted Hybrid Scores
For each bucket, we run both semantic (cosine similarity) and keyword (BM25) retrieval.
- Semantic Query (Cosine Similarity): Retrieves top_k chunks based on conceptual similarity.
- Keyword Query (BM25): Retrieves top_k chunks based on keyword matching.
We first normalize these scores separately within each bucket. After normalization, we merge them into a single hybrid score, giving the semantic and keyword signals an equal weight of 0.5:
HybridScore = 0.5*CosineSimilarity_Normalized + 0.5*BM25_Normalized
This ensures that both relevance signals contribute equally, and all scores are initially on a comparable scale within their bucket. You now have one set of hybrid scores per time bucket.
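A minimal sketch of Step 1, assuming each retriever returns a score per chunk ID within one bucket (the function and variable names are illustrative, not part of our SDK). How chunks retrieved by only one method are handled is also an assumption; here the missing signal contributes 0:

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale to [0, 1]; all-equal inputs map to 1.0 (see Step 2)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {cid: 1.0 for cid in scores}
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}

def hybrid_scores(cosine: dict[str, float], bm25: dict[str, float]) -> dict[str, float]:
    """Combine normalized semantic and keyword scores with equal 0.5 weights."""
    cos_n, bm25_n = min_max(cosine), min_max(bm25)
    return {
        # Assumption: a chunk missing from one result set contributes 0
        # for that signal.
        cid: 0.5 * cos_n.get(cid, 0.0) + 0.5 * bm25_n.get(cid, 0.0)
        for cid in set(cos_n) | set(bm25_n)
    }
```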
Step 2: Normalization Within Each Bucket
After computing the weighted hybrid scores, each bucket's scores are again normalized using min-max scaling to ensure that scores can be compared across time buckets.
Normalization (min-max scaling) transforms scores into the [0, 1] range:
Normalized_Score = (Hybrid_Score - min(Hybrid_Scores)) / (max(Hybrid_Scores) - min(Hybrid_Scores))
If all scores in a bucket are equal, we assign them all a normalized score of 1.0.
This normalization ensures that a top chunk in any bucket gets a score near 1.0 and a bottom chunk a score near 0.0, making each bucket's scores directly comparable across buckets.
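To make the scaling concrete, here is the same min-max rule applied to one bucket's hybrid scores, including the all-equal edge case (chunk IDs and values are made up):

```python
def normalize_bucket(hybrid: dict[str, float]) -> dict[str, float]:
    """Min-max scale one bucket's hybrid scores into [0, 1].

    If every score in the bucket is identical, the range is zero, so rather
    than divide by zero we assign all chunks a normalized score of 1.0.
    """
    lo, hi = min(hybrid.values()), max(hybrid.values())
    if hi == lo:
        return {cid: 1.0 for cid in hybrid}
    return {cid: (s - lo) / (hi - lo) for cid, s in hybrid.items()}

print(normalize_bucket({"a": 0.25, "b": 0.5, "c": 0.75}))
# {'a': 0.0, 'b': 0.5, 'c': 1.0}
print(normalize_bucket({"a": 0.4, "b": 0.4}))
# {'a': 1.0, 'b': 1.0}  <- all-equal edge case
```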
Why Normalize?
Buckets can vary widely in their raw score ranges. Normalizing ensures that a highly relevant chunk in an older bucket can still be selected over a less relevant chunk in a more recent bucket. This keeps recency and relevance in balance.
Step 3: Apply Recency Weights
We then multiply the normalized scores by the bucket’s recency weight:
Weighted_Normalized_Score = Normalized_Score * w_b
where w_b is the weight for the bucket. For example, if a document is from the “24 hours” bucket with a normalized score of 0.9 and w_b = 0.9:
Weighted_Normalized_Score = 0.9 × 0.9 = 0.81
This step ensures that recent buckets have a greater influence on the final ranking while still allowing highly relevant older documents to maintain competitive scores if they have high normalized relevance.
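Applying the weight is a single multiplication per bucket; a short sketch continuing the illustrative names used above:

```python
def apply_recency_weight(normalized: dict[str, float], weight: float) -> dict[str, float]:
    """Scale a bucket's normalized scores by its recency weight w_b."""
    return {cid: score * weight for cid, score in normalized.items()}

# The worked example above: a "24 hours" chunk with a normalized score of 0.9.
scores = apply_recency_weight({"doc-1": 0.9}, weight=0.9)
print(round(scores["doc-1"], 2))  # 0.81
```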
Step 4: Aggregation Across Buckets (Using Maximum Score)
Once we have computed the weighted normalized scores within each time bucket, we then select the top_k chunks across all buckets based on their scores. If a chunk appears in multiple buckets, we use the maximum weighted normalized score for that chunk across all buckets:
Final_Score(chunk) = max (Weighted_Normalized_Score_b(chunk)) over all buckets b
By taking the maximum, we ensure that a chunk is represented by its strongest performance in any given time window, without artificially boosting it for appearing multiple times.
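A sketch of the max aggregation, assuming weighted maps each bucket name to its weighted normalized scores (an illustrative structure, not a prescribed one):

```python
def aggregate_max(
    weighted: dict[str, dict[str, float]], top_k: int
) -> list[tuple[str, float]]:
    """Keep each chunk's best weighted score across buckets, then the top_k."""
    final: dict[str, float] = {}
    for bucket_scores in weighted.values():
        for cid, score in bucket_scores.items():
            # A chunk found in several buckets keeps only its maximum score;
            # appearing multiple times never stacks additively.
            final[cid] = max(final.get(cid, float("-inf")), score)
    return sorted(final.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```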
Example
Consider a chunk C1 that appears only in the 1-week bucket. Suppose it has a Hybrid_Score that normalizes to 0.95 in that bucket. The 1-week bucket weight is 0.8, so:
Weighted_Normalized_Score = 0.95 * 0.8 = 0.76
Another chunk C2 appears in the 1-hour and 24-hour buckets. Suppose in the 1-hour bucket it normalizes to 0.7 (weighted = 0.7 * 1.0 = 0.7) and in the 24-hour bucket it normalizes to 0.9 (weighted = 0.9 * 0.9 = 0.81). Taking the maximum for C2:
max(0.7, 0.81) = 0.81
Here, C2 edges out C1 due to recency, even though C1 has the higher normalized relevance score.
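The same example, run through the aggregation rule as plain arithmetic:

```python
weighted = {
    "1 hour":   {"C2": 0.7 * 1.0},   # C2 normalizes to 0.7 in this bucket
    "24 hours": {"C2": 0.9 * 0.9},   # C2 normalizes to 0.9 in this bucket
    "1 week":   {"C1": 0.95 * 0.8},  # C1 appears only in this bucket
}
final: dict[str, float] = {}
for scores in weighted.values():
    for cid, s in scores.items():
        final[cid] = max(final.get(cid, 0.0), s)
print({cid: round(s, 2) for cid, s in final.items()})
# {'C2': 0.81, 'C1': 0.76} -> C2 ranks above C1
```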
Recency bias can be applied across all of Ragie's retrieval methods, whether you are using hybrid retrieval, hierarchical retrieval, or including a reranking step. The feature is flexible and can be enabled or disabled per request through our retrievals API, allowing you to tailor retrieval results to the needs of your application.
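As an illustration, a retrieval request might toggle the feature with a boolean flag. The sketch below uses Python's requests library; the endpoint URL, the recency_bias flag name, and the scored_chunks response field are assumptions here, so check the API reference for the exact request and response shapes:

```python
import requests

# Hypothetical request shape; see the Ragie API reference for exact field names.
response = requests.post(
    "https://api.ragie.ai/retrievals",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "query": "latest pricing changes",
        "top_k": 8,
        "recency_bias": True,  # assumed flag name: enables time-aware weighting
    },
)
response.raise_for_status()
for chunk in response.json().get("scored_chunks", []):
    print(chunk.get("score"), str(chunk.get("text", ""))[:80])
```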