Read-through caching with Momento Functions

The simplest, highest-leverage pattern for Momento Functions is read-through caching: a Function takes a request, looks for the answer in Momento Cache, and only falls through to a slow upstream on a miss. Because the Function and the cache are co-located on the Momento platform, the cache lookup is quick — the only latency you pay is whatever the upstream costs on a miss.

What this replaces

Without Functions, read-through caching usually requires a server-side component: a Lambda, a sidecar, or an in-process cache layer in your API service. That component holds the upstream credentials, manages the connection pool, and implements the get-or-fetch logic. It needs deployment, monitoring, and capacity planning of its own.

Functions collapses that into a single .wasm artifact deployed to a Momento cache. There is no API Gateway, no Lambda, no separate cluster — your Function is the cache layer.

The shape

A read-through cache Function performs a few familiar steps:

  1. Compute a stable cache key from the request.
  2. Try cache::get(&key). If it hits, return the cached body.
  3. On miss, load from the upstream.
  4. Write the response into the cache with a TTL and return it (deciding up front whether to cache negative results; see the considerations below).

use std::time::Duration;

use momento_functions_bytes::{Data, encoding::Json};
use momento_functions_cache as cache;
use momento_functions_guest_web::{WebResponse, WebResult, invoke};
use momento_functions_http::{Request as HttpRequest, invoke as http_invoke};
use serde::Deserialize;

#[derive(Deserialize)]
struct Request {
    product_id: String,
}

invoke!(product_lookup);
fn product_lookup(Json(req): Json<Request>) -> WebResult<WebResponse> {
    // 1. A stable cache key derived from the request.
    let key = format!("product:{}", req.product_id);
    let ttl = Duration::from_secs(300);

    // 2. Try the cache first; a hit returns immediately.
    if let Some(bytes) = cache::get::<Data>(key.as_str())? {
        return Ok(WebResponse::new()
            .with_status(200)
            .header("x-cache", "hit")
            .with_body(bytes)?);
    }

    // 3. Miss: load from the upstream.
    let upstream = http_invoke(
        HttpRequest::new(
            &format!("https://catalog.example.com/products/{}", req.product_id),
            "GET",
        )
        .with_header(
            "authorization",
            format!(
                "Bearer {}",
                std::env::var("CATALOG_KEY").unwrap_or_default()
            ),
        ),
    )?;

    // 4. Fill the cache so the next reader hits, then relay the response.
    let bytes = upstream.body.into_bytes();
    cache::set(key.as_str(), bytes.clone(), ttl)?;

    Ok(WebResponse::new()
        .with_status(upstream.status as u16)
        .header("x-cache", "miss")
        .with_body(bytes)?)
}

Some considerations:

  • Stable key shape. The cache key is derived from the parts of the request that determine the response. Don't include caller identity unless the response is per-user.
  • TTL matches your tolerance for staleness. Five minutes is a fine default; tighten or loosen it to match your upstream's update rate.
  • Relay the upstream status. A 404 should still be a 404; consider caching it briefly to absorb burst traffic on missing IDs (a sketch follows this list).

Single-flight on miss

When a popular item misses the cache, many concurrent invocations may all stampede the upstream. You can collapse them with set_if:

use momento_functions_cache::{ConditionalSetResult, SetIfCondition};

// Try to claim the work
loop {
    let claimed = cache::set_if(
        format!("{}:lock", key).as_str(),
        b"1".to_vec(),
        Duration::from_secs(5),
        SetIfCondition::Absent,
    )?;
    if matches!(claimed, ConditionalSetResult::Stored(_)) {
        break;
    }
    // Someone else is fetching; small backoff then re-read the cache
    std::thread::sleep(Duration::from_millis(50));
    if let Some(bytes) = cache::get::<Data>(key.as_str())? {
        return Ok(WebResponse::new().with_status(200).with_body(bytes)?);
    }
}
// We claimed the lock, so now fetch

This is a cheap stampede dampener, not a strict mutex. Pair it with a short lock TTL so a crashed loader doesn't leave the key locked while waiters pile up against your concurrency limit.

tip

The cache-scalar example covers get/set/set_if/delete end to end. The turbopuffer-search example shows the read-through pattern applied to OpenAI embedding queries.