Using Core API Effectively: Parallelism, Patient Records & Caching – Help Centre

Audience: Partners integrating with the Black Pear Core API
Applies to: FHIR patient-record retrieval backed by clinical systems (e.g. SystmOne / TPP)

Summary

Our client-facing APIs are built on a modern, horizontally-scalable framework and are designed to be called in parallel. The single most useful thing to understand is where parallelism helps:

Across different endpoints — parallel is encouraged. Each endpoint (e.g. /A12345, /X00001) is a separate clinical-system instance with its own connection. Requests to different endpoints run genuinely concurrently and don't contend with each other. Spread your traffic across endpoints freely.
Within a single endpoint — requests are processed one at a time. A given endpoint's upstream clinical system (SystmOne) handles one request at a time over a single connection, regardless of which patient the request is for. So within one endpoint, pace your requests and fetch one record at a time rather than firing a burst.
Anything we can serve from our cache — parallel is free. Once a patient's record has been fetched once, its sub-resources (allergies, observations, conditions, medications) are served from our cache and never touch the clinical system, so you can read them concurrently.

The failure mode to avoid is repeating or stacking requests for the same not-yet-cached record on a single endpoint — they queue behind one another upstream and hit timeouts. This guide shows you how to get the best of all three.

How the platform handles a request

When you request a patient record, the request travels through three layers:

1. Core API (our layer). Modern, scalable, handles concurrency well. Applies your usage plan (rate limiting).
2. Clinical System Adapter. Routes your request to the correct endpoint and translates it into the upstream clinical system's protocol.
3. The upstream clinical system (e.g. SystmOne), per endpoint. Assembles and returns the record.

Each endpoint has its own upstream connection. That's why requests to /A12345 and /X00001 proceed independently and in parallel — they're talking to two separate clinical-system instances.

The constraint lives at the third layer, the upstream clinical system, within a single endpoint. That endpoint's SystmOne connection processes one request at a time, whichever patient it concerns. A large record, or a system already under load, can take up to about 30 seconds to return. If further requests for that same endpoint arrive while one is still being processed, they queue behind it. If the queue isn't cleared before the upstream timeout (30,000 ms) elapses, the queued requests are abandoned and time out.
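To make the queueing effect concrete, here is a rough model in Python. It is a sketch under stated assumptions, not a description of the real adapter: it assumes a strict first-in-first-out queue, that the whole burst arrives at the same instant, and that a request still waiting when the 30,000 ms timeout passes is dropped without reaching upstream. The function name and the per-request durations are illustrative.

```python
TIMEOUT_MS = 30_000  # upstream timeout quoted in this guide


def simulate_burst(durations_ms, timeout_ms=TIMEOUT_MS):
    """Return 'ok' or 'timeout' for each request in a burst arriving at t=0.

    Assumption: one connection, FIFO order, each request holds the
    connection for its full duration once it has started.
    """
    clock = 0
    outcomes = []
    for duration in durations_ms:
        if clock >= timeout_ms:
            # Still queued when its deadline passed: abandoned, never served.
            outcomes.append("timeout")
            continue
        clock += duration  # the connection is busy until this request ends
        outcomes.append("ok" if clock <= timeout_ms else "timeout")
    return outcomes


# Four records of 12 s each, fired together at one endpoint:
print(simulate_burst([12_000, 12_000, 12_000, 12_000]))
# -> ['ok', 'ok', 'timeout', 'timeout']
```

Under these assumptions only the first two requests beat the timeout. Sent one after another, all four would have completed.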

We cache the assembled record — use it

Once an endpoint returns a patient record successfully, we cache it at our layer (not the clinical system's) for approximately 10 minutes. During that window, subsequent reads of that patient's resources — allergies, observations, conditions, medications, and so on — are served from our cache, which is fast and does not go back upstream. After the cache expires, the next request for that patient will trigger a fresh upstream fetch (and re-cache the result), so it pays to read everything you need for a patient within that window rather than spreading requests out over a long period.
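One way to act on the cache window is to keep a small client-side note of when each record was last fetched. The sketch below is illustrative only: `CacheWindow` is a hypothetical helper, not part of the Core API, and the one-minute safety margin is an assumption layered on top of the approximate 10-minute figure above.

```python
import time

ASSUMED_CACHE_TTL_S = 10 * 60  # "approximately 10 minutes", per this guide
SAFETY_MARGIN_S = 60           # assumption: stop trusting the cache a minute early


class CacheWindow:
    """Client-side guess at whether a patient's record is still cached."""

    def __init__(self, ttl_s=ASSUMED_CACHE_TTL_S - SAFETY_MARGIN_S,
                 clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._fetched_at = {}  # (endpoint, patient_id) -> time of last full fetch

    def mark_fetched(self, endpoint, patient_id):
        """Call this after a full-record fetch has returned successfully."""
        self._fetched_at[(endpoint, patient_id)] = self._clock()

    def is_probably_cached(self, endpoint, patient_id):
        fetched = self._fetched_at.get((endpoint, patient_id))
        return fetched is not None and self._clock() - fetched < self._ttl_s
```

If `is_probably_cached` returns False, treat the next read for that patient as a fresh upstream fetch and wait for it to finish before fanning out.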

The single most important implication:

Fetch a patient's record once and wait for it to complete. Then read everything else you need for that patient straight away: it is already cached.

What "good" and "bad" look like

✅ Do this — parallelise across endpoints; one record at a time within an endpoint

# Different endpoints → genuinely parallel (separate clinical systems)
GET /A12345/Patient/{patient_1}?_include=*     ┐ run together
GET /X00001/Patient/{patient_3}?_include=*     ┘

# Same endpoint → sequential: let each record finish before the next
GET /A12345/Patient/{patient_1}?_include=*     → wait for completion
GET /A12345/Patient/{patient_2}?_include=*     → then the next patient

Then, for any patient whose record is already cached, fan out concurrently:

GET /A12345/Patient/{patient_1}/AllergyIntolerance   ┐
GET /A12345/Patient/{patient_1}/Observation          │ served from cache,
GET /A12345/Patient/{patient_1}/Condition            │ safe in parallel
GET /A12345/Patient/{patient_1}/MedicationRequest    ┘

You pay the upstream cost once per patient, then everything else is fast.
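The pattern above could be sketched with `asyncio` roughly as follows. This is one possible shape, not a reference client: `get` is a hypothetical coroutine that you supply (wrapping whatever HTTP client you use), and the paths simply mirror the examples in this guide.

```python
import asyncio
from collections import defaultdict

SUB_RESOURCES = ("AllergyIntolerance", "Observation", "Condition",
                 "MedicationRequest")


async def fetch_endpoint(endpoint, patient_ids, get):
    """Work through one endpoint's patients strictly one record at a time."""
    results = {}
    for patient_id in patient_ids:
        # Uncached full-record fetch: wait for it before doing anything else.
        await get(f"/{endpoint}/Patient/{patient_id}?_include=*")
        # The record should now be cached, so the sub-resource reads fan out.
        paths = [f"/{endpoint}/Patient/{patient_id}/{name}"
                 for name in SUB_RESOURCES]
        bodies = await asyncio.gather(*(get(path) for path in paths))
        results[patient_id] = dict(zip(SUB_RESOURCES, bodies))
    return results


async def fetch_all(work, get):
    """work is an iterable of (endpoint, patient_id) pairs."""
    by_endpoint = defaultdict(list)
    for endpoint, patient_id in work:
        by_endpoint[endpoint].append(patient_id)
    # Different endpoints are independent, so they run concurrently.
    per_endpoint = await asyncio.gather(
        *(fetch_endpoint(endpoint, ids, get)
          for endpoint, ids in by_endpoint.items()))
    return dict(zip(by_endpoint, per_endpoint))
```

Calling `asyncio.run(fetch_all([("A12345", "p1"), ("A12345", "p2"), ("X00001", "p3")], get))` would fetch `p1` then `p2` on `/A12345` while `/X00001` proceeds alongside.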

❌ Don't do this — stack requests for the same record on one endpoint

GET /A12345/Patient/{patient_1}?_include=*
GET /A12345/Patient/{patient_1}?_include=*    ← queues behind the first, times out
GET /A12345/Patient/{patient_1}?_include=*    ← queues, times out
…repeat until "successful"…

Each retry lands behind the request already in flight on that endpoint. They cannot run in parallel within the endpoint, so they sit in the queue until the 30s timeout, get abandoned, and you retry again — keeping the queue permanently full. This pattern is slower, not faster, and consumes your usage plan for no benefit.
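If several parts of your own application may ask for the same record at once, one client-side guard is to coalesce them into a single in-flight request (often called "single-flight"). The class below is a hypothetical sketch of that technique, not something the Core API provides. `make_request` stands for whatever coroutine function performs your HTTP call.

```python
import asyncio


class SingleFlight:
    """Share one in-flight request among all callers asking for the same key."""

    def __init__(self):
        self._in_flight = {}  # key -> asyncio.Task

    async def do(self, key, make_request):
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key: start the real request.
            task = asyncio.create_task(make_request())
            self._in_flight[key] = task
            # Forget the task once it finishes so a later call starts afresh.
            task.add_done_callback(
                lambda _done: self._in_flight.pop(key, None))
        # shield() stops one cancelled caller from cancelling the shared task.
        return await asyncio.shield(task)
```

With `key = (endpoint, patient_id)`, three simultaneous callers produce one upstream request and all three receive its result, or its exception if it fails.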

When parallelism is fine — and when it isn't

Scenario	Parallel?	Why
Across different endpoints (`/A12345` vs `/X00001`)	✅ Yes	Separate clinical-system instances, separate connections — no contention.
Sub-resources of an already-cached patient	✅ Yes	Served from our cache; never reaches the clinical system.
Different patients on the same endpoint, not yet cached	⚠️ No benefit	One connection per endpoint, processed one request at a time. Fetch one record at a time.
The same not-yet-cached record on one endpoint	❌ No	Repeats queue behind the in-flight request and time out.

Practical rule: parallelise across endpoints and across cached reads. Within a single endpoint, fetch uncached records one at a time and let each complete. This isn't a limit on our API — our layer would happily take the concurrency — it's each endpoint's upstream clinical system that serialises, so pacing requests to a single endpoint is what keeps you fast and timeout-free.
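When work arrives dynamically rather than as a known batch, the practical rule can be enforced with one lock per endpoint. The following is a minimal sketch under that assumption: `EndpointGate` is a hypothetical helper, and the caller is responsible for deciding whether a read is expected to be served from cache.

```python
import asyncio
from collections import defaultdict


class EndpointGate:
    """Serialise uncached fetches per endpoint; let cached reads straight through."""

    def __init__(self):
        # One lock per endpoint, created on first use.
        self._locks = defaultdict(asyncio.Lock)

    async def run(self, endpoint, cached, make_request):
        if cached:
            # Expected to be served from cache: no need to queue.
            return await make_request()
        async with self._locks[endpoint]:
            # Only one uncached request per endpoint is in flight at a time.
            return await make_request()
```

Requests for different endpoints take different locks, so they still run concurrently with each other.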

Recommended request pattern

Resolve the patient first. Do a single search (e.g. by NHS number) to get the patient ID. One search is enough — repeating the same search adds load without adding information.
Fetch the full record once and wait for it to return before issuing anything else for that patient on that endpoint. Allow up to ~30s; this is the upstream assembling the record, not a fault.
Then read the resources you need. These come from our cache and are fast — and safe to request concurrently. The record stays cached for ~10 minutes, so read everything you need for that patient within that window.
Don't retry into a timeout. If a full-record request is slow, a second identical request will not overtake it — it will queue behind it on the same endpoint. Wait for the first to resolve.
Back off on failure. If a request does time out, wait before retrying (exponential backoff), and don't fire multiple retries at once.
Spread work across endpoints, pace it within one. Requests to different endpoints run in parallel, so you can process multiple endpoints concurrently. Within a single endpoint, fetch uncached records one at a time.
Stay within your usage plan. Plans are expressed as requests/second. Bursting above your limit means requests are held in our throttle queue before they even reach the clinical system — adding latency on top of everything above.
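The back-off step in the list above might look something like this. It is a sketch with assumed values: the 2-second base, 60-second cap and four attempts are illustrative, not platform recommendations, and it assumes `make_request` raises the built-in `TimeoutError` on a timeout (adapt this to whatever your HTTP client raises).

```python
import asyncio
import random


def backoff_delays(count, base_s=2.0, cap_s=60.0):
    """Exponentially growing delays: base, 2*base, 4*base, ... capped at cap_s."""
    return [min(cap_s, base_s * 2 ** i) for i in range(count)]


async def fetch_with_backoff(make_request, attempts=4, base_s=2.0, cap_s=60.0,
                             sleep=asyncio.sleep):
    """Retry one request at a time, waiting longer after each timeout."""
    delays = backoff_delays(attempts - 1, base_s, cap_s) + [None]
    for delay in delays:
        try:
            return await make_request()
        except TimeoutError:
            if delay is None:
                raise  # out of attempts: surface the failure to the caller
            # Jitter avoids many clients retrying in lock-step.
            await sleep(random.uniform(delay / 2, delay))


print(backoff_delays(6))  # -> [2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```

Because each retry is awaited before the next is issued, this never has more than one attempt in flight for a given call.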

Why the timeouts, and why this is the right pattern

To be transparent about the architecture: our client-facing APIs are deliberately modern and built to scale. The latency and serialisation you encounter on full-record retrieval come from the upstream clinical-system interface, which is older and, within each endpoint, processes requests through a single connection one at a time. That's why concurrency helps across endpoints and across cached reads, but not when stacking uncached requests onto one endpoint. We are actively working with the upstream provider to bring that interface up to date, and we'll update partners as its capability improves.

In the meantime, the pattern above isn't a workaround — it's the efficient way to use the platform as designed. Fetching once, pacing your per-endpoint reads, and then reading from cache is faster for you, lighter on the clinical system, and easier on your usage plan. Everybody wins.

Quick reference

Situation	Safe to parallelise?	Notes
Different endpoints (`/A12345`, `/X00001`)	✅ Yes	Separate clinical systems — run concurrently, within your usage plan.
Sub-resources of a patient whose record is already cached	✅ Yes	Served from our cache, never touches the clinical system.
Different patients on the same endpoint, not yet cached	⚠️ No benefit	One connection per endpoint, one request at a time. Fetch one record at a time.
Same patient/record on one endpoint, not yet cached	❌ No	Fetch once, wait. Repeats queue and time out.
Repeating an identical search to "speed things up"	❌ No	Adds load, no benefit. One search is enough.
Retrying immediately into a timeout	❌ No	Back off and wait; the in-flight request will return.

Questions about your integration or usage plan? Contact your Black Pear technical contact.