What is Bulk FHIR? (Bulk FHIR Data Export)

Standard FHIR APIs are designed for individual patient queries — pull one patient’s medications, retrieve one patient’s lab results. But what happens when you need data for 50,000 patients at once? When a population health team needs to analyze an entire patient panel, when a payer needs claims data for all attributed members, or when an organization needs to extract everything for a system migration? That’s where Bulk FHIR comes in — the specification built for population-scale data extraction.

Definition of Bulk FHIR

Bulk FHIR, formally called the FHIR Bulk Data Access specification (and earlier known as Flat FHIR), is an HL7 FHIR implementation guide that defines how to export large volumes of health data from a FHIR server asynchronously. Instead of making thousands of individual API calls to retrieve data patient by patient, a client issues a single export operation and receives population-level datasets as NDJSON (Newline Delimited JSON) files. Authorization is handled by a companion specification, SMART Backend Services.

Bulk FHIR was developed by the SMART Health IT team at Boston Children’s Hospital (the same team behind SMART on FHIR) and is published as an HL7 FHIR Implementation Guide. It has been adopted by ONC as part of the Health IT Certification Program: under the 21st Century Cures Act Final Rule, the Standardized API criterion (45 CFR 170.315(g)(10)) requires certified health IT to support Bulk FHIR export for groups of patients. The separate electronic health information (EHI) export criterion (170.315(b)(10)) does not prescribe a specific format.

The specification addresses a fundamental gap in the original FHIR standard: FHIR’s RESTful API model works well for transactional, patient-level data access but is inefficient for bulk data operations. Retrieving data for an entire patient population one patient at a time would generate millions of HTTP requests, overwhelm the FHIR server, and take hours or days to complete. Bulk FHIR replaces that with an asynchronous workflow designed for large-scale extraction.

In simple terms: Bulk FHIR is the firehose version of the FHIR API — designed for extracting data on thousands or millions of patients at once, instead of one at a time.

How Bulk FHIR Works in Healthcare

Bulk FHIR operates through an asynchronous export workflow with three phases: request, processing, and retrieval.

Export request. The client application initiates a bulk export by sending a GET request to one of three export endpoints:

/Patient/$export — Exports all data for all patients accessible to the requesting system. This is the most common endpoint for population-level data extraction.

/Group/{id}/$export — Exports data for a specific group of patients — an attributed panel, a care management cohort, or a research population. The Group resource defines which patients are included.

/$export — Exports all data on the FHIR server, including non-patient resources (Practitioner, Organization, Location). Used for system-level data migration or comprehensive backup.

The request carries a Prefer: respond-async header and parameters that specify which resource types to export (_type, for example Patient, Condition, Observation, MedicationRequest), the output format (_outputFormat, NDJSON), and optionally a _since timestamp that limits the export to data changed since a specific date, which enables incremental extraction.
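As a rough illustration of the kick-off request described above, the following Python sketch builds the URL and headers for a patient-level or group-level export. The function name build_export_request and the example.org base URL are assumptions made for illustration; the header and parameter names come from the Bulk Data Access specification.

```python
from urllib.parse import urlencode


def build_export_request(base_url, group_id=None, resource_types=None, since=None):
    """Return (url, headers) for a Bulk Data kick-off request.

    Hypothetical helper: it only builds the request and sends nothing.
    """
    base = base_url.rstrip("/")
    if group_id is not None:
        path = f"{base}/Group/{group_id}/$export"   # one cohort of patients
    else:
        path = f"{base}/Patient/$export"            # all accessible patients

    params = {"_outputFormat": "application/fhir+ndjson"}
    if resource_types:
        params["_type"] = ",".join(resource_types)  # comma-separated resource types
    if since:
        params["_since"] = since                    # FHIR instant, e.g. 2024-01-01T00:00:00Z

    # urlencode percent-encodes "+", "/", "," and ":" so the values survive transport.
    url = f"{path}?{urlencode(params)}"
    headers = {
        "Accept": "application/fhir+json",
        "Prefer": "respond-async",                  # asks the server for the async workflow
    }
    return url, headers


url, headers = build_export_request(
    "https://fhir.example.org/r4",                  # assumed base URL
    group_id="diabetes-panel",                      # assumed Group id
    resource_types=["Patient", "Condition", "Observation"],
    since="2024-01-01T00:00:00Z",
)
```

An access token obtained through SMART Backend Services would normally be added as an Authorization: Bearer header before the request is sent.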

Asynchronous processing. The server accepts the request and returns an HTTP 202 (Accepted) response with a Content-Location header containing a URL for status polling. The server processes the export in the background: querying its data store, serializing FHIR resources into NDJSON files, and staging the output for download. The client polls the status URL, receiving HTTP 202 while the job is still running, until the server returns HTTP 200 (OK) with a JSON manifest listing the download URLs for each resource type.
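The polling phase can be sketched as a small loop. This is a minimal sketch under stated assumptions, not a production client: http_get is a hypothetical callable supplied by the caller that returns (status_code, headers, body_text), and the Retry-After header is honored only when it holds a whole number of seconds.

```python
import json
import time


def poll_export(status_url, http_get, sleep=time.sleep, max_polls=1000, default_wait=5.0):
    """Poll the status URL until the export completes, then return the manifest.

    http_get(url) is a hypothetical callable returning
    (status_code, headers_dict, body_text).
    """
    for _ in range(max_polls):
        status, headers, body = http_get(status_url)
        if status == 200:                       # finished: body is the JSON manifest
            return json.loads(body)
        if status == 202:                       # still running: wait, then retry
            retry_after = headers.get("Retry-After", "")
            wait = float(retry_after) if retry_after.isdigit() else default_wait
            sleep(wait)
            continue
        raise RuntimeError(f"export failed with HTTP {status}: {body[:200]}")
    raise TimeoutError("export did not complete within max_polls attempts")


def output_urls_by_type(manifest):
    """Group the manifest's download URLs by resource type."""
    grouped = {}
    for entry in manifest.get("output", []):
        grouped.setdefault(entry["type"], []).append(entry["url"])
    return grouped
```

Injecting http_get and sleep keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.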

Data retrieval. The client downloads the NDJSON files from the URLs provided in the manifest. Each file contains one FHIR resource per line — one Patient resource per line in the Patient file, one Observation per line in the Observation file, and so on. NDJSON format is optimized for streaming processing — each line is a complete, valid JSON object that can be parsed independently.
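Because each line is an independent JSON document, a downloaded file can be processed one resource at a time without loading the whole file into memory. The sketch below illustrates this; the function names are assumptions chosen for the example.

```python
import json
from collections import Counter


def iter_ndjson(lines):
    """Yield one parsed FHIR resource per non-empty line.

    `lines` can be an open text file or any iterable of strings.
    """
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue                            # tolerate blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON") from exc


def count_by_type(lines):
    """Count resources per resourceType while streaming through the input."""
    return Counter(r.get("resourceType", "unknown") for r in iter_ndjson(lines))
```

Typical usage would be to open a downloaded file in text mode and pass the file object directly, for example count_by_type(open("Observation.ndjson", encoding="utf-8")), where the file name is hypothetical.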

Authentication. Bulk FHIR uses the SMART Backend Services authorization specification — a system-to-system authentication flow using asymmetric keys (JSON Web Tokens signed with a private key) rather than user-facing OAuth flows. This makes sense for bulk operations that run without a human user in the loop — scheduled nightly exports, automated analytics pipelines, and system integration jobs.

Key Bulk FHIR Standards and Specifications

FHIR Bulk Data Access IG

The primary specification defining the export operation, status polling workflow, NDJSON output format, and authentication requirements. The current version is STU2 (Standard for Trial Use 2), with ongoing development toward normative status.

SMART Backend Services

The authentication specification for Bulk FHIR. The client registers a public key with the FHIR server. When requesting an export, the client creates a JWT signed with its private key, exchanges it for an access token, and uses that token to authenticate export requests. This eliminates the need for user-facing login during automated bulk operations.
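To make the token exchange more concrete, here is a hedged sketch of the two payloads involved: the claims the client places in its signed JWT, and the form fields it posts to the token endpoint. Signing is deliberately left out; in practice a JWT library would sign the claims with the registered private key using an asymmetric algorithm such as RS384 or ES384. The function names, the example scope, and the example.org URL are assumptions.

```python
import time
import uuid

CLIENT_ASSERTION_TYPE = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"


def build_assertion_claims(client_id, token_url, lifetime_seconds=300, now=None):
    """Claims for the client assertion JWT (signing happens elsewhere)."""
    issued = int(time.time() if now is None else now)
    return {
        "iss": client_id,                       # issuer: the registered client id
        "sub": client_id,                       # subject: same client id
        "aud": token_url,                       # audience: the token endpoint URL
        "exp": issued + lifetime_seconds,       # short-lived, at most five minutes ahead
        "jti": str(uuid.uuid4()),               # unique id so the token cannot be replayed
    }


def build_token_request(signed_jwt, scope="system/*.read"):
    """Form fields for the POST to the token endpoint.

    The default scope is an assumption; servers differ in the scopes they accept.
    """
    return {
        "grant_type": "client_credentials",
        "scope": scope,
        "client_assertion_type": CLIENT_ASSERTION_TYPE,
        "client_assertion": signed_jwt,
    }


claims = build_assertion_claims("my-client-id", "https://auth.example.org/token")
```

The token endpoint responds with an access token that the client then presents as a bearer token on the export, status, and file download requests.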

NDJSON Output Format

Bulk FHIR exports data in NDJSON — one JSON-encoded FHIR resource per line. This format is efficient for large datasets because it supports streaming processing (no need to load the entire file into memory), is easily split across parallel processing workers, and integrates cleanly with data pipeline tools like Apache Spark, BigQuery, and Databricks.

USCDI Alignment

Bulk FHIR exports include all USCDI data classes available on the FHIR server — Patient demographics, Conditions, Medications, Observations (labs and vitals), Procedures, Immunizations, Allergies, Clinical Notes, and newer USCDI additions like SDoH data. The _type parameter allows clients to request specific resource types, but the default is all available resources.

ONC Certification Requirement

ONC certification criteria require certified health IT to support Bulk FHIR export for groups of patients as part of the Standardized API criterion (45 CFR 170.315(g)(10)) adopted under the Cures Act Final Rule. This means every EHR certified to that criterion must be able to produce Bulk FHIR exports, though the completeness and performance of implementations vary across vendors.

Implementation Considerations

Bulk FHIR implementation spans FHIR server configuration, authentication infrastructure, data pipeline architecture, and performance optimization.

Server-side performance planning. Bulk FHIR exports can be resource-intensive — a full population export for a large health system may produce hundreds of gigabytes of NDJSON data and require significant server CPU, memory, and I/O. Plan for dedicated export processing capacity separate from your transactional FHIR API to avoid degrading real-time clinical data access.

Incremental export is essential. Full population exports are expensive. The _since parameter enables incremental exports — extracting only resources that have changed since the last export. Build your data pipeline to perform full exports infrequently (weekly or monthly) and incremental exports daily or more frequently. This dramatically reduces server load and processing time.
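One way to drive incremental exports is to persist the transactionTime reported in each completed manifest and pass it as _since on the next run. The sketch below assumes a small JSON state file; the file layout and function names are hypothetical.

```python
import json
from pathlib import Path


def load_since(state_path):
    """Return the saved _since value, or None if no export has completed yet."""
    path = Path(state_path)
    if not path.exists():
        return None                             # first run: perform a full export
    return json.loads(path.read_text())["last_transaction_time"]


def save_since(state_path, manifest):
    """Record the server-reported transactionTime for use on the next run."""
    state = {"last_transaction_time": manifest["transactionTime"]}
    Path(state_path).write_text(json.dumps(state))
```

Using the server's transactionTime rather than the client's clock avoids gaps caused by clock skew or long-running exports. The state should be saved only after every file in the manifest has been downloaded and loaded successfully, so that a failed run is retried from the same starting point.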

NDJSON pipeline architecture. The NDJSON output from Bulk FHIR needs to be ingested, parsed, transformed, and loaded into your analytics or reporting infrastructure. Common pipeline patterns include loading into a cloud data lake (S3, Azure Blob, GCS), transforming with Apache Spark or dbt, and serving through a SQL-queryable layer (BigQuery, Snowflake, Redshift). Design your pipeline for idempotent processing: re-ingesting the same export files should leave the target tables unchanged rather than duplicating rows.
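Idempotent loading usually comes down to keying each row on the resource's identity so that a repeated load overwrites instead of appending. The following sketch uses SQLite from the Python standard library as a stand-in for a warehouse table; the table layout and function names are assumptions for illustration.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open a store with one row per (resource_type, id)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS resources (
               resource_type TEXT NOT NULL,
               id            TEXT NOT NULL,
               body          TEXT NOT NULL,
               PRIMARY KEY (resource_type, id)
           )"""
    )
    return conn


def upsert_resources(conn, resources):
    """Insert or overwrite resources; loading the same batch twice is a no-op."""
    rows = [(r["resourceType"], r["id"], json.dumps(r, sort_keys=True)) for r in resources]
    with conn:                                  # one transaction per batch
        conn.executemany(
            "INSERT OR REPLACE INTO resources (resource_type, id, body) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)
```

The same pattern maps onto MERGE statements in cloud warehouses. A fuller design would also need to handle resources deleted on the source server, which this sketch ignores.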

Data quality and completeness. Not all FHIR servers produce the same quality of Bulk FHIR output. Some servers may omit optional but clinically important fields. Some may export coded data using proprietary vocabularies instead of standard terminologies (SNOMED CT, LOINC, ICD-10). Validate export output against USCDI expectations and US Core profiles before depending on it for analytics.
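A simple terminology check can catch the proprietary-vocabulary problem early. The sketch below flags resources whose code element carries no coding from a small allow-list of standard code system URIs. It is a narrow illustration, not US Core profile validation: the allow-list is an assumption, and only the top-level code element is inspected, so resources that carry their coding elsewhere (for example MedicationRequest) would need additional handling.

```python
# Canonical FHIR URIs for a few standard terminologies (assumed allow-list).
STANDARD_SYSTEMS = {
    "http://snomed.info/sct",                       # SNOMED CT
    "http://loinc.org",                             # LOINC
    "http://hl7.org/fhir/sid/icd-10-cm",            # ICD-10-CM
    "http://www.nlm.nih.gov/research/umls/rxnorm",  # RxNorm
}


def has_standard_coding(codeable_concept):
    """True if at least one coding uses a system from the allow-list."""
    codings = (codeable_concept or {}).get("coding", [])
    return any(c.get("system") in STANDARD_SYSTEMS for c in codings)


def find_nonstandard(resources):
    """Return (resourceType, id) for resources whose `code` lacks a standard coding."""
    flagged = []
    for resource in resources:
        if "code" in resource and not has_standard_coding(resource["code"]):
            flagged.append((resource.get("resourceType"), resource.get("id")))
    return flagged
```

Running a check like this per resource type on every export gives a quick measure of how much of the data can be used in analytics without a terminology mapping step.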

Security and PHI protection. Bulk FHIR exports contain large volumes of protected health information. All export files must be encrypted at rest and in transit. Access to export endpoints must be restricted to authorized systems with valid SMART Backend Services credentials. Audit logging must capture who requested exports, what data was exported, and when files were downloaded.

De-identification for secondary use. If Bulk FHIR data is used for research, analytics, or AI model training, de-identification may be required. Build de-identification into your data processing workflow by applying the HIPAA Safe Harbor or Expert Determination method before data enters non-production environments.

Population health and analytics use cases. Bulk FHIR is the data extraction layer for population health platforms — risk stratification, care gap identification, quality measure calculation, and value-based care performance monitoring all depend on population-level clinical data. The combination of Bulk FHIR for extraction and FHIR analytics (SQL-on-FHIR) for querying is becoming the standard architecture for healthcare analytics.

How Taction Helps with Bulk FHIR

At Taction, our team implements Bulk FHIR export capabilities, builds data pipelines that consume Bulk FHIR output, and helps organizations leverage population-level FHIR data for analytics and reporting.

What we do:

Bulk FHIR server implementation — We implement and optimize Bulk FHIR export capabilities on FHIR servers — configuring export operations, SMART Backend Services authentication, NDJSON output generation, and ONC certification-compliant EHI export.

Data pipeline development — We build end-to-end data pipelines that ingest Bulk FHIR NDJSON output, transform FHIR resources into analytics-ready schemas, and load them into cloud data platforms (BigQuery, Snowflake, Databricks, Redshift).

Population health analytics — We build population health platforms powered by Bulk FHIR data — risk stratification models, care gap dashboards, quality measure calculators, and value-based care performance trackers.

Incremental sync architecture — We design incremental extraction patterns using _since parameters and change tracking — minimizing server load while keeping analytics data current.

FHIR data quality validation — We build validation pipelines that check Bulk FHIR output against USCDI requirements, US Core profiles, and terminology standards — identifying data quality issues before they corrupt analytics.

Related Terms and Resources

Explore related glossary terms:

What is FHIR? — The API standard that Bulk FHIR extends for population-level access

What is SMART on FHIR? — The framework whose Backend Services specification Bulk FHIR uses

What is Population Health? — Analytics use cases powered by Bulk FHIR data extraction

What is EHR Migration? — System transitions where Bulk FHIR enables standards-based data extraction