Standardizing Medical Devices Surveillance Data

Why?

Medical device event data are messy.

Common challenges include:

Performing ongoing surveillance on messy data
Quickly answering simple questions such as:
- Are events trending up?
- How did my trends look 1 year ago? 2 years ago?
Incompatibility of various sources of device-events
Difficulty integrating exposures, a.k.a. denominator data
Understanding all the possible combinations to analyze
Application of disproportionality analysis (DPA)
Documentation of analyses in a auditable, reproducible way

How?

The mds package provides a standardized framework to address these challenges:

Standardize events involving medical devices
Standardize exposures of the device (also known as opportunities for an event to occur, or event denominator)
Enumerate possible analyses in a flexible way
Generate times series of analyses for trending over time
Set up analyses for easy application of disproportionality analysis (DPA)
Save all files in lightweight R files for auditability, documentation, and reproducibility

Purpose of This Vignette

Introduce the basics on using mds
4 steps to go from raw device-event and exposure data to plotting event counts and rates over time
How to use the core mds functions: deviceevent(), exposure(), define_analyses(), time_series()
Explain advanced usage options

Note on Statistical Algorithms

mds data and analysis standards allow for seamless application of various statistical trending algorithms via the mdsstat package (under development).

Data: MAUDE and Simulated Sales

Our example dataset maude was queried from the FDA MAUDE API and contains 535 reported events on bone cement in 2017. Furthermore, a simulated exposure dataset sales was generated to provide denominator data for our bone cement events.

library(mds)
dim(maude)
#> [1] 535  15
dim(sales)
#> [1] 360   4

head(maude, 3)

report_number	event_type	date_received	product_problem_flag	adverse_event_flag	report_source_code	lot_number	manufacturer_d_name	manufacturer_d_country	brand_name	device_name	medical_specialty_description	device_class	region
0002249697-2017-00023	Malfunction	20170103	Y	N	Manufacturer report	MHX076	STRYKER ORTHOPAEDICS-MAHWAH	US	SIMPLEX P - US TOBRA FD 10-PK	Bone Cement	Orthopedic	2	Central
0002249697-2017-00028	Malfunction	20170103	Y	N	Manufacturer report	MHX080	STRYKER ORTHOPAEDICS-MAHWAH	US	SIMPLEX P - US TOBRA FD 10-PK	Bone Cement	Orthopedic	2	West
0002249697-2017-00025	Malfunction	20170103	Y	N	Manufacturer report	MHX076	STRYKER ORTHOPAEDICS-MAHWAH	US	SIMPLEX P - US TOBRA FD 10-PK	Bone Cement	Orthopedic	2	Central

head(sales, 3)

device_name	region	sales_month	sales_volume
Arthroscope	Central	2017-01-01	83
Arthroscope	Central	2017-02-01	119
Arthroscope	Central	2017-03-01	112

Raw Data to Trending in 4 Steps

The general workflow to go from data to trending over time is as follows:

Use deviceevent() to standardize device-event data.
Use exposure() to standardize exposure data (optional).
Use define_analyses() to enumerate possible analysis combinations.
Use time_series() to generate counts (and/or rates) by time based on your defined analyses.

Live Example

# Step 1 - Device Events
de <- deviceevent(
  maude,
  time="date_received",
  device_hierarchy=c("device_name", "device_class"),
  event_hierarchy=c("event_type", "medical_specialty_description"),
  key="report_number",
  covariates="region",
  descriptors="_all_")

# Step 2 - Exposures (Optional step)
ex <- exposure(
  sales,
  time="sales_month",
  device_hierarchy="device_name",
  match_levels="region",
  count="sales_volume")

# Step 3 - Define Analyses
da <- define_analyses(
  de,
  device_level="device_name",
  exposure=ex,
  covariates="region")

# Step 4 - Time Series
ts <- time_series(
  da,
  deviceevents=de,
  exposure=ex)

What to Do Next

You may:

Save your data (de, ex), analyses (da), and time series (ts) for documentation
Summarize your analyses using summary() and define_analyses_dataframe()
plot() your time series (plotting options)
Run trending statistics (and/or use the mdsstat package)

Summarize Defined Analyses

summary(da)
#> $`Analyses Timestamp`
#> [1] "2025-02-04 04:24:17 UTC"
#> 
#> $`Analyses Counts`
#>         Total Analyses Analyses with Exposure          Device Levels 
#>                     27                     27                      6 
#>           Event Levels             Covariates 
#>                      1                      2 
#> 
#> $`Date Ranges`
#>           Data      Start        End
#> 1 Device-Event 2017-01-01 2017-12-01
#> 2     Exposure 2017-01-01 2017-12-01
#> 3         Both 2017-01-01 2017-12-01

Show All Analyses as a Data Frame

dadf <- define_analyses_dataframe(da)

head(dadf, 3)

id	device_level_source	device_level	device_1up_source	device_1up	event_level_source	event_level	covariate	covariate_level	invivo	date_range_de_start	date_range_de_end	exp_device_level	exp_covariate_level	date_range_exposure_start	date_range_exposure_end	date_range_de_exp_start	date_range_de_exp_end
1	device_name	Bone Cement	device_class	2	event_type	All	region	Central	FALSE	2017-01-01	2017-12-01	Bone Cement	Central	2017-01-01	2017-12-01	2017-01-01	2017-12-01
2	device_name	Bone Cement	device_class	2	event_type	All	region	West	FALSE	2017-01-01	2017-12-01	Bone Cement	West	2017-01-01	2017-12-01	2017-01-01	2017-12-01
3	device_name	Bone Cement	device_class	2	event_type	All	region	East	FALSE	2017-01-01	2017-12-01	Bone Cement	East	2017-01-01	2017-12-01	2017-01-01	2017-12-01

Plot Time Series of Counts and Rates

plot(ts[[1]])
plot(ts[[4]], "rate", type='l')

`deviceevent()` to Standardize Device-Event Data

Basic Usage

de <- deviceevent(maude, "date_received", c("device_name", "device_class"), c("event_type", "medical_specialty_description"))

head(de, 3)

key	time	device_1	device_2	event_1	event_2
1	2017-01-03	Bone Cement	2	Malfunction	Orthopedic
2	2017-01-03	Bone Cement	2	Malfunction	Orthopedic
3	2017-01-03	Bone Cement	2	Malfunction	Orthopedic

Advanced Usage

de <- deviceevent(
  maude,
  time="date_received",
  device_hierarchy=c("device_name", "device_class"),
  event_hierarchy=c("event_type", "medical_specialty_description"),
  key="report_number",
  covariates="region",
  descriptors="_all_")

head(de, 3)

key	time	device_1	device_2	event_1	event_2	region	product_problem_flag	adverse_event_flag	report_source_code	lot_number	manufacturer_d_name	manufacturer_d_country	brand_name
0002249697-2017-00023	2017-01-03	Bone Cement	2	Malfunction	Orthopedic	Central	Y	N	Manufacturer report	MHX076	STRYKER ORTHOPAEDICS-MAHWAH	US	SIMPLEX P - US TOBRA FD 10-PK
0002249697-2017-00028	2017-01-03	Bone Cement	2	Malfunction	Orthopedic	West	Y	N	Manufacturer report	MHX080	STRYKER ORTHOPAEDICS-MAHWAH	US	SIMPLEX P - US TOBRA FD 10-PK
0002249697-2017-00025	2017-01-03	Bone Cement	2	Malfunction	Orthopedic	Central	Y	N	Manufacturer report	MHX076	STRYKER ORTHOPAEDICS-MAHWAH	US	SIMPLEX P - US TOBRA FD 10-PK

Required Arguments

data_frame: is the input device-event data frame. All remaining arguments refer to variables within this data frame.
time: is the name of the variable containing the time of the event. The input format of the time will be flexibly converted into Date format.
device_hierarchy: organizes all your device variables into a hierarchy. The hierarchical concept reflects how devices are often nested into progressively more general groups. Set the first variable as the lowest device level that you would like to trend at. mds remembers this hierarchy and allows trending at multiple levels as you specify.
event_hierarchy: organizes all your event variables into a hierarchy. Like devices, event variables should be categorical in nature. Free text descriptions should not be listed here, but rather in the descriptors argument. The hierarchical concept reflects how events are often nested into progressively more general groups. Set the first variable as the lowest event level that you would like to trend at. mds remembers this hierarchy and allows trending at multiple levels as you specify. If your data does not have an event variable, you will need to create a dummy variable.

Optional Arguments

key: is a unique identifier for each unique event in data_frame. If your data pipeline carries over a key variable, it is recommended to specify it here. The key allows downstream aggregated analysis to be able to “look up” individual constituent events.
covariates: are a special group of variables that may be analyzed within device. For instance, declaring covariates="Region" will allow analysis of regions within device. These variables should be categorical in nature.
descriptors: are additional variables that should be retained for the purpose of describing individual events in downstream analysis.
implant_days: contains the age in days of an implantable device at the time of the event.

`exposure()` to Standardize Exposure Data

Exposure data is meant to support device-event data. As such, the general expectation is that variable values match between exposure and device-event data. For example, 10 exposures for ev3 Solitaire in France will be matched exactly to ev3 Solitaire events in France, and not to events for EV3 SOLITAIRE in FRANCE.

Basic Usage

ex <- exposure(sales, "sales_month", "device_name")

head(ex, 3)

key	time	count	device_1
1	2017-01-01	1	Arthroscope
2	2017-02-01	1	Arthroscope
3	2017-03-01	1	Arthroscope

Advanced Usage

ex <- exposure(
  sales,
  time="sales_month",
  device_hierarchy="device_name",
  match_levels="region",
  count="sales_volume")

head(ex, 3)

key	time	count	device_1	region
1	2017-01-01	83	Arthroscope	Central
2	2017-02-01	119	Arthroscope	Central
3	2017-03-01	112	Arthroscope	Central

Required Arguments

Note: Although not required, count will commonly be used as well.

data_frame: is the input exposure data frame. All remaining arguments refer to variables within this data frame.
time: is the name of the variable containing the time of the exposure. The input format of the time will be flexibly converted into Date format. If exposure will be used, it is critical to have sufficient time granularity. For example, if analysis will be done monthly, exposure data must be no less granular than monthly. mds does not make assumptions about filling in holes in time!
device_hierarchy: contains all exposure device variables to match to your device-event data. As such, the values within these variables must match exactly to the values within each respective variable in the device-event data device_hierarchy parameter.
event_hierarchy: contains all exposure event variables to match to your device-event data. As such, the values within these variables must match exactly to the values within each respective variable in the device-event data event_hierarchy parameter. Exposures at an event level is not common.

Optional Arguments

count: is the most commonly specified optional parameter. It contains the number of exposures. If not specified, the number of rows will be used as a proxy for count.
key: is a unique identifier for each unique exposure in data_frame. If your data pipeline carries over a key variable, it is recommended to specify it here. The key allows downstream aggregated analysis to be able to “look up” individual constituent exposure records.
match_levels: are variables aside from time, device, and event that specify an exposure. A common “match level” is country, if your exposure data is specific by country.

`define_analyses()` to Enumerate Analysis Combinations

After standardizing device-event data using deviceevent() and, optionally, exposure data using exposure(), the next step is to discover what types of analyses are possible. This is separated from actually doing the analysis (counting, calculations, statistics, etc.) because:

You may not need to analyze everything
You may want to customize specific analyses, such as:
- Custom date ranges
- Covariate analysis for certain device-events
- Analyze by month for one device, by quarter for another device
- Use exposures for certain devices but not others
A saved list of defined analyses quickly and easily answers:
- Which devices and events did I analyze? When?
- Did I also look at covariates?
- Over what time period did I analyze?
- Did I analyze monthly, quarterly, yearly, etc?
- Did I use exposures?

Basic Usage

da <- define_analyses(de, "device_name")

Note that define_analyses() returns a list of individual analyses. Each individual analysis contains a set of instructions. You can view an analysis by submitting da[[1]], da[[2]], etc., but a less cumbersome overview is possible using summary() and define_analyses_dataframe().

summary(da)
#> $`Analyses Timestamp`
#> [1] "2025-02-04 04:24:19 UTC"
#> 
#> $`Analyses Counts`
#>         Total Analyses Analyses with Exposure          Device Levels 
#>                      7                      0                      6 
#>           Event Levels             Covariates 
#>                      1                      1 
#> 
#> $`Date Ranges`
#>           Data      Start        End
#> 1 Device-Event 2017-01-01 2017-12-01
#> 2     Exposure       <NA>       <NA>
#> 3         Both 2017-01-01 2017-12-01

head(define_analyses_dataframe(da), 3)

id	device_level_source	device_level	device_1up_source	device_1up	event_level_source	event_level	covariate	covariate_level	invivo	date_range_de_start	date_range_de_end	date_range_de_exp_start	date_range_de_exp_end
1	device_name	Bone Cement	device_class	2	event_type	All	Data	All	FALSE	2017-01-01	2017-12-01	2017-01-01	2017-12-01
2	device_name	Bone Cement, Antibiotic	device_class	2	event_type	All	Data	All	FALSE	2017-01-01	2017-12-01	2017-01-01	2017-12-01
3	device_name	Cement, Bone, Vertebroplasty	device_class	2	event_type	All	Data	All	FALSE	2017-01-01	2017-12-01	2017-01-01	2017-12-01

Advanced Usage

da <- define_analyses(
  de,
  device_level="device_name",
  exposure=ex,
  covariates="region")

summary(da)
#> $`Analyses Timestamp`
#> [1] "2025-02-04 04:24:19 UTC"
#> 
#> $`Analyses Counts`
#>         Total Analyses Analyses with Exposure          Device Levels 
#>                     27                     27                      6 
#>           Event Levels             Covariates 
#>                      1                      2 
#> 
#> $`Date Ranges`
#>           Data      Start        End
#> 1 Device-Event 2017-01-01 2017-12-01
#> 2     Exposure 2017-01-01 2017-12-01
#> 3         Both 2017-01-01 2017-12-01

head(define_analyses_dataframe(da), 3)

id	device_level_source	device_level	device_1up_source	device_1up	event_level_source	event_level	covariate	covariate_level	invivo	date_range_de_start	date_range_de_end	exp_device_level	exp_covariate_level	date_range_exposure_start	date_range_exposure_end	date_range_de_exp_start	date_range_de_exp_end
1	device_name	Bone Cement	device_class	2	event_type	All	region	Central	FALSE	2017-01-01	2017-12-01	Bone Cement	Central	2017-01-01	2017-12-01	2017-01-01	2017-12-01
2	device_name	Bone Cement	device_class	2	event_type	All	region	West	FALSE	2017-01-01	2017-12-01	Bone Cement	West	2017-01-01	2017-12-01	2017-01-01	2017-12-01
3	device_name	Bone Cement	device_class	2	event_type	All	region	East	FALSE	2017-01-01	2017-12-01	Bone Cement	East	2017-01-01	2017-12-01	2017-01-01	2017-12-01

Required Arguments

deviceevents: is a standardized device-event data frame. (class() should contain "mds_de")
device_level: is the source device variable to analyze by. Access these variable names by submitting attributes(de)$device_hierarchy.

Optional Arguments

event_level: is the source event variable to analyze by. Access these variable names by submitting attributes(de)$event_hierarchy.
exposure: is a standardized exposure data frame. (class() should contain "mde_e")
date_level and date_level_n: are arguments used together to specify the time interval to analyze by. Default of "months" and 1 analyzes by month. Other examples include "months" and 12 for yearly, or "days" and 7 for weekly.
covariates: are variables to analyze within device. For example, c("region") analyzes by each level of region within device.
times_to_calc: specifies how many time periods (counting back in time) to analyze for. Time period is defined using date_level and date_level_n.

What About Analyses Across All Events, All Devices, etc?

It is always assumed that analyses at aggregated levels are desired. (such as analysis of all events for a given device, or analysis of all events across all devices)

Aggregated level analysis is easily recognized by the "All" and "Data" values in device_level, event_level, covariate, and covariate_level.

	id	device_level_source	device_level	device_1up_source	device_1up	event_level_source	event_level	covariate	covariate_level	invivo	date_range_de_start	date_range_de_end	exp_device_level	exp_covariate_level	date_range_exposure_start	date_range_exposure_end	date_range_de_exp_start	date_range_de_exp_end
11	11	device_name	Cement, Bone, Vertebroplasty	device_class	2	event_type	All	region	East	FALSE	2017-01-01	2017-08-01	Cement, Bone, Vertebroplasty	East	2017-01-01	2017-12-01	2017-01-01	2017-08-01
12	12	device_name	Cement, Bone, Vertebroplasty	device_class	2	event_type	All	region	Central	FALSE	2017-02-01	2017-12-01	Cement, Bone, Vertebroplasty	Central	2017-01-01	2017-12-01	2017-02-01	2017-12-01
NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA

How to Customize Analyses

There are several options:

Select certain analyses from the list (e.g. da[[c(1:5, 24:27)]]))
Rerun define_analyses() with different parameter settings
Edit values within an individual analysis (e.g. da[[1]]$date_range_exposure['start'] <- as.Date("2016-10-01"))

`time_series()` to Generate Counts, Rates, and More

Once an analysis has been defined using define_analyses(), the analyses instructions can be executed using time_series(), returning by defined time periods:

Counts at the device, event, and (optionally) covariate level of interest.
Event IDs (the key parameter from deviceevent()) for lookup of individual event records.
Exposure counts
Exposure IDs (the key parameter from exposure()) for lookup of individual exposure records.
Disproportionality analysis (DPA) related counts:
- Counts of the device & non-event
- Counts of the non-device & event
- Counts of the non-device & non-event

Basic Usage

ts <- time_series(da, de)

Note that time_series() returns, in a list, one time series data frame for every analysis. You can select a time series by submitting ts[[1]], ts[[2]], etc.

head(ts[[1]], 3)

time	nA	ids
2017-01-01	13	0002249697-2017-00023
2017-02-01	7	0002249697-2017-00488
2017-03-01	5	0002249697-2017-00755

Advanced Usage

ts <- time_series(
  da,
  deviceevents=de,
  exposure=ex)

head(ts[[1]], 3)

time	nA	ids	exposure	ids_exposure
2017-01-01	13	0002249697-2017-00023	8597	37
2017-02-01	7	0002249697-2017-00488	5115	38
2017-03-01	5	0002249697-2017-00755	10191	39

Required Arguments

analysis: is a single defined analysis (class() should contain "mds_da") or a list of defined analysis.
deviceevents: is a standardized device-event data frame (class() contains "mds_de"). It is typically the same data frame used to generate analysis, but can be another "mds_de" data frame, such as a cut of the data at a different time. Note if, say, an older dataset is being used, the analysis date ranges must correspond.

Optional Arguments

exposure: is a standardized exposure data frame (class() contains "mds_e"). It is typically the same data frame used to generate analysis. Like deviceevents, another data frame may be used, but the analysis instructions must correspond.
use_hierarchy: is a logical value for whether device and event hierarchies should be used for the calculation of disproportionality analysis (DPA) counts. Submit ?time_series.mds_da for more details.

How to Modify Counts & Exposures

It is not uncommon to adjust event and exposure counts, such as with applications of rolling or moving averages. These adjustments should be applied after generating time series data frames from time_series().

`plot()`ing a Time Series

Plotting an individual time series generated by time_series() is simple. Simply call plot() on the time series object:

plot(ts[[1]])

There are a few custom parameters, including:

mode: with common values of "nA" (representing the device-event of interest), "exposure", and "rate" (simply "nA"/"exposure"). Less common are "nB", "nC", and "nD" representing the cell counts of the disproportionality analysis (DPA) contingency table.
xlab, ylab, main: representing default plot() behavior. By default, axes and title labels are inferred directly from the time series.

All other parameters are from plot.default().

Standardizing Medical Devices Surveillance Data

Data: MAUDE and Simulated Sales

Raw Data to Trending in 4 Steps

Live Example

What to Do Next

Summarize Defined Analyses

Show All Analyses as a Data Frame

Plot Time Series of Counts and Rates

deviceevent() to Standardize Device-Event Data

Required Arguments

Optional Arguments

exposure() to Standardize Exposure Data

Required Arguments

Optional Arguments

define_analyses() to Enumerate Analysis Combinations

Required Arguments

Optional Arguments

What About Analyses Across All Events, All Devices, etc?

How to Customize Analyses

time_series() to Generate Counts, Rates, and More

Required Arguments

Optional Arguments

How to Modify Counts & Exposures

plot()ing a Time Series

`deviceevent()` to Standardize Device-Event Data

`exposure()` to Standardize Exposure Data

`define_analyses()` to Enumerate Analysis Combinations

`time_series()` to Generate Counts, Rates, and More

`plot()`ing a Time Series