--- title: "Standardizing Medical Devices Surveillance Data" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Medical Devices Surveillance Data Standardization} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) load(file="../data/maude.rda") load(file="../data/sales.rda") library(mds) ``` **Why?** Medical device event data are messy. Common challenges include: - Performing ongoing surveillance on messy data - Quickly answering simple questions such as: - Are events trending up? - How did my trends look 1 year ago? 2 years ago? - Incompatibility of various sources of device-events - Difficulty integrating exposures, a.k.a. denominator data - Understanding all the possible combinations to analyze - Application of disproportionality analysis (DPA) - Documentation of analyses in a auditable, reproducible way **How?** The `mds` package provides a standardized framework to address these challenges: - Standardize events involving medical devices - Standardize exposures of the device (also known as opportunities for an event to occur, or event denominator) - Enumerate possible analyses in a flexible way - Generate times series of analyses for trending over time - Set up analyses for easy application of disproportionality analysis (DPA) - Save all files in lightweight `R` files for auditability, documentation, and reproducibility **Purpose of This Vignette** - Introduce the basics on using `mds` - [4 steps](#foursteps) to go from raw device-event and exposure data to [plotting](#quickplot) event counts and rates over time - How to use the core `mds` functions: [deviceevent()](#de), [exposure()](#ex), [define_analyses()](#da), [time_series()](#ts) - Explain advanced usage options **Note on Statistical Algorithms** `mds` data and analysis standards allow for seamless application of various statistical trending algorithms via the `mdsstat` package (under development). ## Data: MAUDE and Simulated Sales Our example dataset `maude` was queried from the [FDA MAUDE API](https://open.fda.gov) and contains 535 reported events on bone cement in 2017. Furthermore, a simulated exposure dataset `sales` was generated to provide denominator data for our bone cement events. ```{r} library(mds) dim(maude) dim(sales) ``` head(maude, 3) ```{r, echo=FALSE, results='asis'} knitr::kable(head(maude, 3)) ``` head(sales, 3) ```{r, echo=FALSE, results='asis'} knitr::kable(head(sales, 3)) ``` ## Raw Data to Trending in 4 Steps {#foursteps} The general workflow to go from data to trending over time is as follows: 1. Use `deviceevent()` to standardize device-event data. 1. Use `exposure()` to standardize exposure data (optional). 1. Use `define_analyses()` to enumerate possible analysis combinations. 1. Use `time_series()` to generate counts (and/or rates) by time based on your defined analyses. ### Live Example ```{r} # Step 1 - Device Events de <- deviceevent( maude, time="date_received", device_hierarchy=c("device_name", "device_class"), event_hierarchy=c("event_type", "medical_specialty_description"), key="report_number", covariates="region", descriptors="_all_") # Step 2 - Exposures (Optional step) ex <- exposure( sales, time="sales_month", device_hierarchy="device_name", match_levels="region", count="sales_volume") # Step 3 - Define Analyses da <- define_analyses( de, device_level="device_name", exposure=ex, covariates="region") # Step 4 - Time Series ts <- time_series( da, deviceevents=de, exposure=ex) ``` ## What to Do Next You may: - Save your data (`de`, `ex`), analyses (`da`), and time series (`ts`) for documentation - Summarize your analyses using `summary()` and `define_analyses_dataframe()` - `plot()` your time series ([plotting options](#po)) - Run trending statistics (and/or use the `mdsstat` package) ### Summarize Defined Analyses ```{r} summary(da) ``` ### Show All Analyses as a Data Frame ```{r} dadf <- define_analyses_dataframe(da) ``` head(dadf, 3) ```{r, echo=FALSE, results='asis'} knitr::kable(head(dadf, 3)) ``` ### Plot Time Series of Counts and Rates {#quickplot} ```{r, fig.show='hold'} plot(ts[[1]]) plot(ts[[4]], "rate", type='l') ``` ## `deviceevent()` to Standardize Device-Event Data {#de} **Basic Usage** ```{r} de <- deviceevent(maude, "date_received", c("device_name", "device_class"), c("event_type", "medical_specialty_description")) ``` head(de, 3) ```{r, echo=FALSE, results='asis'} knitr::kable(head(de, 3)) ``` **Advanced Usage** ```{r} de <- deviceevent( maude, time="date_received", device_hierarchy=c("device_name", "device_class"), event_hierarchy=c("event_type", "medical_specialty_description"), key="report_number", covariates="region", descriptors="_all_") ``` head(de, 3) ```{r, echo=FALSE, results='asis'} knitr::kable(head(de, 3)) ``` #### Required Arguments `data_frame` : is the input device-event data frame. All remaining arguments refer to variables within this data frame. `time` : is the name of the variable containing the time of the event. The input format of the time will be flexibly converted into `Date` format. `device_hierarchy` : organizes all your device variables into a hierarchy. The hierarchical concept reflects how devices are often nested into progressively more general groups. Set the first variable as the lowest device level that you would like to trend at. `mds` remembers this hierarchy and allows trending at multiple levels as you specify. `event_hierarchy` : organizes all your event variables into a hierarchy. Like devices, event variables should be categorical in nature. *Free text descriptions should not be listed here*, but rather in the `descriptors` argument. The hierarchical concept reflects how events are often nested into progressively more general groups. Set the first variable as the lowest event level that you would like to trend at. `mds` remembers this hierarchy and allows trending at multiple levels as you specify. *If your data does not have an event variable*, you will need to create a dummy variable. #### Optional Arguments `key` : is a unique identifier for each unique event in `data_frame`. If your data pipeline carries over a key variable, it is recommended to specify it here. The `key` allows downstream aggregated analysis to be able to "look up" individual constituent events. `covariates` : are a special group of variables that may be analyzed within device. For instance, declaring `covariates="Region"` will allow analysis of regions within device. These variables should be categorical in nature. `descriptors` : are additional variables that should be retained for the purpose of describing individual events in downstream analysis. `implant_days` : contains the age in days of an implantable device at the time of the event. ## `exposure()` to Standardize Exposure Data {#ex} Exposure data is meant to support device-event data. As such, the general expectation is that variable values match between exposure and device-event data. For example, 10 exposures for `ev3 Solitaire` in `France` will be matched **exactly** to `ev3 Solitaire` events in `France`, and not to events for `EV3 SOLITAIRE` in `FRANCE`. **Basic Usage** ```{r} ex <- exposure(sales, "sales_month", "device_name") ``` head(ex, 3) ```{r, echo=FALSE, results='asis'} knitr::kable(head(ex, 3)) ``` **Advanced Usage** ```{r} ex <- exposure( sales, time="sales_month", device_hierarchy="device_name", match_levels="region", count="sales_volume") ``` head(ex, 3) ```{r, echo=FALSE, results='asis'} knitr::kable(head(ex, 3)) ``` #### Required Arguments Note: Although not required, `count` will commonly be used as well. `data_frame` : is the input exposure data frame. All remaining arguments refer to variables within this data frame. `time` : is the name of the variable containing the time of the exposure. The input format of the time will be flexibly converted into `Date` format. If exposure will be used, it is **critical** to have sufficient time granularity. For example, if analysis will be done monthly, exposure data must be no less granular than monthly. `mds` does not make assumptions about filling in holes in time! `device_hierarchy` : contains all exposure device variables to match to your device-event data. As such, the values within these variables must match exactly to the values within each respective variable in the device-event data `device_hierarchy` parameter. `event_hierarchy` : contains all exposure event variables to match to your device-event data. As such, the values within these variables must match exactly to the values within each respective variable in the device-event data `event_hierarchy` parameter. Exposures at an event level is not common. #### Optional Arguments `count` : is the most commonly specified optional parameter. It contains the number of exposures. If not specified, the number of rows will be used as a proxy for count. `key` : is a unique identifier for each unique exposure in `data_frame`. If your data pipeline carries over a key variable, it is recommended to specify it here. The `key` allows downstream aggregated analysis to be able to "look up" individual constituent exposure records. `match_levels` : are variables aside from time, device, and event that specify an exposure. A common "match level" is country, if your exposure data is specific by country. ## `define_analyses()` to Enumerate Analysis Combinations {#da} After standardizing device-event data using `deviceevent()` and, optionally, exposure data using `exposure()`, the next step is to **discover what types of analyses are possible**. This is separated from actually doing the analysis (counting, calculations, statistics, etc.) because: - You may not need to analyze everything - You may want to customize specific analyses, such as: - Custom date ranges - Covariate analysis for certain device-events - Analyze by month for one device, by quarter for another device - Use exposures for certain devices but not others - A saved list of defined analyses quickly and easily answers: - Which devices and events did I analyze? When? - Did I also look at covariates? - Over what time period did I analyze? - Did I analyze monthly, quarterly, yearly, etc? - Did I use exposures? **Basic Usage** ```{r} da <- define_analyses(de, "device_name") ``` Note that `define_analyses()` returns a list of individual analyses. Each individual analysis contains a set of instructions. You can view an analysis by submitting `da[[1]]`, `da[[2]]`, etc., but a less cumbersome overview is possible using `summary()` and `define_analyses_dataframe()`. `r options(warn=-1)` ```{r} summary(da) ``` `r options(warn=0)` head(define_analyses_dataframe(da), 3) ```{r, echo=FALSE, results='asis'} knitr::kable(head(define_analyses_dataframe(da), 3)) ``` **Advanced Usage** ```{r} da <- define_analyses( de, device_level="device_name", exposure=ex, covariates="region") ``` ```{r} summary(da) ``` head(define_analyses_dataframe(da), 3) ```{r, echo=FALSE, results='asis'} knitr::kable(head(define_analyses_dataframe(da), 3)) ``` #### Required Arguments `deviceevents` : is a standardized device-event data frame. (`class()` should contain `"mds_de"`) `device_level` : is the source device variable to analyze by. Access these variable names by submitting `attributes(de)$device_hierarchy`. #### Optional Arguments `event_level` : is the source event variable to analyze by. Access these variable names by submitting `attributes(de)$event_hierarchy`. `exposure` : is a standardized exposure data frame. (`class()` should contain `"mde_e"`) `date_level` and `date_level_n` : are arguments used together to specify the time interval to analyze by. Default of `"months"` and `1` analyzes by month. Other examples include `"months"` and `12` for yearly, or `"days"` and `7` for weekly. `covariates` : are variables to analyze within device. For example, `c("region")` analyzes by each level of `region` within device. `times_to_calc` : specifies how many time periods (counting back in time) to analyze for. Time period is defined using `date_level` and `date_level_n`. ### What About Analyses Across All Events, All Devices, etc? **It is always assumed that analyses at aggregated levels are desired.** (such as analysis of all events for a given device, or analysis of all events across all devices) Aggregated level analysis is easily recognized by the `"All"` and `"Data"` values in `device_level`, `event_level`, `covariate`, and `covariate_level`. ```{r, echo=FALSE, results='asis'} knitr::kable(define_analyses_dataframe(da)[c(11:12, 32), ]) ``` ### How to Customize Analyses There are several options: - Select certain analyses from the list (e.g. `da[[c(1:5, 24:27)]])`) - Rerun `define_analyses()` with different parameter settings - Edit values within an individual analysis (e.g. `da[[1]]$date_range_exposure['start'] <- as.Date("2016-10-01")`) ## `time_series()` to Generate Counts, Rates, and More {#ts} Once an analysis has been defined using `define_analyses()`, the analyses instructions can be executed using `time_series()`, returning by defined time periods: - Counts at the device, event, and *(optionally)* covariate level of interest. - Event IDs (the `key` parameter from `deviceevent()`) for lookup of individual event records. - Exposure counts - Exposure IDs (the `key` parameter from `exposure()`) for lookup of individual exposure records. - Disproportionality analysis (DPA) related counts: - Counts of the device & non-event - Counts of the non-device & event - Counts of the non-device & non-event **Basic Usage** ```{r} ts <- time_series(da, de) ``` Note that `time_series()` returns, in a list, one time series data frame for every analysis. You can select a time series by submitting `ts[[1]]`, `ts[[2]]`, etc. head(ts[[1]], 3) ```{r, echo=FALSE, results='asis'} knitr::kable(head(ts[[1]], 3)) ``` **Advanced Usage** ```{r} ts <- time_series( da, deviceevents=de, exposure=ex) ``` head(ts[[1]], 3) ```{r, echo=FALSE, results='asis'} knitr::kable(head(ts[[1]], 3)) ``` #### Required Arguments `analysis` : is a single defined analysis (`class()` should contain `"mds_da"`) or a list of defined analysis. `deviceevents` : is a standardized device-event data frame (`class()` contains `"mds_de"`). It is typically the same data frame used to generate `analysis`, but can be another `"mds_de"` data frame, such as a cut of the data at a different time. Note if, say, an older dataset is being used, the `analysis` date ranges must correspond. #### Optional Arguments `exposure` : is a standardized exposure data frame (`class()` contains `"mds_e"`). It is typically the same data frame used to generate `analysis`. Like `deviceevents`, another data frame may be used, but the `analysis` instructions must correspond. `use_hierarchy` : is a logical value for whether device and event hierarchies should be used for the calculation of disproportionality analysis (DPA) counts. Submit `?time_series.mds_da` for more details. ### How to Modify Counts & Exposures It is not uncommon to adjust event and exposure counts, such as with applications of rolling or moving averages. These adjustments should be applied **after** generating time series data frames from `time_series()`. ## `plot()`ing a Time Series {#po} Plotting an individual time series generated by `time_series()` is simple. Simply call `plot()` on the time series object: ```{r} plot(ts[[1]]) ``` There are a few custom parameters, including: `mode` : with common values of `"nA"` (representing the device-event of interest), `"exposure"`, and `"rate"` (simply `"nA"/"exposure"`). Less common are `"nB"`, `"nC"`, and `"nD"` representing the cell counts of the disproportionality analysis (DPA) contingency table. `xlab`, `ylab`, `main` : representing default `plot()` behavior. By default, axes and title labels are inferred directly from the time series. All other parameters are from `plot.default()`.