cutoffvalue: An R Package for Cutoff Value Determination & Graphing
Lea R Medeiros
2025-08-21
Help.Rmd
Overview of the cutoffvalue package
Description
cutoffvalue is a simple R package that implements an updated version of the method first developed and used in Medeiros et al. (2018)1. It can be used to determine an objective cutoff value between a significantly bimodal distribution of log-transformed data and plot a representative graph of the results.
The functions in this package are written to utilize the example data set (an internal dataset object identified as “cutoffvalue:::exampledata”) and will use it by default if the path for a dataset is not provided. The examples below specify this, along with any other default parameters for each function. Below are the instructions on how to install the cutoffvalue package, setup the R Studio environment, and suggestions on how to import and convert your dataset into the proper format (i.e., a list of numeric values).
Vignette Info & Suggested Workflow
This vignette will go over the various functions included in cutoffvalue using the included example data. The overall goals of this package are to (1) determine a cutoff value between the upper and lower modes of the dataset and (2) produce a nice graph of the results that includes a histogram of the data, the two models fit to the upper and lower modes, and a line depicting the cutoff value. The functions are written to be run independently, so that only two functions need to be run to get the necessary information:
- modes() <- Determines modality of the dataset, which is an assumption for the remaining functions
- cutoffplot() <- Determines the cutoff value and produces a nice graph of the results
This vignette is written in accordance with the suggested workflow using these two functions. Subsequent sections discuss the functions that are included within them.
Installation of the package
You can install the latest version of cutoffvalue from GitHub with:
# devtools::install_github("lea-medeiros/cutoffvalue", dependencies = TRUE, build_vignettes = TRUE)
Setup R Studio
Load the cutoffvalue package
library(cutoffvalue)
library(readxl) # Only necessary if you will be importing an excel file (see below)
Import your raw dataset
Import the data file to be used in the analyses and graph. The package includes a dataset for use as an example - this object is accessible as “cutoffvalue:::exampledata” and will be used in the examples.
Import your own dataset any way you prefer. I find that the easiest way to import data is to use the “Import Dataset” function built into R Studio, but you can also use code. The imported data must then be converted into a list of numeric values. An example of code that should work (after you update certain parts to be specific to your dataset) is included below.
# yourrawdata <- read_excel("/path/to/your/excel/data", col_names = TRUE) # Imports the data as a dataframe with first row as column names
# yourrawdata <- as.numeric(yourrawdata$columnname) # converts the specified column to a numeric list of values
Each function in this package uses the provided dataset (whether it’s
the example dataset or one you provided) and cleans it up (via the
cleandata
function, see below) to remove any blank cells.
This function then provides a list of objects: the data
(mydata$data
), the maximum value
(mydata$upper
), and the minimum value
(mydata$lower
), all of which are then used in subsequent
calculations.
Using two functions to determine modality and generate the final plot
Use the modes
function to determine modality
Evaluating an objective and valid cutoff value depends upon whether the dataset is bimodal. Which means that the modality of the dataset should be determined before proceeding. The null hypothesis of this test is that the dataset is unimodal; an excess mass statistic associated with a p-value of less than 0.05 implies it is more than unimodal. It is strongly suggested that you run this test prior to any other function in cutoffvalue; however, how you proceed depends on your knowledge and understanding of the data.
The
modes
function uses the example dataset (an internal dataset object identified as “cutoffvalue:::exampledata”) by default. Thus, if the path for the dataset is not specified (e.g., running modes() in the console), this is the data that will be used.
When given a label (i.e., “modetest” in the example below), this function returns the Excess Mass statistic and associated p-value to the Environment.
Please keep in mind that the default parameters of cutoffvalue assume a bimodal distribution - any other modalities may cause inaccuracies in the results.
modetest <- modes(cutoffvalue:::exampledata)
#> Modality Test Results:
#> - Excess mass = 0.097446
#> - p-value = 0.002
#> **Reject null hypothesis** Distribution contains more than one mode; proceed with analyses.
#>
#> Test Credit: Ameijeiras-Alonso et al. (2019) excess mass test
The modes()
function returns the p-value and excess
mass statistic along with instructions on how to proceed based on the
p-value. If the p-value is less than 0.05, accept the alternative
hypothesis (data is at least bimodal) and proceed with analysis.
However, if the p-value is more than 0.05, the data is unimodal and the
following analyses are not entirely valid.
Use the cutoffplot
function to plot the final
graph
Plot a pretty graph that includes a histogram of the data, curve lines for each mode generated from the model results, the cutoff value depicted as a line, and labels customized for your dataset.
The
cutoffplot
function uses the example dataset (an internal dataset object identified as “cutoffvalue:::exampledata”) by default. Thus, if the path for the dataset is not specified (e.g., running cutoffplot() in the console), this is the data that will be used. Parameters (and their defaults) for the resulting graph are listed below.
When given a label (i.e., “plotty” in the example below), this function returns the log-transformed cutoff value to the Environment.
Specify graph labels
You will likely want to change the following parameters to match your
own dataset and preferences. If nothing is specified in the
cutoffplot
function, then these are the defaults that will
be used in the graph.
title <- "Plasma 11-KT levels in age-2 male spring chinook" # Graph title
xlab <- "Plasma [11-KT] (ng/mL)" # X-axis label
cutofflab <- "Minijack cutoff" # label for cutoff value on graph
cutoffunits <- "(ng/mL)" # units for cutoff value
LowerMode_col <- "red" # line color for the lower mode
LowerMode_lty <- 1 # line type for the lower mode
LowerMode_lwd <- 2 # line width for the lower mode
UpperMode_col <- "purple" # line color for the upper mode
UpperMode_lty <- 1 # line type for the upper mode
UpperMode_lwd <- 2 # line width for the upper mode
cutoffvalue_col <- "black" # line color for the cutoff value
cutoffvalue_lty <- 2 # line type for the cutoff value
cutoffvalue_lwd <- 2 # line width for the cutoff value
Plot the graph
plotty <- cutoffplot(cutoffvalue:::exampledata, title, xlab, cutofflab, cutoffunits, LowerMode_col, LowerMode_lty, LowerMode_lwd, UpperMode_col, UpperMode_lty, UpperMode_lwd, cutoffvalue_col, cutoffvalue_lty, cutoffvalue_lwd)
Other Functions Included in the cutoffvalue Package
The following functions are included in the cutoffvalue package and
can be run independently. They are discussed in order of operations for
having the cutoffplot
function work properly.
Use the cleandata
function to clean up your raw
dataset
This function is used to clean up your dataset by removing any blank
rows and transforms the data into a list of numbers. This list of values
(mydata$data
) is returned to the Environment along with the
minimum (mydata$lower
) and maximum values
(mydata$upper
), which are used in subsequent functions.
The mydata label must be applied to this function for it to be
used in subsequent functions.
If nothing is specified (i.e., only “cleandata()” is typed into the console), this function will use an internal dataset object identified as “cutoffvalue:::exampledata” as the default dataset.
mydata <- cleandata(cutoffvalue:::exampledata)
Use the datamodel
function to generate models for each
mode of the dataset
The datamodel
function fits two component mixture models
to the data and plots a rough histogram with the fitted lines. It also
defines the index.lower value to be used in the find.cutoff
function.
The
datamodel
function uses the example dataset (an internal dataset object identified as “cutoffvalue:::exampledata”) by default. Thus, if the path for the dataset is not specified (e.g., running importdata() in the console), this is the dataset that will be used.
When given a label (i.e., “model” in the example below), the
datamodel()
function returns a list of 2 objects to the Environment -model$mydata
andmodel$indexLower
, which are used in subsequent functions.
model <- datamodel(cutoffvalue:::exampledata)
This isn’t the final graph, but still should be inspected to ensure that things look right. In particular, make sure that the point where the two curves intersect is where you are expecting the cutoff to be.
Use the findcutoff
function to determine the cutoff
value
The findcutoff
function determines the cutoff value
between the two modes with an equal chance of being drawn from either
mode. The default probability is set to 50% (i.e., “proba=0.5” in the
example below), but the probability can be changed in the code.
The defaults for this function an internal dataset object identified as “cutoffvalue:::exampledata” as the raw dataset and 0.5 as the probability (i.e., running “findcutoff()” in the console will use these values)
Running the
findcutoff
function with a label (e.g., “cutoff” in the example below) will return the cutoff value to the Environment, but otherwise it does not report to the console. Use “returnValue(cutoff)” if you would like to see the value in the console.
cutoff <- findcutoff(cutoffvalue:::exampledata, proba=0.5)
#> number of iterations= 13
#> Cutoff Value: 0.1137364
The uniroot lower (mydata$lower
) and upper values
(mydata$upper
) are determined using the range of “mydata”
and will reflect the dataset being analyzed. If there are errors due to
the uniroot, consider editing the custom values to something that more
generally reflects the range of the data being analyzed.
Use the fitparams
function to produce a basic histogram
and associated parameters
The fitparams
function will produce a basic histogram
from the dataset, which is then used to generate certain parameters for
the curve fitting functions in subsequent functions. As such, you should
alter the number of breaks to produce a graph representative of what you
would like to see in the final plot - if not specified, the default is
15.
If nothing is specified (i.e., only “fitparams()” is typed into the console), this function will use an internal dataset object identified as “cutoffvalue:::exampledata” as the default dataset.
The fitparams
function returns a list of values to the
Environment that are used in subsequent functions.
fit <- fitparams(cutoffvalue:::exampledata, breaks = 15)
This histogram should be a general outline of what you would like the histogram in the final plot to look like. If it is not, change the number of breaks until it is. The scale of the axes in the final graph will better represent your dataset, so don’t worry about those.
Use the curves
function to generate points for the
curves
The curves
function determines x and y values to
calculate the points for the curves that represent the generated models
in the final plot.
If nothing is specified (i.e., only “curves()” is typed into the console), this function will use an internal dataset object identified as “cutoffvalue:::exampledata” as the default dataset.
The curves
function returns a list of 3 objects to the
Environment, which are used by the cutoffplot
function when
producing the final plot.
curves <- curves(cutoffvalue:::exampledata)