Analyzes protein binding frequency across splicing junction regions. Uses a 4-region structure to show where protein binding sites appear relative to exon/intron boundaries. Filters events into Retained, Excluded, and Control groups.
Usage
createSplicingMap(
bed_file,
SEMATS,
moving_average = 50,
WidthIntoExon = 50,
WidthIntoIntron = 300,
p_valueRetainedAndExclusion = 0.05,
p_valueControls = 0.95,
retained_IncLevelDifference = 0.1,
exclusion_IncLevelDifference = -0.1,
Min_Count = 50,
groups = c("Retained", "Excluded", "Control"),
control_multiplier = 2,
control_iterations = 20,
z_threshold = 1.96,
min_consecutive = 10,
one_sided = TRUE,
use_fdr = TRUE,
fdr_threshold = 0.05,
show_significance = TRUE,
return_data = FALSE,
return_diagnostics = FALSE,
verbose = TRUE,
progress_callback = NULL,
title = "",
retained_col = "blue",
excluded_col = "red",
control_col = "black",
line_width = 0.8,
line_alpha = 0.7,
ribbon_alpha = 0.3,
title_size = 20,
title_color = "black",
axis_text_size = 11,
boundary_col = "gray70",
exon_col = "navy",
legend_position = "bottom",
ylab = "Frequency"
)Arguments
- bed_file
Either a file path to a BED file or a data frame containing BED data with columns: chr, start, end, tag, score, strand
- SEMATS
A data frame containing SE.MATS output with columns: chr, strand, upstreamES, upstreamEE, exonStart_0base, exonEnd, downstreamES, downstreamEE, GeneID, PValue, FDR, IncLevelDifference, IJC_SAMPLE_1, SJC_SAMPLE_1, IJC_SAMPLE_2, SJC_SAMPLE_2, IncLevel1, IncLevel2
- moving_average
Integer specifying the window size for moving average smoothing. Set to NULL or 0 to disable smoothing. Default is 50.
- WidthIntoExon
Integer specifying how many bp to extend into exons. Default is 50.
- WidthIntoIntron
Integer specifying how many bp to extend into introns. Default is 300.
- p_valueRetainedAndExclusion
P-value threshold for retained/excluded events. Default is 0.05.
- p_valueControls
P-value threshold for control events. Default is 0.95.
- retained_IncLevelDifference
Inclusion level difference threshold for retained events. Default is 0.1.
- exclusion_IncLevelDifference
Inclusion level difference threshold for excluded events. Default is -0.1.
- Min_Count
Minimum read count threshold. Default is 50.
- groups
Character vector specifying which event groups to process. Options are "Retained", "Excluded", and/or "Control". Default is c("Retained", "Excluded", "Control") to process all groups.
- control_multiplier
Numeric multiplier for control sample size. The number of control events sampled per iteration is (n_retained + n_excluded) * control_multiplier. Default is 2.0.
- control_iterations
Integer number for sampling iterations for control sampling. The final control frequency is the mean across iterations, with standard deviation shown as a shaded band. Default is 20.
- z_threshold
Z-score threshold for significance testing. Default is 1.96 (corresponds to p < 0.025 one-sided, or p < 0.05 two-tailed). Only used when use_fdr = FALSE.
- min_consecutive
Minimum number of consecutive significant positions required to form a significant region. Default is 10. Helps reduce false positives from noise.
- one_sided
Logical. If TRUE (default), only test for enrichment (frequency > control). If FALSE, test for both enrichment and depletion.
- use_fdr
Logical. If TRUE (default), use FDR-corrected p-values (Benjamini-Hochberg) for significance testing. If FALSE, use z_threshold directly.
- fdr_threshold
FDR threshold for significance when use_fdr = TRUE. Default is 0.05.
- show_significance
Logical. If TRUE (default), displays colored bars above the plot indicating regions where Retained/Excluded differ significantly from Control.
- return_data
Logical. If TRUE, returns the frequency data frame instead of a plot. Default is FALSE.
- return_diagnostics
Logical. If TRUE, returns a list containing the frequency data, raw bootstrap iteration results (for normality testing), and significance results. Useful for validating bootstrap assumptions. Default is FALSE.
- verbose
Logical. If TRUE, prints progress messages. Default is TRUE.
- progress_callback
Optional function to report progress. Called with two arguments: current iteration number and total iterations. Default is NULL.
- title
Character string for the plot title. Default is "" (no title).
- retained_col
Color for the Retained group line. Default is "blue".
- excluded_col
Color for the Excluded group line. Default is "red".
- control_col
Color for the Control group line. Default is "black".
- line_width
Numeric line width for the frequency lines. Default is 0.8.
- line_alpha
Numeric alpha (opacity) for the frequency lines. Default is 0.7.
- ribbon_alpha
Numeric alpha for the SD ribbon around Control. Default is 0.3.
- title_size
Numeric font size for the plot title. Default is 20.
- title_color
Color for the plot title text. Default is "black".
- axis_text_size
Numeric font size for y-axis tick labels. Default is 11.
- boundary_col
Color for the dashed vertical boundary lines. Default is "gray70".
- exon_col
Fill color for the skipped (middle) exon in the schematic. Default is "navy".
- legend_position
Position of the legend. Default is "bottom".
- ylab
Label for the y-axis. Default is "Frequency".
Value
A ggplot object showing protein binding frequency across the 4 regions for Retained, Excluded, and Control groups. Significant regions (z-test vs Control) are shown as colored bars above the plot. Returns a data frame if return_data = TRUE.
Details
The function divides each splicing event into 4 regions of (WidthIntoExon + WidthIntoIntron) bp each:
Region 1 (UE-UI5): Upstream exon end to first intron
Region 2 (UI3-EX3): First intron end to middle (skipped) exon start
Region 3 (EX5-DI5): Middle exon end to second intron
Region 4 (DI3-DE): Second intron end to downstream exon start
Events are filtered into three groups:
Retained: Significant events (PValue < threshold) with negative IncLevelDifference
Excluded: Significant events (PValue < threshold) with positive IncLevelDifference
Control: Non-significant events with stable inclusion levels
Examples
if (FALSE) { # \dontrun{
# Load BED file and SE.MATS data
bed <- read.table("peaks.bed")
semats <- read.table("SE.MATS.JC.txt", header = TRUE)
# Basic usage
createSplicingMap(bed_file = bed, SEMATS = semats)
# Return data instead of plot
freq_data <- createSplicingMap(bed_file = bed,
SEMATS = semats,
return_data = TRUE)
} # }