
Analyzes protein binding frequency across retained intron junction regions. Uses a 2-region structure to show where protein binding sites appear relative to the upstream exon/intron and intron/downstream exon boundaries. Filters events into Retained, Excluded, and Control groups.

Usage

createRetainedIntronSplicingMap(
  bed_file,
  RIMATS,
  moving_average = 50,
  WidthIntoExon = 50,
  WidthIntoIntron = 300,
  p_valueRetainedAndExclusion = 0.05,
  p_valueControls = 0.95,
  retained_IncLevelDifference = 0.1,
  exclusion_IncLevelDifference = -0.1,
  Min_Count = 50,
  groups = c("Retained", "Excluded", "Control"),
  control_multiplier = 2,
  control_iterations = 20,
  z_threshold = 1.96,
  min_consecutive = 10,
  one_sided = TRUE,
  use_fdr = TRUE,
  fdr_threshold = 0.05,
  show_significance = TRUE,
  return_data = FALSE,
  return_diagnostics = FALSE,
  verbose = TRUE,
  progress_callback = NULL,
  title = "",
  retained_col = "blue",
  excluded_col = "red",
  control_col = "black",
  line_width = 0.8,
  line_alpha = 0.7,
  ribbon_alpha = 0.3,
  title_size = 20,
  title_color = "black",
  axis_text_size = 11,
  boundary_col = "gray70",
  exon_col = "navy",
  legend_position = "bottom",
  ylab = "Frequency"
)

Arguments

bed_file

Either a file path to a BED file or a data frame containing BED data with columns: chr, start, end, tag, score, strand

RIMATS

A data frame containing rMATS output with columns: chr, strand, upstreamES, upstreamEE, downstreamES, downstreamEE, GeneID, PValue, FDR, IncLevelDifference, IJC_SAMPLE_1, SJC_SAMPLE_1, IJC_SAMPLE_2, SJC_SAMPLE_2, IncLevel1, IncLevel2

moving_average

Integer specifying the window size for moving average smoothing. Set to NULL or 0 to disable smoothing. Default is 50.

WidthIntoExon

Integer number of base pairs each region extends into its flanking exon. Default is 50.

WidthIntoIntron

Integer number of base pairs each region extends into the retained intron. Default is 300.

p_valueRetainedAndExclusion

P-value threshold for retained/excluded events. Default is 0.05.

p_valueControls

P-value threshold for control events. Controls are non-significant events with PValue above this value. Default is 0.95.

retained_IncLevelDifference

Inclusion level difference threshold for retained events. Default is 0.1.

exclusion_IncLevelDifference

Inclusion level difference threshold for excluded events. Default is -0.1.

Min_Count

Minimum read count threshold. Default is 50.

groups

Character vector specifying which event groups to process. Options are "Retained", "Excluded", and/or "Control". Default is c("Retained", "Excluded", "Control") to process all groups.

control_multiplier

Numeric multiplier for control sample size. The number of control events sampled per iteration is (n_retained + n_excluded) * control_multiplier. Default is 2.0.

control_iterations

Integer number of iterations for control sampling. The final control frequency is the mean across iterations, with the standard deviation shown as a shaded band. Default is 20.

z_threshold

Z-score threshold for significance testing. Default is 1.96. Only used when use_fdr = FALSE.

min_consecutive

Minimum number of consecutive significant positions required to form a significant region. Default is 10.

one_sided

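Logical. If TRUE (default), only test for enrichment (Retained or Excluded frequency higher than Control). If FALSE, test for both enrichment and depletion.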

use_fdr

Logical. If TRUE, call significance from FDR-corrected p-values using fdr_threshold; if FALSE, use z_threshold instead. Default is TRUE.

fdr_threshold

FDR threshold for significance when use_fdr = TRUE. Default is 0.05.

show_significance

Logical. If TRUE (default), displays colored bars above the plot indicating regions where Retained or Excluded differ significantly from Control based on a z-test.

return_data

Logical. If TRUE, returns the frequency data frame instead of a plot. Default is FALSE.

return_diagnostics

Logical. If TRUE, returns a list containing the frequency data and bootstrap diagnostics. Default is FALSE.

verbose

Logical. If TRUE, prints progress messages. Default is TRUE.

progress_callback

Optional function to report progress. Default is NULL.

title

Character string for the plot title. Default is "".

retained_col

Color for the Retained group line. Default is "blue".

excluded_col

Color for the Excluded group line. Default is "red".

control_col

Color for the Control group line. Default is "black".

line_width

Numeric line width for the frequency lines. Default is 0.8.

line_alpha

Numeric alpha for the frequency lines. Default is 0.7.

ribbon_alpha

Numeric alpha for the SD ribbon around Control. Default is 0.3.

title_size

Numeric font size for the plot title. Default is 20.

title_color

Color for the plot title text. Default is "black".

axis_text_size

Numeric font size for y-axis tick labels. Default is 11.

boundary_col

Color for the dashed vertical boundary lines. Default is "gray70".

exon_col

Unused parameter kept for API consistency. Default is "navy".

legend_position

Position of the legend. Default is "bottom".

ylab

Label for the y-axis. Default is "Frequency".
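The control-sampling and significance arguments (control_multiplier, control_iterations, one_sided, use_fdr, fdr_threshold, z_threshold, min_consecutive) can be illustrated with a minimal base-R sketch. It assumes each group's binding frequency is a numeric vector with one value per position and that the z-score compares a test group against the mean and standard deviation of the control iterations. The Benjamini-Hochberg adjustment and the helper names control_sample_size() and significant_regions() are assumptions for illustration, not the package's internal code.

```r
# Hypothetical helpers sketching control sampling and per-position significance.
# Not the package's implementation; BH adjustment and the z-score form are assumptions.

# Number of control events drawn in each sampling iteration
control_sample_size <- function(n_retained, n_excluded, control_multiplier = 2) {
  (n_retained + n_excluded) * control_multiplier
}

# freq_test: numeric vector, binding frequency of Retained or Excluded per position
# ctrl_mat:  matrix with one row per control iteration, one column per position
significant_regions <- function(freq_test, ctrl_mat,
                                one_sided = TRUE, use_fdr = TRUE,
                                fdr_threshold = 0.05, z_threshold = 1.96,
                                min_consecutive = 10) {
  ctrl_mean <- colMeans(ctrl_mat)
  ctrl_sd <- apply(ctrl_mat, 2, sd)
  ctrl_sd[ctrl_sd == 0] <- NA  # zero-variance positions cannot be tested
  z <- (freq_test - ctrl_mean) / ctrl_sd

  if (use_fdr) {
    # One-sided tests only enrichment; two-sided tests both directions
    p <- if (one_sided) pnorm(z, lower.tail = FALSE) else 2 * pnorm(-abs(z))
    sig <- p.adjust(p, method = "BH") < fdr_threshold
  } else {
    sig <- if (one_sided) z > z_threshold else abs(z) > z_threshold
  }
  sig[is.na(sig)] <- FALSE

  # Keep only runs of at least min_consecutive significant positions
  runs <- rle(sig)
  runs$values <- runs$values & runs$lengths >= min_consecutive
  inverse.rle(runs)
}

# Toy data: 20 control iterations over 100 positions, enrichment at 41-60
set.seed(1)
ctrl_mat <- matrix(rnorm(20 * 100, mean = 0.10, sd = 0.01), nrow = 20)
freq_test <- rep(0.10, 100)
freq_test[41:60] <- 0.20
sig <- significant_regions(freq_test, ctrl_mat)
which(sig)
n_ctrl <- control_sample_size(n_retained = 30, n_excluded = 20)
```

In the toy data, positions 41 to 60 form a single 20-position run, so they pass min_consecutive = 10; isolated significant positions elsewhere would be dropped by the run-length filter.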

Value

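A ggplot object showing protein binding frequency across the 2 regions for the Retained, Excluded, and Control groups. The bottom schematic shows two exon boxes connected by a single intron line. If return_data = TRUE, the frequency data frame is returned instead; if return_diagnostics = TRUE, a list containing the frequency data and bootstrap diagnostics is returned.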
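The plotted frequency can be sketched as follows, assuming it is the fraction of events in a group whose window has a binding site overlapping each position, and that moving_average is a centered running mean. The function binding_frequency() and the edge handling (NA where the smoothing window is incomplete) are illustrative assumptions.

```r
# Hypothetical sketch: per-position binding frequency with optional smoothing.
# hits: logical matrix, one row per event and one column per window position,
#       TRUE where a binding site from bed_file overlaps that position.
binding_frequency <- function(hits, moving_average = 50) {
  freq <- colMeans(hits)  # fraction of events bound at each position
  # NULL or 0 disables smoothing (a window of 1 would leave values unchanged anyway)
  if (is.null(moving_average) || moving_average <= 1) {
    return(freq)
  }
  k <- as.integer(moving_average)
  # Running mean (centered for odd k); stats:: avoids masking by dplyr::filter
  as.numeric(stats::filter(freq, rep(1 / k, k), sides = 2))
}

hits <- matrix(FALSE, nrow = 4, ncol = 10)
hits[1:2, 4:6] <- TRUE  # 2 of 4 events bound at positions 4-6
raw <- binding_frequency(hits, moving_average = 0)
smoothed <- binding_frequency(hits, moving_average = 3)
raw
smoothed
```

Here raw is 0.5 at positions 4 to 6 and 0 elsewhere; with a 3-position window the peak is spread over positions 3 to 7, and the two end positions become NA because they lack a full window.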

Details

The function divides each retained intron event into 2 regions of (WidthIntoExon + WidthIntoIntron) bp each:

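  • Region 1 (UE-RI5): spans the upstream exon/retained intron boundary, covering the last WidthIntoExon bp of the upstream exon and the first WidthIntoIntron bp of the intron

  • Region 2 (RI3-DE): spans the retained intron/downstream exon boundary, covering the last WidthIntoIntron bp of the intron and the first WidthIntoExon bp of the downstream exon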
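The two windows can be derived from the rMATS coordinate columns. A minimal sketch, assuming rMATS's 0-based, half-open convention (so the retained intron spans upstreamEE to downstreamES) and assuming minus-strand events are oriented 5' to 3' by swapping the two windows; ri_region_windows() is a hypothetical helper.

```r
# Hypothetical sketch: genomic windows for Region 1 and Region 2 of each RI event.
# Coordinates are assumed 0-based, half-open, as in rMATS output.
ri_region_windows <- function(rimats, WidthIntoExon = 50, WidthIntoIntron = 300) {
  minus <- rimats$strand == "-"
  # Window around the genomically left junction (upstream exon end / intron start)
  left_start <- rimats$upstreamEE - WidthIntoExon
  left_end <- rimats$upstreamEE + WidthIntoIntron
  # Window around the genomically right junction (intron end / downstream exon start)
  right_start <- rimats$downstreamES - WidthIntoIntron
  right_end <- rimats$downstreamES + WidthIntoExon
  # On the minus strand the transcript-upstream exon is the genomic downstream exon
  data.frame(
    chr = rimats$chr, strand = rimats$strand,
    r1_start = ifelse(minus, right_start, left_start),
    r1_end = ifelse(minus, right_end, left_end),
    r2_start = ifelse(minus, left_start, right_start),
    r2_end = ifelse(minus, left_end, right_end)
  )
}

ri <- data.frame(chr = c("chr1", "chr2"), strand = c("+", "-"),
                 upstreamEE = c(1000, 5000), downstreamES = c(2000, 7000))
w <- ri_region_windows(ri)
w
```

Each window is WidthIntoExon + WidthIntoIntron bp wide (350 bp with the defaults). When an intron is shorter than 2 * WidthIntoIntron, the two windows overlap inside the intron. On the minus strand, positions within each window would also need to be reversed before plotting.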

Events are filtered into three groups:

  • Retained: Significant events (PValue < p_valueRetainedAndExclusion) with negative IncLevelDifference

  • Excluded: Significant events (PValue < p_valueRetainedAndExclusion) with positive IncLevelDifference

  • Control: Non-significant events (PValue > p_valueControls) with stable inclusion levels
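The grouping rules can be sketched as a classifier over the RIMATS columns. This is a minimal sketch under stated assumptions: it follows the sign convention given above (Retained negative, Excluded positive) and applies the IncLevelDifference thresholds by magnitude; it treats Min_Count as a minimum on the summed inclusion and skipping junction counts across both samples; and it does not model the "stable inclusion levels" criterion for controls because this page does not define it. classify_ri_events() is hypothetical.

```r
# Hypothetical sketch of event grouping; thresholds mirror the function's arguments,
# but the sign handling and the Min_Count rule are assumptions.
classify_ri_events <- function(rimats,
                               p_valueRetainedAndExclusion = 0.05,
                               p_valueControls = 0.95,
                               retained_IncLevelDifference = 0.1,
                               exclusion_IncLevelDifference = -0.1,
                               Min_Count = 50) {
  # Sum comma-separated replicate counts such as "20,25"
  count_sum <- function(x) {
    vapply(strsplit(as.character(x), ","),
           function(v) sum(as.numeric(v)), numeric(1))
  }
  total <- count_sum(rimats$IJC_SAMPLE_1) + count_sum(rimats$SJC_SAMPLE_1) +
    count_sum(rimats$IJC_SAMPLE_2) + count_sum(rimats$SJC_SAMPLE_2)
  d <- rimats$IncLevelDifference
  p <- rimats$PValue

  group <- rep(NA_character_, nrow(rimats))
  sig <- p < p_valueRetainedAndExclusion
  group[sig & d <= -abs(retained_IncLevelDifference)] <- "Retained"
  group[sig & d >= abs(exclusion_IncLevelDifference)] <- "Excluded"
  group[p > p_valueControls] <- "Control"
  group[total < Min_Count] <- NA  # drop low-coverage events from every group
  group
}

ri <- data.frame(
  PValue = c(0.001, 0.01, 0.99, 0.5),
  IncLevelDifference = c(-0.3, 0.25, 0.01, 0.2),
  IJC_SAMPLE_1 = c("20,25", "30,30", "40,35", "50,50"),
  SJC_SAMPLE_1 = c("5,5", "10,10", "10,10", "10,10"),
  IJC_SAMPLE_2 = c("40,45", "10,12", "38,40", "50,50"),
  SJC_SAMPLE_2 = c("4,6", "20,20", "10,12", "10,10")
)
groups <- classify_ri_events(ri)
groups
```

The fourth event has PValue 0.5, which is neither significant nor above p_valueControls, so it falls into no group.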

Examples

if (FALSE) { # \dontrun{
# Load BED file and RI.MATS data
bed <- checkBed("peaks.bed")
rimats <- read.delim("RI.MATS.JC.txt")  # rMATS output is tab-separated with a header row

# Basic usage
createRetainedIntronSplicingMap(bed_file = bed, RIMATS = rimats)

# Return data instead of plot
freq_data <- createRetainedIntronSplicingMap(bed_file = bed, RIMATS = rimats,
                                             return_data = TRUE)
} # }
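bed_file also accepts a data frame with the columns chr, start, end, tag, score, strand, which is convenient when peaks are already in memory. The peak coordinates below are made up for illustration, and the call to createRetainedIntronSplicingMap() is wrapped in if (FALSE), like the examples above, because it needs a real rMATS file; ggplot2::ggsave() is used only to show that the returned ggplot object can be saved like any other.

```r
# Made-up peaks held in memory instead of a BED file
bed_df <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  start = c(950L, 1200L, 6800L),
  end = c(1000L, 1260L, 6850L),
  tag = c("peak1", "peak2", "peak3"),
  score = c(10, 8, 12),
  strand = c("+", "+", "-")
)

if (FALSE) {
  rimats <- read.delim("RI.MATS.JC.txt")
  # Retained vs Control only, stricter significance, wider intronic windows
  p <- createRetainedIntronSplicingMap(bed_file = bed_df, RIMATS = rimats,
                                       groups = c("Retained", "Control"),
                                       p_valueRetainedAndExclusion = 0.01,
                                       WidthIntoIntron = 500,
                                       title = "RI splicing map")
  ggplot2::ggsave("ri_splicing_map.pdf", p, width = 8, height = 5)
}
```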