Analyzes the frequency of a target sequence motif across retained intron junction regions. Compares motif frequency between Retained, Excluded, and Control events to identify position-specific enrichment patterns around the upstream exon/intron and intron/downstream exon boundaries.
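The sketch below shows roughly how events might be split into the three groups using the default thresholds listed under Arguments. It rests on several assumptions. `classify_ri_events` and its argument names (`p_sig`, `p_ctrl`, `retained_diff`, `excluded_diff`) are hypothetical stand-ins for `p_valueRetainedAndExclusion`, `p_valueControls`, `retained_IncLevelDifference`, and `exclusion_IncLevelDifference`. The `PValue` column is assumed to be the one compared, and the cutoffs are assumed to be strict. The `Min_Count` read filter is left out because its exact definition is not given here.

```r
# Hypothetical helper (not part of the package): assigns each rMATS RI event
# to "Retained", "Excluded", "Control", or NA using the default cutoffs.
# Assumes the PValue column is compared and that the cutoffs are strict;
# the Min_Count read filter is omitted.
classify_ri_events <- function(rimats,
                               p_sig = 0.05,
                               p_ctrl = 0.95,
                               retained_diff = 0.1,
                               excluded_diff = -0.1) {
  group <- rep(NA_character_, nrow(rimats))
  changed <- rimats$PValue < p_sig
  group[changed & rimats$IncLevelDifference > retained_diff] <- "Retained"
  group[changed & rimats$IncLevelDifference < excluded_diff] <- "Excluded"
  group[rimats$PValue > p_ctrl] <- "Control"
  group
}

toy <- data.frame(
  PValue = c(0.01, 0.01, 0.99, 0.50),
  IncLevelDifference = c(0.30, -0.25, 0.02, 0.40)
)
classify_ri_events(toy)  # "Retained" "Excluded" "Control" NA
```

Events that meet neither the changed nor the control criterion (the fourth toy row) fall into no group.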
Usage
createRetainedIntronSequenceMap(
RIMATS,
sequence,
genome = NULL,
moving_average = 40,
WidthIntoExon = 50,
WidthIntoIntron = 250,
p_valueRetainedAndExclusion = 0.05,
p_valueControls = 0.95,
retained_IncLevelDifference = 0.1,
exclusion_IncLevelDifference = -0.1,
Min_Count = 50,
groups = c("Retained", "Excluded", "Control"),
control_multiplier = 2,
control_iterations = 20,
z_threshold = 1.96,
min_consecutive = 10,
one_sided = TRUE,
use_fdr = TRUE,
fdr_threshold = 0.05,
show_significance = TRUE,
return_data = FALSE,
return_diagnostics = FALSE,
verbose = TRUE,
progress_callback = NULL,
title = "",
retained_col = "blue",
excluded_col = "red",
control_col = "black",
line_width = 0.8,
line_alpha = 0.7,
ribbon_alpha = 0.3,
title_size = 20,
title_color = "black",
axis_text_size = 11,
boundary_col = "gray70",
exon_col = "navy",
legend_position = "bottom",
ylab = "Frequency"
)
Arguments
- RIMATS
A data frame containing rMATS output with columns: chr, strand, upstreamES, upstreamEE, downstreamES, downstreamEE, GeneID, PValue, FDR, IncLevelDifference, IJC_SAMPLE_1, SJC_SAMPLE_1, IJC_SAMPLE_2, SJC_SAMPLE_2, IncLevel1, IncLevel2
- sequence
Character string of the target sequence motif to search for (e.g., "CCCC", "YGCY"). Supports IUPAC ambiguity codes.
- genome
A BSgenome object. If NULL (the default), BSgenome.Hsapiens.UCSC.hg38 is used.
- moving_average
Integer specifying the window size for moving average smoothing. Set to NULL or 0 to disable smoothing. Default is 40.
- WidthIntoExon
Integer specifying how many bp to extend into exons. Default is 50.
- WidthIntoIntron
Integer specifying how many bp to extend into introns. Default is 250.
- p_valueRetainedAndExclusion
P-value threshold below which events are called Retained or Excluded. Default is 0.05.
- p_valueControls
P-value threshold above which events are treated as Control (unchanged) events. Default is 0.95.
- retained_IncLevelDifference
Inclusion level difference threshold for retained events. Default is 0.1.
- exclusion_IncLevelDifference
Inclusion level difference threshold for excluded events. Default is -0.1.
- Min_Count
Minimum read count threshold. Default is 50.
- groups
Character vector specifying which event groups to process. Options are "Retained", "Excluded", and/or "Control". Default is c("Retained", "Excluded", "Control") to process all groups.
- control_multiplier
Numeric multiplier for control sample size. Default is 2.0.
- control_iterations
Integer number of iterations used for control sampling. Default is 20.
- z_threshold
Z-score threshold for significance testing. Default is 1.96. Only used when use_fdr = FALSE.
- min_consecutive
Minimum number of consecutive significant positions required to form a significant region. Default is 10.
- one_sided
Logical. If TRUE (default), only test for enrichment; if FALSE, test for both enrichment and depletion.
- use_fdr
Logical. If TRUE, use FDR-corrected p-values. Default is TRUE.
- fdr_threshold
FDR threshold for significance when use_fdr = TRUE. Default is 0.05.
- show_significance
Logical. If TRUE (default), displays colored bars above the plot indicating significant regions.
- return_data
Logical. If TRUE, returns the frequency data frame instead of a plot. Default is FALSE.
- return_diagnostics
Logical. If TRUE, returns a list containing the frequency data and bootstrap diagnostics. Default is FALSE.
- verbose
Logical. If TRUE, prints progress messages. Default is TRUE.
- progress_callback
Optional function to report progress. Default is NULL.
- title
Character string for the plot title. Default is "".
- retained_col
Color for the Retained group line. Default is "blue".
- excluded_col
Color for the Excluded group line. Default is "red".
- control_col
Color for the Control group line. Default is "black".
- line_width
Numeric line width for the frequency lines. Default is 0.8.
- line_alpha
Numeric alpha for the frequency lines. Default is 0.7.
- ribbon_alpha
Numeric alpha for the standard deviation (SD) ribbon around the Control line. Default is 0.3.
- title_size
Numeric font size for the plot title. Default is 20.
- title_color
Color for the plot title text. Default is "black".
- axis_text_size
Numeric font size for y-axis tick labels. Default is 11.
- boundary_col
Color for the dashed vertical boundary lines. Default is "gray70".
- exon_col
Unused parameter kept for API consistency. Default is "navy".
- legend_position
Position of the legend. Default is "bottom".
- ylab
Label for the y-axis. Default is "Frequency".
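The significance arguments (`control_multiplier`, `control_iterations`, `z_threshold`, `one_sided`, `use_fdr`, `fdr_threshold`, `min_consecutive`) fit together roughly as in the following sketch. It is an illustration under stated assumptions, not the package's implementation. The helper `call_significant_regions` is hypothetical, and so are its 0/1 events-by-positions inputs. The remaining design choices are also made here for illustration only: controls are sampled without replacement, the z-score is computed against the mean and SD of the sampled control frequencies, and Benjamini-Hochberg (`p.adjust(method = "BH")`) is the FDR method.

```r
# Hypothetical sketch (not the package's implementation) of position-wise
# significance calling. Inputs are 0/1 matrices (events x positions), where
# 1 means the motif starts at that position in that event.
call_significant_regions <- function(target_hits, control_hits,
                                     control_multiplier = 2,
                                     control_iterations = 20,
                                     z_threshold = 1.96,
                                     min_consecutive = 10,
                                     one_sided = TRUE,
                                     use_fdr = TRUE,
                                     fdr_threshold = 0.05) {
  # Assumed: each iteration samples control_multiplier * n_target control
  # events without replacement (capped at the number of controls available).
  n_ctrl <- min(nrow(control_hits),
                round(control_multiplier * nrow(target_hits)))
  boot <- replicate(control_iterations, {
    idx <- sample(nrow(control_hits), n_ctrl)
    colMeans(control_hits[idx, , drop = FALSE])
  })  # positions x iterations
  ctrl_mean <- rowMeans(boot)
  ctrl_sd <- apply(boot, 1, sd)

  # z-score of the target frequency against the control distribution;
  # positions where the control SD is zero are simply not called here.
  z <- (colMeans(target_hits) - ctrl_mean) / ctrl_sd
  z[!is.finite(z)] <- 0

  if (use_fdr) {
    p <- if (one_sided) pnorm(z, lower.tail = FALSE) else 2 * pnorm(-abs(z))
    sig <- p.adjust(p, method = "BH") < fdr_threshold
  } else {
    sig <- if (one_sided) z > z_threshold else abs(z) > z_threshold
  }

  # Keep only runs of at least min_consecutive significant positions.
  runs <- rle(sig)
  runs$values <- runs$values & runs$lengths >= min_consecutive
  inverse.rle(runs)
}

# Toy data: background motif rate 5%, with an enriched stretch at 21-40.
set.seed(1)
n_pos <- 100
control <- matrix(rbinom(400 * n_pos, 1, 0.05), nrow = 400)
target <- matrix(rbinom(100 * n_pos, 1, 0.05), nrow = 100)
target[, 21:40] <- rbinom(100 * 20, 1, 0.5)
sig <- call_significant_regions(target, control)
which(sig)
```

The `rle()`/`inverse.rle()` step applies `min_consecutive` by discarding isolated significant positions. Only stretches at least that long survive, so single noisy positions do not produce a significance bar.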
Value
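A ggplot object showing sequence motif frequency across the two regions for the Retained, Excluded, and Control groups. The bottom schematic shows two exon boxes connected by a single intron line. If return_data = TRUE, the frequency data frame is returned instead of the plot; if return_diagnostics = TRUE, a list containing the frequency data and bootstrap diagnostics is returned.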
Details
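The function extracts two regions of (WidthIntoExon + WidthIntoIntron) bp each from every retained intron event:
- Region 1 (UE-RI5): the upstream exon/intron boundary, spanning the last WidthIntoExon bp of the upstream exon and the first WidthIntoIntron bp of the retained intron.
- Region 2 (RI3-DE): the intron/downstream exon boundary, spanning the last WidthIntoIntron bp of the retained intron and the first WidthIntoExon bp of the downstream exon.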
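For a single "+"-strand event, the two windows can be sketched as 1-based, inclusive genomic coordinates. This sketch assumes the usual rMATS convention that `*ES` columns are 0-based starts and `*EE` columns are 1-based ends. Under that convention, `upstreamEE` is the last base of the upstream exon and `downstreamES` is the last intron base in 1-based terms. The helper `ri_junction_windows` is hypothetical. Minus-strand handling (reverse-complementing and reordering the windows) is not shown.

```r
# Hypothetical sketch: 1-based, inclusive junction windows for a "+"-strand
# event, assuming rMATS conventions (upstreamEE = last upstream exon base;
# downstreamES = 0-based start, i.e. last intron base in 1-based terms).
ri_junction_windows <- function(upstreamEE, downstreamES,
                                WidthIntoExon = 50, WidthIntoIntron = 250) {
  data.frame(
    region = c("UE-RI5", "RI3-DE"),
    start  = c(upstreamEE - WidthIntoExon + 1,
               downstreamES - WidthIntoIntron + 1),
    end    = c(upstreamEE + WidthIntoIntron,
               downstreamES + WidthIntoExon)
  )
}

w <- ri_junction_windows(upstreamEE = 1000, downstreamES = 2000)
w                    # UE-RI5: 951-1250, RI3-DE: 1751-2050
w$end - w$start + 1  # both windows are 50 + 250 = 300 bp
```

If an intron is shorter than WidthIntoIntron, the two windows overlap. How the function treats such events is not described here.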
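At each position, the function checks whether an occurrence of the target sequence starts there. The frequency at a position is calculated as (number of events with the motif starting at that position) / (total number of events in the group). The per-position frequencies can then be smoothed with a moving average (see moving_average).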
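The per-position frequency can be sketched in base R as below. The helpers `iupac_to_regex`, `motif_starts`, and `motif_frequency` are hypothetical. They assume that all windows in a group have the same length and that overlapping motif occurrences are counted. For smoothing they use a centered `stats::filter` moving average, which leaves NA at the edges. The package's own matching and smoothing may differ.

```r
# Hypothetical helpers (not the package API): per-position motif-start
# frequency with IUPAC codes, plus optional moving-average smoothing.
iupac_to_regex <- function(motif) {
  codes <- c(A = "A", C = "C", G = "G", T = "T", U = "T",
             R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
             K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
             H = "[ACT]", V = "[ACG]", N = "[ACGT]")
  paste(codes[strsplit(toupper(motif), "")[[1]]], collapse = "")
}

# 0/1 vector: 1 where a (possibly overlapping) motif occurrence starts.
motif_starts <- function(dna, motif) {
  k <- nchar(motif)
  n <- nchar(dna)
  hits <- integer(n)
  if (n >= k) {
    starts <- seq_len(n - k + 1)
    windows <- substring(toupper(dna), starts, starts + k - 1)
    hits[starts] <- as.integer(grepl(paste0("^", iupac_to_regex(motif)), windows))
  }
  hits
}

# Fraction of events with a motif starting at each position.
motif_frequency <- function(dnas, motif, moving_average = NULL) {
  stopifnot(length(unique(nchar(dnas))) == 1)  # equal-width windows assumed
  hits <- t(vapply(dnas, motif_starts, integer(nchar(dnas[1])),
                   motif = motif, USE.NAMES = FALSE))
  freq <- colMeans(hits)
  if (!is.null(moving_average) && moving_average > 1) {
    # Centered moving average; positions near the edges become NA.
    freq <- as.numeric(stats::filter(freq, rep(1 / moving_average, moving_average),
                                     sides = 2))
  }
  freq
}

toy_windows <- c("ACCCCA", "CCCCAA", "ATTTTA")
motif_frequency(toy_windows, "CCCC")                      # 1/3, 1/3, 0, 0, 0, 0
motif_frequency(toy_windows, "CCCC", moving_average = 3)  # NA at both edges
```

With IUPAC codes, "YCAY" becomes the pattern `[CT]CA[CT]`, so both "CCAT" and "TCAC" count as hits.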
Examples
if (FALSE) { # \dontrun{
library(BSgenome.Hsapiens.UCSC.hg38)
rimats <- read.table("RI.MATS.JC.txt", header = TRUE)
# Basic usage
createRetainedIntronSequenceMap(RIMATS = rimats, sequence = "CCCC")
# Search for YCAY motif (Y = C or T)
createRetainedIntronSequenceMap(RIMATS = rimats, sequence = "YCAY")
# Return data instead of plot
freq_data <- createRetainedIntronSequenceMap(RIMATS = rimats,
sequence = "GGGG",
return_data = TRUE)
} # }