Readers
Classes related to reading genomic files
Sam reader
-
class SamReader
SAM/BAM file reader.
Uses htslib to read the SAM/BAM header and entries. This class only reads the header and entries; it uses SamEntry to parse them.
Public Functions
-
SamReader(const string &in_file)
Opens a SAM/BAM file, verifies that it is a sequence data file, requires a header, and initializes htslib handlers.
- Parameters:
in_file – [in] SAM/BAM file name.
- Throws:
std::runtime_error – if the file cannot be opened, is not a sequence data file, has no header, or htslib initialization fails.
-
~SamReader()
Closes the file and destroys htslib handlers.
-
bool read_sam_line(SamEntry &entry)
Reads one entry and populates ‘entry’ with the help of SamEntry.
- Parameters:
entry – [out] SamEntry to populate with the parsed fields. Contents are not valid if false is returned.
- Throws:
std::runtime_error – if the entry cannot be parsed.
- Returns:
True on successfully reading an entry. False if end of file is reached.
-
bool read_pe_sam(SamEntry &entry1, SamEntry &entry2)
Reads a pair of entries from a paired-end SAM/BAM file and populates ‘entry1’ and ‘entry2’ in the order the pair appears in the file. A hash table keeps track of entries whose mates have not yet been read. Each entry is read using ‘read_sam_line’, and its QNAME is looked up in the hash table. If an entry with the same QNAME exists, the stored entry and the current entry are returned as a pair. If it does not exist, the current entry is stored in the hash table.
It is assumed that each read pair contains exactly two entries in the file. Having more than two entries for a read pair (for example, supplementary alignments) results in either more than one pair being returned for the same QNAME (if the number of entries is even) or a dangling entry left in the hash table (if the number of entries is odd).
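The pairing logic described above can be sketched independently of htslib. The following is a minimal illustration, not the library's implementation: MiniEntry and MatePairer are hypothetical names, and the sketch assumes the stored entry is removed from the hash table once its mate arrives.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for SamEntry; only what the pairing logic needs.
struct MiniEntry {
  std::string qname;
  int id;  // arbitrary payload so the two mates can be told apart
};

// Hypothetical helper that pairs entries by QNAME as they stream in.
class MatePairer {
 public:
  // Returns true and fills 'first'/'second' (in file order) if an entry
  // with the same QNAME is already waiting. Otherwise stores 'e' and
  // returns false.
  bool add(const MiniEntry &e, MiniEntry &first, MiniEntry &second) {
    auto it = pending_.find(e.qname);
    if (it == pending_.end()) {
      pending_.emplace(e.qname, e);
      return false;
    }
    first = it->second;  // the entry that was read earlier
    second = e;          // the entry that was just read
    pending_.erase(it);  // assumption: a completed pair leaves the table
    return true;
  }

  // Number of entries still waiting for a mate.
  std::size_t dangling() const { return pending_.size(); }

 private:
  std::unordered_map<std::string, MiniEntry> pending_;
};
```

Feeding three entries with the same QNAME through this sketch yields one pair and one dangling entry, which is the odd-count case noted above.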
-
void read_sam_header(string &hdr)
Returns the entire SAM header as a string, including all header lines (HD, SQ, RG, PG, etc.).
- Parameters:
hdr – [out] String to hold the header.
-
class SamEntry
A class for holding values in a SAM entry.
Takes an entry read from a SAM/BAM file as a string and parses the mandatory and optional fields.
Public Functions
-
inline SamEntry()
Default constructor. Initializes SAM fields to default values.
-
SamEntry(const string &line)
Constructs from a SAM/BAM string by calling parse_entry.
- Parameters:
line – [in] SAM/BAM line to parse.
-
~SamEntry()
Default destructor.
-
void parse_entry(const string &line)
Parses a string into SAM/BAM fields. The required fields are stored in the appropriate member. The optional tags are stored as a TAG:TYPE:VALUE string in a vector.
- Parameters:
line – [in] SAM/BAM line.
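As a rough illustration of the parsing step, the sketch below splits a tab-delimited SAM line into the eleven mandatory fields and keeps the remaining columns as TAG:TYPE:VALUE strings. MiniSamEntry and parse_sam_line are hypothetical names that mirror the documented members; the error handling is an assumption, and the actual parse_entry may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in mirroring the public members documented for SamEntry.
struct MiniSamEntry {
  std::string qname, rname, cigar, rnext, seq, qual;
  std::uint16_t flag = 0, mapq = 0;
  std::uint32_t pos = 0, pnext = 0;
  int tlen = 0;
  std::vector<std::string> tags;  // each kept verbatim as TAG:TYPE:VALUE
};

// Splits a SAM text line on tabs and fills the fields in column order.
MiniSamEntry parse_sam_line(const std::string &line) {
  std::vector<std::string> cols;
  std::istringstream in(line);
  std::string col;
  while (std::getline(in, col, '\t')) cols.push_back(col);
  if (cols.size() < 11)  // assumption: short lines are treated as errors
    throw std::runtime_error("SAM line has fewer than 11 mandatory fields");

  MiniSamEntry e;
  e.qname = cols[0];
  e.flag = static_cast<std::uint16_t>(std::stoul(cols[1]));
  e.rname = cols[2];
  e.pos = static_cast<std::uint32_t>(std::stoul(cols[3]));
  e.mapq = static_cast<std::uint16_t>(std::stoul(cols[4]));
  e.cigar = cols[5];
  e.rnext = cols[6];
  e.pnext = static_cast<std::uint32_t>(std::stoul(cols[7]));
  e.tlen = std::stoi(cols[8]);
  e.seq = cols[9];
  e.qual = cols[10];
  e.tags.assign(cols.begin() + 11, cols.end());  // optional columns
  return e;
}
```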
Public Members
-
string qname
QNAME. Query template name.
-
uint16_t flag
FLAG. Bitwise flag. See SamFlags for interpretation.
-
string rname
RNAME. Reference sequence name.
-
uint32_t pos
POS. 1-based leftmost mapping position.
-
uint16_t mapq
MAPQ. Mapping quality. 255 indicates not available.
-
string cigar
CIGAR. CIGAR string.
-
string rnext
RNEXT. Reference name of mate.
-
uint32_t pnext
PNEXT. Position of mate.
-
int tlen
TLEN. Observed template length.
-
string seq
SEQ. Segment sequence.
-
string qual
QUAL. ASCII of Phred-scaled base quality.
-
vector<string> tags
TAGS. Optional tags.
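The FLAG and QUAL members are encoded values, so a short sketch of decoding them may help. The bit values below follow the SAM specification, and the quality decoding assumes the standard Phred+33 ASCII offset. The constant and function names are hypothetical and are not taken from SamFlags.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// FLAG bits as defined by the SAM specification (names are illustrative).
constexpr std::uint16_t kPaired = 0x1;
constexpr std::uint16_t kProperPair = 0x2;
constexpr std::uint16_t kUnmapped = 0x4;
constexpr std::uint16_t kMateUnmapped = 0x8;
constexpr std::uint16_t kReverse = 0x10;
constexpr std::uint16_t kMateReverse = 0x20;
constexpr std::uint16_t kFirstInPair = 0x40;
constexpr std::uint16_t kSecondInPair = 0x80;
constexpr std::uint16_t kSecondary = 0x100;
constexpr std::uint16_t kSupplementary = 0x800;

// True if 'bit' is set in 'flag'.
inline bool has_flag(std::uint16_t flag, std::uint16_t bit) {
  return (flag & bit) != 0;
}

// Converts a QUAL string to Phred scores by subtracting the ASCII offset 33.
std::vector<int> decode_qual(const std::string &qual) {
  std::vector<int> scores;
  scores.reserve(qual.size());
  for (unsigned char c : qual) scores.push_back(static_cast<int>(c) - 33);
  return scores;
}
```

For example, a FLAG of 99 has the paired, proper-pair, mate-reverse, and first-in-pair bits set.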
Fastq reader
-
class FastqReader
FASTQ file reader.
A class to read and parse single-end and paired-end FASTQ files.
Public Functions
-
FastqReader(const std::string &in_file)
Opens a single-end FASTQ file. Throws a runtime error if the file cannot be opened.
- Parameters:
in_file – [in] FASTQ file name
-
FastqReader(const std::string &in_file_1, const std::string &in_file_2)
Opens a pair of paired-end FASTQ files. Throws a runtime error if either file cannot be opened.
- Parameters:
in_file_1 – [in] first FASTQ file name
in_file_2 – [in] second FASTQ file name
-
~FastqReader()
Closes any open files.
-
bool read_se_entry(FastqEntry &e)
Reads an entry from a single-end FASTQ file and populates a FastqEntry.
- Parameters:
e – [out] FastqEntry to populate with the FASTQ entry
- Returns:
True on successfully reading a FASTQ entry. False if end of file is reached.
-
bool read_pe_entry(FastqEntry &e1, FastqEntry &e2)
Reads a pair of entries from paired-end FASTQ files and populates two FastqEntry objects.
- Parameters:
e1 – [out] FastqEntry to populate with the first FASTQ entry
e2 – [out] FastqEntry to populate with the second FASTQ entry
- Returns:
True on successfully reading a pair of FASTQ entries. False if end of file is reached.
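To show what reading paired entries in lockstep might look like, here is a minimal sketch over generic input streams. MiniFastqEntry, read_record, and read_pair are hypothetical names. The sketch assumes four-line FASTQ records and throws on truncated or unequal-length inputs, which may not match what FastqReader does in those cases.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for FastqEntry: the four lines of one record.
struct MiniFastqEntry {
  std::string name, seq, sep, qual;
};

// Reads one four-line record. Returns false on a clean end of file.
bool read_record(std::istream &in, MiniFastqEntry &e) {
  if (!std::getline(in, e.name)) return false;
  if (!std::getline(in, e.seq) || !std::getline(in, e.sep) ||
      !std::getline(in, e.qual))
    throw std::runtime_error("truncated FASTQ record");  // assumption
  return true;
}

// Reads the next record from each stream so that mates stay in step.
bool read_pair(std::istream &in1, std::istream &in2, MiniFastqEntry &e1,
               MiniFastqEntry &e2) {
  const bool ok1 = read_record(in1, e1);
  const bool ok2 = read_record(in2, e2);
  if (ok1 != ok2)  // assumption: unequal inputs are treated as an error
    throw std::runtime_error("paired FASTQ inputs differ in length");
  return ok1;
}
```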
BED reader
-
class BedReader
BED file reader.
A class to read and parse BED files. Supports reading BED3 entries (chrom, start, end) and variable-column BED entries where additional columns are returned as a vector of strings.
Public Functions
-
BedReader(const string &in_file)
Opens a BED file. Throws a runtime error if the file cannot be opened.
- Parameters:
in_file – [in] BED file name.
-
~BedReader()
Closes the BED file.
-
bool read_bed3_line(GenomicRegion &g)
Reads the next BED3 (chrom, start, end) line from the file and populates a GenomicRegion.
- Parameters:
g – [out] GenomicRegion to populate.
- Returns:
True on successfully reading a BED3 entry. False if end of file is reached.
-
bool read_bed_line(GenomicRegion &g, std::vector<std::string> &fields)
Reads the next BED line and populates a GenomicRegion with the first 3 columns (chrom, start, end). Any remaining columns are returned in a vector of strings. If the line has only 3 columns, the fields vector will be empty.
- Parameters:
g – [out] GenomicRegion to populate with chrom, start, end.
fields – [out] Vector of strings containing any additional columns beyond the first three.
- Returns:
True on successfully reading a BED entry. False if end of file is reached.
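The split between the first three columns and the remaining fields can be sketched as follows. MiniRegion and read_bed_line_sketch are hypothetical names standing in for GenomicRegion and read_bed_line. The sketch assumes tab-delimited input and throws on lines with fewer than three columns, which is an assumption rather than documented behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for GenomicRegion.
struct MiniRegion {
  std::string chrom;
  std::uint64_t start = 0, end = 0;
};

// Reads one BED line. The first three columns go into 'g'; any further
// columns go into 'fields'. Returns false at end of file.
bool read_bed_line_sketch(std::istream &in, MiniRegion &g,
                          std::vector<std::string> &fields) {
  std::string line;
  if (!std::getline(in, line)) return false;

  std::vector<std::string> cols;
  std::istringstream ls(line);
  std::string col;
  while (std::getline(ls, col, '\t')) cols.push_back(col);
  if (cols.size() < 3)  // assumption: short lines are treated as errors
    throw std::runtime_error("BED line has fewer than 3 columns");

  g.chrom = cols[0];
  g.start = std::stoull(cols[1]);
  g.end = std::stoull(cols[2]);
  fields.assign(cols.begin() + 3, cols.end());  // empty for BED3 lines
  return true;
}
```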
-
void read_bed3_file(std::vector<GenomicRegion> &g)
Reads the entire BED file and populates a vector of GenomicRegion with the BED3 (chrom, start, end) fields.
- Parameters:
g – [out] Vector of GenomicRegions to populate.
GTF reader
-
class GtfReader
GTF reader.
A class for reading and parsing GTF files.
Public Functions
-
GtfReader(const string &in_file)
Opens a GTF file. Throws a runtime error if the file cannot be opened.
- Parameters:
in_file – [in] GTF file name
-
~GtfReader()
Closes the GTF file.
-
bool read_gtf_line(GtfEntry &g)
Reads a line from a GTF file, parses it, and populates a GtfEntry.
- Parameters:
g – [out] GtfEntry to populate.
- Returns:
True on successfully reading a GTF entry. False if end of file is reached.
-
void read_gtf_file(vector<GtfEntry> &g)
Reads a GTF file, parses each line, and populates a vector of GtfEntry. Note that read_gtf_file and read_gtf_line use the same file handle, so if this is called after calls to read_gtf_line, it reads from the next unread line to the end of the file.
- Parameters:
g – [out] Vector of GtfEntry to populate.
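The effect of sharing one file position between the line-level and file-level calls can be illustrated with a small stand-in. MiniLineReader is a hypothetical class that returns raw lines rather than parsed GtfEntry objects; it only demonstrates how the read position carries over.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical reader in which both methods consume from the same stream.
class MiniLineReader {
 public:
  explicit MiniLineReader(std::istream &in) : in_(in) {}

  // Reads the next line. Returns false at end of file.
  bool read_line(std::string &line) {
    return static_cast<bool>(std::getline(in_, line));
  }

  // Appends every remaining line, starting from the current position.
  void read_file(std::vector<std::string> &lines) {
    std::string line;
    while (read_line(line)) lines.push_back(line);
  }

 private:
  std::istream &in_;  // single shared position for both methods
};
```

If read_line is called once before read_file, the vector receives every line except the first.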
-
bool read_gencode_gtf_line(GencodeGtfEntry &g)
-
void read_gencode_gtf_file(vector<GencodeGtfEntry> &g)