The epi2me2r package includes fully automated methods that take the raw CSVs of CARD ARMA and WIMP output from the Oxford Nanopore EPI2ME pipeline and quickly convert them into the formats used by common R packages, namely phyloseq and metagenomeSeq.
There are three main types of functions in epi2me2r:
- fully automated: these functions take minimal input (just raw CSV files and metadata) and produce either phyloseq or metagenomeSeq objects for downstream analysis
- step-by-step: if you need output from an intermediate step, you can use these functions to generate only what you need
- other: an additional function that does not fit into the main workflows is amr_read_taxonomy(). This function reads in both AMR and WIMP raw data and adds the taxonomic information to the AMR gene if available.
Preparation of metadata file
Before starting, make sure the metadata file is formatted appropriately so that your data is imported correctly. You can use one combined metadata file for both your AMR and WIMP samples or a separate file for each. Both options are described below.
Combo metadata file
This file has 4 required columns that must be named as follows:
- arma_filename: the original AMR file name without the .csv extension
- arma_barcode: the barcode of each sample (note: if you did not barcode any of your samples, enter none in all of the cells). In the AMR workflow, missing barcodes are coded as none
- wimp_filename: the original WIMP file name without the .csv extension
- wimp_barcode: the barcode of each sample (note: if you did not barcode any of your samples, enter NA in all of the cells). In the WIMP workflow, missing barcodes are coded as NA
- additional information: after these four required columns, you may include any additional metadata that is important, such as treatment type, sample numbers, etc.
An example of a combo metadata file is included with this package.
epi2me.metadata <- read.csv(system.file("extdata", "example_metadata.csv", package = "epi2me2r"))
head(epi2me.metadata)
##   arma_filename wimp_filename arma_barcode wimp_barcode Samplename  treatment
## 1   arma_288715   324212_1777    barcode02    barcode02    Animal1    control
## 2   arma_288715   324212_1777    barcode03    barcode03    Animal2 antibiotic
## 3   arma_288715   324212_1777    barcode04    barcode04    Animal3    control
## 4   arma_288716   324212_1778         none         <NA>    Animal4 antibiotic
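Before passing your own metadata to the import functions, it can help to confirm that the required columns are present. The following is a minimal sketch in base R, not part of epi2me2r: the helper name check_combo_metadata() and the toy table are hypothetical and only mirror the example shown above.

```r
# Hypothetical helper (not part of epi2me2r): confirm that a combo metadata
# data frame contains the four required columns described above.
check_combo_metadata <- function(metadata) {
  required <- c("arma_filename", "arma_barcode",
                "wimp_filename", "wimp_barcode")
  missing.cols <- setdiff(required, colnames(metadata))
  if (length(missing.cols) > 0) {
    stop("Metadata is missing required column(s): ",
         paste(missing.cols, collapse = ", "))
  }
  invisible(TRUE)
}

# A small metadata table that mirrors the example above
toy.metadata <- data.frame(
  arma_filename = c("arma_288715", "arma_288716"),
  wimp_filename = c("324212_1777", "324212_1778"),
  arma_barcode  = c("barcode02", "none"),
  wimp_barcode  = c("barcode02", NA),
  treatment     = c("control", "antibiotic")
)

check.result <- check_combo_metadata(toy.metadata)
```

When the metadata comes from a CSV file, read.csv() converts cells containing NA to missing values by default, which is why the unbarcoded WIMP sample prints as <NA> in the example above.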
Individual metadata file
If you are just importing WIMP or CARD ARMA files, you do not need all the metadata associated with the other workflow.
If you are just processing ARMA CARD data, the required columns are:
- arma_filename
- arma_barcode
- other metadata such as treatment and sample names
On the other hand, if you are just processing WIMP data, the required columns are:
- wimp_filename
- wimp_barcode
- other metadata such as treatment and sample names
Even if you are just processing one type of data, both ARMA and WIMP information can be included in the metadata (as seen in the section on combo metadata above).
Fully automated data import
For both AMR and WIMP data, the raw CSVs downloaded from the EPI2ME website need to be in their own directory (without any other files). Note that if you are processing both WIMP and ARMA data, you will need two directories, one for each set of data.
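Because each directory should contain nothing but the raw CSVs, a quick check before importing can catch stray files. This is a minimal base R sketch; the helper find_non_csv_files() and the demonstration directory are hypothetical and are not part of epi2me2r.

```r
# Hypothetical helper (not part of epi2me2r): list anything in a directory
# that does not have a .csv extension (subdirectories are flagged too).
find_non_csv_files <- function(path) {
  all.files <- list.files(path)
  all.files[tolower(tools::file_ext(all.files)) != "csv"]
}

# Demonstration on a throwaway directory holding one CSV and one stray file
demo.dir <- file.path(tempdir(), "raw_amr_demo")
dir.create(demo.dir, showWarnings = FALSE)
file.create(file.path(demo.dir, c("arma_288715.csv", "notes.txt")))

stray.files <- find_non_csv_files(demo.dir)
stray.files
```

An empty result means the directory holds only CSV files; anything listed should be moved elsewhere before the import functions are run.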
AMR data
amr_raw_to_phyloseq
Reading the AMR data requires a directory and a metadata file. The directory should contain only the CSV files generated by EPI2ME. An example of the metadata file is shown above. The data used here are from an example run on the EPI2ME pipeline. The function takes four arguments:
- path.to.amr.files (required): the path to the raw CSV files (for example "Desktop/raw_data/")
- metadata (required): metadata formatted as described above, as a data.frame
- coveragenumber (optional): the percentage of the total gene length that must be present for the gene to be included in the count table. This argument takes any number from 1 to 99. Default is 80 (that is, 80%).
- keepSNP (optional): whether to include genes that are considered resistance genes only with a SNP mutation. Default is FALSE (does not include these genes) but can be changed to TRUE to include these genes.
In the following code example, we use the amr_raw_to_phyloseq() function with the example metadata read in above and a directory of example AMR files that is also included with the epi2me2r package. This code creates a phyloseq object from the example AMR files and metadata.
example.amr.dir <- system.file("extdata", "example_amr_data", package = "epi2me2r")
ps.amr.object <- amr_raw_to_phyloseq(path.to.amr.files = example.amr.dir,
                                     metadata = epi2me.metadata,
                                     coveragenumber = 80, 
                                     keepSNP = FALSE)
## Reading in raw AMR files from /home/runner/work/_temp/Library/epi2me2r/extdata/example_amr_data
amr_raw_to_metagenomeseq
The amr_raw_to_metagenomeseq() function takes the same arguments as amr_raw_to_phyloseq() above but imports the data into a metagenomeSeq object:
mgs.amr.object <- amr_raw_to_metagenomeseq(path.to.amr.files = example.amr.dir,
                                           metadata = epi2me.metadata,
                                           coveragenumber = 80, 
                                           keepSNP = FALSE)
## Reading in raw AMR files from /home/runner/work/_temp/Library/epi2me2r/extdata/example_amr_data
mgs.amr.object
## MRexperiment (storageMode: environment)
## assayData: 67 features, 4 samples 
##   element names: counts 
## protocolData: none
## phenoData
##   sampleNames: arma_288715_barcode02 arma_288715_barcode03
##     arma_288715_barcode04 arma_288716_none
##   varLabels: arma_filename wimp_filename ... sampleID (7 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: sp1 sp2 ... sp67 (67 total)
##   fvarLabels: OTUname CVTERMID ... ARO.Name (6 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
WIMP data
wimp_raw_to_phyloseq
WIMP files are handled much like the AMR files, but the taxonomizr package is used to add hierarchical taxonomic information.
Reading in the WIMP data requires a directory and a metadata file. The directory should contain only the CSV files generated by EPI2ME. An example of the metadata file is shown above. The data used here are from an example run on the EPI2ME pipeline. The function takes four arguments:
- path.to.wimp.files (required): the path to the raw CSV files (for example "Desktop/raw_data/")
- metadata (required): metadata formatted as described above, as a data.frame
- keep.unclassified (optional): whether to keep reads that do not classify or do not classify beyond a superkingdom. Default is FALSE (does not include these reads) but can be changed to TRUE to include these reads.
- keep.human (optional): whether to keep reads associated with Homo sapiens (usually considered a contaminant in microbiome data). Default is FALSE (does not include human-associated reads) but can be changed to TRUE to include these reads.
The following code uses the wimp_raw_to_phyloseq() function with the example metadata read in above and a directory of example WIMP files included with the package to convert the raw WIMP files to a phyloseq object:
example.wimp.dir <- system.file("extdata", "example_wimp_data", package = "epi2me2r")
ps.wimp.object <- wimp_raw_to_phyloseq(path.to.wimp.files = example.wimp.dir,
                                       metadata = epi2me.metadata,
                                       keep.unclassified = FALSE, 
                                       keep.human = FALSE)
wimp_raw_to_metagenomeseq
As with the AMR functions, wimp_raw_to_metagenomeseq() takes the same arguments as wimp_raw_to_phyloseq() above but imports the data into a metagenomeSeq object:
mgs.wimp.object <- wimp_raw_to_metagenomeseq(path.to.wimp.files = example.wimp.dir,
                                             metadata = epi2me.metadata,
                                             keep.unclassified = FALSE, 
                                             keep.human = FALSE)
Step-by-step import
In some cases you might not want a phyloseq or metagenomeSeq object, but just a count matrix or taxonomic list. In these cases you can use the functions below.
AMR data
read_in_amr_files
This function takes the directory containing the AMR CSV files and creates a count matrix that can be used in downstream analysis. The inputs are similar to those in the previous examples (but metadata is not required):
- path.to.amr.files (required): the path to the raw CSV files (for example "Desktop/raw_data/")
- coveragenumber (optional): the percentage of the total gene length that must be present for the gene to be included in the count table. This argument takes any number from 1 to 99. Default is 80 (that is, 80%).
- keepSNP (optional): whether to include genes that are considered resistance genes only with a SNP mutation. Default is FALSE (does not include these genes) but can be changed to TRUE to include these genes.
amr.count.table <- read_in_amr_files(path.to.amr.files = example.amr.dir,
                                     coveragenumber = 80, 
                                     keepSNP = FALSE)
## Reading in raw AMR files from /home/runner/work/_temp/Library/epi2me2r/extdata/example_amr_data
head(amr.count.table)
##    CVTERMID arma_288715_barcode02 arma_288715_barcode03 arma_288715_barcode04
## 1:    36033                     5                     1                     1
## 2:    36036                     0                     0                     1
## 3:    36213                     0                     1                     1
## 4:    36355                     3                     2                     3
## 5:    36376                     0                     0                     0
## 6:    36448                     2                     0                     0
##    arma_288716_none
## 1:                3
## 2:                0
## 3:                1
## 4:                1
## 5:                1
## 6:                1
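For downstream tools that expect a plain numeric matrix, a count table of this shape can be reshaped with base R. The sketch below uses a toy data frame whose values are copied from the first three rows printed above; the object names are hypothetical. The real table prints with data.table-style row labels, so converting it with as.data.frame() first is assumed here to keep base R column indexing working as written.

```r
# Toy count table shaped like the output above (first three rows only)
toy.counts <- data.frame(
  CVTERMID              = c(36033, 36036, 36213),
  arma_288715_barcode02 = c(5, 0, 0),
  arma_288715_barcode03 = c(1, 0, 1),
  arma_288715_barcode04 = c(1, 1, 1),
  arma_288716_none      = c(3, 0, 1)
)

# Every column except the ID column is a sample
sample.cols <- setdiff(colnames(toy.counts), "CVTERMID")

# Numeric matrix with CV term IDs as row names
count.matrix <- as.matrix(toy.counts[, sample.cols])
rownames(count.matrix) <- toy.counts$CVTERMID

# Total AMR gene counts per sample (for these three rows)
reads.per.sample <- colSums(count.matrix)
reads.per.sample
```

Keeping the CV term IDs as row names means the matrix can still be matched against the taxonomy table by ID later on.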
generate_amr_taxonomy
This function assigns hierarchical AMR taxonomic information from CARD using a count table with CV term IDs as the first column ("CVTERMID"). One input is required and one is optional:
- amr.count.table (required): the count table generated by read_in_amr_files() (amr.count.table in the example above), or any data frame that has CVTERMID as the first column for AMR taxonomic assignment
- verbose (optional): only a subset of columns is included in the output by default (CVTERMID, Drug Class, AMR Gene Family, Resistance Mechanism, and ARO Name). If verbose = TRUE, 13 columns are returned.
amr.taxonomy <- generate_amr_taxonomy(amr.count.table = amr.count.table,
                                      verbose = FALSE)
head(amr.taxonomy)
##    CVTERMID
## 1:    36033
## 2:    36036
## 3:    36213
## 4:    36355
## 5:    36376
## 6:    36448
##                                                                                                                                                                                                                                                        Drug Class
## 1:                                                                                                                                                                                                                                     fluoroquinolone antibiotic
## 2:                                                                                                                                                                                                                                     fluoroquinolone antibiotic
## 3:                                                                                                                                                                                                                                     fluoroquinolone antibiotic
## 4:                                                                                                                        cephalosporin;fluoroquinolone antibiotic;glycylcycline;penam;phenicol antibiotic;rifamycin antibiotic;tetracycline antibiotic;triclosan
## 5: aminocoumarin antibiotic;aminoglycoside antibiotic;carbapenem;cephalosporin;cephamycin;fluoroquinolone antibiotic;glycylcycline;macrolide antibiotic;penam;penem;peptide antibiotic;phenicol antibiotic;rifamycin antibiotic;tetracycline antibiotic;triclosan
## 6:                                                                                                                                                                                                                                                            N/A
##                                                                                                                                                                  AMR Gene Family
## 1:                                                                                                                             ATP-binding cassette (ABC) antibiotic efflux pump
## 2:                                                                                                                    major facilitator superfamily (MFS) antibiotic efflux pump
## 3:                                                                                                                    major facilitator superfamily (MFS) antibiotic efflux pump
## 4:                                                                                                              resistance-nodulation-cell division (RND) antibiotic efflux pump
## 5: ATP-binding cassette (ABC) antibiotic efflux pump;major facilitator superfamily (MFS) antibiotic efflux pump;resistance-nodulation-cell division (RND) antibiotic efflux pump
## 6:                                                                                                                    major facilitator superfamily (MFS) antibiotic efflux pump
##    Resistance Mechanism ARO Name
## 1:    antibiotic efflux     patA
## 2:    antibiotic efflux     emrA
## 3:    antibiotic efflux     emrB
## 4:    antibiotic efflux     acrB
## 5:    antibiotic efflux     TolC
## 6:    antibiotic efflux     emrD
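As the output above shows, a single gene can carry several semicolon-separated entries in the Drug Class column. If one row per gene and drug class is more convenient, the column can be split with base R. This is a minimal sketch on a toy table with two rows copied from the output above; the object names and the drug_class column name are hypothetical.

```r
# Toy taxonomy table with two rows copied from the output above
toy.taxonomy <- data.frame(
  CVTERMID     = c(36033, 36355),
  `Drug Class` = c(
    "fluoroquinolone antibiotic",
    paste("cephalosporin", "fluoroquinolone antibiotic", "glycylcycline",
          "penam", "phenicol antibiotic", "rifamycin antibiotic",
          "tetracycline antibiotic", "triclosan", sep = ";")
  ),
  check.names = FALSE  # keep the space in the "Drug Class" column name
)

# Split each entry on ";" to get one character vector per gene
drug.classes <- strsplit(toy.taxonomy[["Drug Class"]], ";", fixed = TRUE)

# Long format: one row per gene and drug class
long.taxonomy <- data.frame(
  CVTERMID   = rep(toy.taxonomy$CVTERMID, lengths(drug.classes)),
  drug_class = unlist(drug.classes)
)
long.taxonomy
```

The same approach applies to the AMR Gene Family column, which is also semicolon-separated in the output above.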
WIMP data
read_in_wimp_files
This function takes the directory containing the WIMP CSV files and creates a count matrix that can be used in downstream analysis. Only one input is needed (metadata is not required):
- path.to.wimp.files (required): the path to the raw CSV files (for example "Desktop/raw_data/")
example.wimp.dir <- system.file("extdata", "example_wimp_data", package = "epi2me2r")
wimp.count.table <- read_in_wimp_files(path.to.wimp.files = example.wimp.dir)
## Reading in raw WIMP files from /home/runner/work/_temp/Library/epi2me2r/extdata/example_wimp_data
## The percentage of classified reads was 94.4
head(wimp.count.table)
##    taxID 324212_1777_barcode02 324212_1777_barcode03 324212_1777_barcode04
## 1:     0                     9                     6                     6
## 2:   286                     3                     0                     0
## 3:   287                     8                     9                    11
## 4:   543                     2                     1                     2
## 5:   544                     0                     0                     1
## 6:   561                     0                     0                     0
##    324212_1778_NA
## 1:              7
## 2:              1
## 3:             16
## 4:              0
## 5:              0
## 6:              1
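The sample columns printed above look like the file name and barcode joined with an underscore, with the unbarcoded sample ending in NA. Assuming that naming pattern (it is inferred from the output, not stated by the package), the sketch below checks that every sample in a metadata table has a matching column in the count table. All object names are hypothetical.

```r
# Toy metadata mirroring the WIMP columns of the example metadata
toy.metadata <- data.frame(
  wimp_filename = c("324212_1777", "324212_1777", "324212_1777", "324212_1778"),
  wimp_barcode  = c("barcode02", "barcode03", "barcode04", NA)
)

# Column names copied from the count table printed above
count.columns <- c("taxID", "324212_1777_barcode02", "324212_1777_barcode03",
                   "324212_1777_barcode04", "324212_1778_NA")

# Assumed naming pattern: <filename>_<barcode>; paste() turns a missing
# barcode into the string "NA", matching the last column above
expected.ids <- paste(toy.metadata$wimp_filename, toy.metadata$wimp_barcode,
                      sep = "_")

sample.cols <- setdiff(count.columns, "taxID")
missing.from.counts   <- setdiff(expected.ids, sample.cols)
missing.from.metadata <- setdiff(sample.cols, expected.ids)
```

If both missing.from.counts and missing.from.metadata are empty, the metadata and the count table describe the same set of samples.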
generate_wimp_taxonomy
This function assigns hierarchical phylogenetic taxonomic information with the help of taxonomizr. A count table with NCBI taxonomic IDs ("taxID") as the first column is required.
- wimp.count.table (required): the count table generated by read_in_wimp_files() (wimp.count.table in the example above), or any data frame that has "taxID" as the first column for phylogenetic taxonomic assignment
wimp.taxonomy <- generate_wimp_taxonomy(wimp.count.table = wimp.count.table)
Other functions
Another useful function is amr_read_taxonomy(), which matches any classified AMR read with the phylogenetic taxonomy (if it is assigned) using read_id(). This function takes the following arguments:
- path.to.amr.files (required): the path to the raw AMR CSV files (for example "Desktop/raw_amr_data/")
- path.to.wimp.files (required): the path to the raw WIMP CSV files (for example "Desktop/raw_wimp_data/")
- coveragenumber (optional): the percentage of the total gene length that must be present for the gene to be included in the count table. This argument takes any number from 1 to 99. Default is 80 (that is, 80%).
amr.read.classification <- amr_read_taxonomy(path.to.amr.files = example.amr.dir,
                                             path.to.wimp.files = example.wimp.dir)
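The idea of matching AMR reads to WIMP classifications can be illustrated with a left join in base R. This is only a sketch of the concept, not the implementation of amr_read_taxonomy(): it assumes that both raw data sets share a read identifier column, here called read_id, and the toy tables and their contents are invented for illustration.

```r
# Toy per-read tables (hypothetical values; read_id as a shared column
# is an assumption made for this illustration)
toy.amr.reads <- data.frame(
  read_id  = c("read_a", "read_b", "read_c"),
  CVTERMID = c(36033, 36355, 36448)
)
toy.wimp.reads <- data.frame(
  read_id = c("read_a", "read_c", "read_d"),
  taxID   = c(287, 561, 543)
)

# Left join: keep every AMR read and add the taxID wherever the same
# read was also classified by WIMP; unmatched reads get NA
amr.with.taxa <- merge(toy.amr.reads, toy.wimp.reads,
                       by = "read_id", all.x = TRUE)
amr.with.taxa
```

Using all.x = TRUE keeps AMR reads that have no taxonomic assignment, which mirrors the "if it is assigned" behavior described above.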