--- title: "Data analysis of GC-MS and LC-MS in Environmental Science" author: "Miao Yu" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{enviGCMS} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- Qualitative and quantitative analysis of contaminants are the core of the Environmental Science. GC/LC-MS might be one of the most popular instruments for such analytical methods. Previous works such as `xcms` were developed for GC-MS data. However, such packages have limited functions for environmental analysis. In this package. I added functions for various GC/LC-MS data analysis purposes used in environmental analysis. Such feature could not only reveal certain problems, but also help the user find out the unknown patterns in the dataset of GC/LC-MS. ## Data analysis for single sample ### Data structure of GC/LC-MS full scan mode GC/LC is used for separation and MS is used for detection in a GC/LC-MS system. The collected data are intensities of certain mass at different retention time. When we perform analysis on certain column in full scan mode, the counts of different mass were collected in each scan. The dwell time for each scan might only last for 500ms or less. Then the next scan begins with a different retention time. Here we could use a matrix to stand for those data. Each column stands for each mass and row stands for the retention time of that scan. Such matrix could be treated as time series data. In this package, we treat such data as `matrix` type. For high-resolution MS, building such matrix is tricky. We might need to bin the RAW data to make alignment for different scans into a matrix. Such works could be done by `xcms`. ### Data structure of GC-MS SIM mode When you perform a selected ions monitor(SIM) mode analysis, only few mass data were collected and each mass would have counts and retention time as a time series data. In this package, we treat such data as `data.frame` type. ### Data input You could use `getmd` to import the mass spectrum data as supported by `xcms` and get the profile of GC-MS data matrix. `mzstep` is the bin step for mass: ``` data <- enviGCMS:::getmd('data/data1.CDF', mzstep = 0.1) ``` You could also subset the data by the mass(m/z 100-1000) or retention time range(40-100s) in `getmd` function: ``` data <- enviGCMS:::getmd(data,mzrange=c(100,1000),rtrange=c(40,100)) ``` You could also combined the mass full-scan data with the same range of retention time by `cbmd`: ``` data <- cbmd(data1,data2,data3) ``` ### Visualization of mass spectrum data You could plot the Total Ion Chromatogram(TIC) for certain RT and mass range. ``` plottic(data,rt=c(3.1,25),ms=c(100,1000)) ``` You could also plot the mass spectrum for certain RT range. You could use the returned MSP files for NIST search: ``` plotrtms(data,rt=c(3.1,25),ms=c(100,1000)) ``` The Extracted Ion Chromatogram(EIC) is also support by `enviGCMS` and the returned data could be analyzed for molecular isotopes: ``` plotmsrt(data,ms=500,rt=c(3.1,25)) ``` You could use `plotms` or `plotmz` to show the heatmap or scatter plot for LC/GC-MS data, which is very useful for exploratory data analysis. ``` plotms(data) plotmz(data) ``` You could change the retention time into the temperature if it is a constant speed of temperature rising process. But you need show the temperature range. ``` plott(data,temp = c(100,320)) ``` ### Data analysis for influence from GC-MS `enviGCMS` supplied many functions for decreasing the noise during the analysis process. `findline` could be used for find line of the boundary regression model for noise. `comparems` could be used to make a point-to-point data subtraction of two full-scan mass spectrum data. `plotgroup` could be used convert the data matrix into a 0-1 heatmap according to threshold. `plotsub` could be used to show the self background subtraction of full-scan data. `plotsms` shows the RSD of the intensity of full scan data. `plothist` could be used to find the data distribution of the histogram of the intensities of full scan data. ### Data analysis for molecular isotope ratio Some functions could be used to calculate the molecular isotope ratio. EIC data could be import into `GetIntergration` and return the information of found peaks. `Getisotoplogues` could be used to calculate the molecular isotope ratio of certain molecular. Some shortcut function such as `batch` and `qbatch` could be used to calculate molecular isotope ratio for multiple and single molecular in EIC data. ### Quantitative analysis for short-chain chlorinated paraffins(SCCPs) `enviGCMS` supply function to perform Quantitative analysis for short-chain chlorinated paraffins(SCCPs) with Q-tof data. Use `getsccp` to make Quantitative analysis for SCCPs. If you want a graphical user interface for SCCPs analysis, a shiny application is developed in this package. You could use `runsccp()` to power on the application in a browser. ## Data analysis for multiple samples In environmental non-target analysis, when multiple samples are collected, problem will raise from the heterogeneity among samples. For example, retention time would shift due to the column. In those cases, `xcms` package could be used to get a peaks list across samples within certain retention time and m/z. `enviGCMS` package has some wrapped function to get the peaks list. Besides, some specific functions such as group comparison, batch correction and visualization are also included. ### Wrap function for `xcms` package - `getdata` could be used to get the `xcmsSet` object in one step with optimized methods - `getdata2` could be used to get the `XCMSnExp` object in one step with optimized methods - `getmzrt` could get a list as `mzrt` object with peaks list, mz, retention time and class of samples from `xcmsSet`/`XCMSnExp` object. You could also save related `xcmsSet` and `xcmsEIC` object for further analysis. It also support to output the file for metaboanalyst, xMSannotator, Mummichog pathway analysis and paired mass distance(PMD) analysis. - `getmzrtcsv` could read in the csv files and return a list for peaks list as `mzrt` object ### Data imputation and filtering - `getimputation` could impute NA in the peaks list. - `getfilter` could filter the data based on row and column index. - `getdoe` could filter the data based on rsd and intensity and generate group mean, standard deviation, and group rsd. - `getpower` could compute the power for known peaks list or the sample numbers for each peak - `getoverlappeak` could get the overlap peaks by mass and retention time range for two different list objects ### Visualization of peaks list data - `plotmr` could plot the scatter plot for peaks list with threshold - `plotmrc` could plot the differences as scatter plot for peaks list with threshold between two group of data - `plotrsd` could plot the rsd influences of data in different groups - `gifmr` could plot scatter plot for peaks list and output gif file for multiple groups - `plotpca` could plot pca score plot - `plothm` could plot heatmap - `plotden` could plot density of multiple samples - `plotrla` could plot Relative Log Abundance (RLA) plots - `plotridges` could plot Relative Log Abundance Ridge (RLAR) plots ## Summary In general, `enviGCMS` could be used to explore single data or peaks list from GC/LC-MS and extract certain patterns in the data with various purposes.