Background
Cancer is one of the leading causes of death worldwide. Abnormal genetic alterations followed by the uncontrolled growth of somatic cells initiate cancer. Although most genetic alterations are passenger mutations that do not contribute to tumorigenesis, an individual cell can proliferate and become a tumor if it acquires a sufficient set of driving mutations. Therefore, finding cancer-driving mutations and targeting the encoded abnormal proteins and related pathways via cancer therapeutics are important strategies to delay cancer progression and prevent metastasis [1].
Previous large-scale studies, led by The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), have identified cancer-driving mutations [2]. Although these analyses revealed frequently altered driving mutations in many cancer types, such as BRAF (V600E) in melanoma and colorectal cancer, less frequently altered mutations remain difficult to find with large-scale analyses, especially in uncommon cancer types [2–4].
Adolescent and young adult (AYA) cancer is a rare malignant disease that arises in patients aged 15 to 39 years and has biological features, therapeutic outcomes, and survival rates distinct from those observed in other age groups. Although determining the genomic profiles of AYA cancer is important for investigating the causes of these distinct characteristics, large-scale genomic studies and molecular data for AYA cancer are not available because the disease is rare and tumor samples are difficult to collect [5, 6].
In this study, we analyzed seven different AYA cancers from patients with metastatic tumors using three genomics platforms (whole-exome sequencing, whole-transcriptome sequencing, and OncoScan™). We identified single nucleotide variations (SNVs) and insertions and deletions (indels) using whole-exome sequencing (WES) and detected fusions using whole-transcriptome sequencing (WTS). For copy number variations (CNVs), we used OncoScan™, a copy number analysis platform that performs particularly well with formalin-fixed paraffin-embedded (FFPE) samples [7]. We processed the WES data with well-known bioinformatics tools (BWA, Picard, GATK, MuTect, and Somatic Indel Detector), as described in other studies, and processed the WTS data with fusion detection tools [8–10]. We then identified candidate genes and suggested potential drugs specific to the genetic alterations of each patient. We also compared the candidate genes for AYA cancers with those for the same cancer types across all age groups using published data.
Methods
Ethics and consent statement
This study was approved by the Institutional Review Board (IRB) of Seoul National University Hospital (1206-086-414). We obtained written informed consent from all patients who participated in this study, including consent for the publication of their clinical details and/or clinical images. A copy of the consent form is available for review by the Editor of this journal.
Samples from seven different tumors (prostate cancer, olfactory neuroblastoma, head and neck squamous cell carcinoma (HNSCC), urachal carcinoma, germ cell tumor, lung cancer, and liposarcoma) were prospectively obtained in three different forms (fresh-frozen tissue, formalin-fixed paraffin-embedded (FFPE) tissue, and pleurisy). The samples were analyzed using three genomics platforms (WES, WTS, and OncoScan™) as the tumor sources permitted. We initially intended to analyze samples from ten patients, but three were excluded because the amount of tumor sample provided was insufficient (AYA03) or because sufficient DNA/RNA for genome-scale analysis could not be obtained (AYA05 and AYA08). For sample AYA04 (HNSCC), HPV infection status was determined by immunohistochemical (IHC) staining (data not shown).
Whole exome sequencing (WES)
A minimum of 3 μg of genomic DNA was randomly fragmented by Covaris, and the library fragment sizes were mainly distributed between 250 and 300 bp. Adapters were then ligated to both ends of the fragments. The adapter-ligated DNA was amplified by ligation-mediated PCR (LM-PCR), purified, and hybridized to the SureSelect XT Human All Exon v4 + UTR 71 Mb kit (Agilent Technologies, Santa Clara, CA, USA) for enrichment according to the manufacturer’s recommended protocol. Each captured library was then loaded on the HiSeq 2000 platform (Illumina, San Diego, CA, USA) for high-throughput sequencing. Raw image files were processed by Illumina CASAVA v1.8.2 for base calling with default parameters, and the sequences from each individual were generated as 101-bp paired-end reads.
Processing WES data to analyze SNVs and indels
WES data were processed in a series of steps. We aligned the sequenced reads (FASTQ files) to the reference genome (human reference genome g1k v37) using the Burrows-Wheeler Aligner (BWA v0.7.5a) [11] and then sorted the output and removed PCR duplicates using Picard v1.95 [12]. Following the typical GATK workflow (The Genome Analysis Toolkit v2.6-5), we performed local indel realignment and base quality score recalibration [13]. For variant calling, we used MuTect v1.1.6 for single nucleotide variants (SNVs) and Somatic Indel Detector (from GATK v2.2-8) for indels [14]. We called SNVs with the default settings, but for indel calling we lowered the tumor indel fraction threshold from 0.3 to 0.05 (T_INDEL_F < 0.05) to reduce false negatives. Variants interpreted as somatic mutations were tagged “KEEP” by MuTect or “SOMATIC” by Somatic Indel Detector and were used for further study. To avoid false-positive indels, we filtered out variants supported by fewer than 6 alternative-allele reads in the tumor. All somatic variants were annotated with ANNOVAR [15]. The variants that passed these steps are referred to as ‘processed WES data’ (Additional file 1: Figure S1).
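The tag-based and read-support filters above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in this study: it assumes the caller outputs have already been parsed into simple records, and the `Call` fields and function names are hypothetical. The T_INDEL_F threshold is applied by Somatic Indel Detector at calling time and is therefore not repeated here.

```python
from dataclasses import dataclass

# Indels supported by fewer than this many alternative-allele tumor reads
# are treated as likely false positives and removed.
MIN_TUMOR_ALT_READS = 6


@dataclass
class Call:
    """Simplified somatic call (hypothetical fields, not a real file format)."""
    gene: str
    kind: str             # "SNV" (MuTect) or "INDEL" (Somatic Indel Detector)
    tag: str              # caller verdict, e.g. "KEEP", "REJECT", "SOMATIC"
    tumor_alt_reads: int  # tumor reads supporting the alternative allele


def passes_filters(call: Call) -> bool:
    """Return True if the call would be retained as a somatic variant."""
    if call.kind == "SNV":
        # MuTect SNVs are kept only when tagged KEEP.
        return call.tag == "KEEP"
    if call.kind == "INDEL":
        # Indels need the SOMATIC tag and at least 6 supporting tumor reads.
        return call.tag == "SOMATIC" and call.tumor_alt_reads >= MIN_TUMOR_ALT_READS
    return False


def processed_wes_calls(calls: list[Call]) -> list[Call]:
    """Keep only calls that pass the somatic filters, preserving order."""
    return [c for c in calls if passes_filters(c)]


calls = [
    Call("GENE_A", "SNV", "KEEP", 4),
    Call("GENE_B", "INDEL", "SOMATIC", 5),  # removed: only 5 alt reads
    Call("GENE_C", "INDEL", "SOMATIC", 9),
]
print([c.gene for c in processed_wes_calls(calls)])  # ['GENE_A', 'GENE_C']
```

Keeping the SNV and indel rules in one predicate makes it easy to see that only indels are subject to the read-support cut-off.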
Analysis of copy number variations (CNVs) by OncoScan™
We used the 330K OncoScan™ FFPE platform (Affymetrix, Santa Clara, CA, USA) to identify candidate CNVs (amplifications/deletions and loss of heterozygosity (LOH)). A minimum of 80 ng of DNA from each sample was used for the OncoScan™ platform; AYA02 was excluded because its amount of DNA was insufficient. The data were analyzed with Nexus Express (Affymetrix) software to find CNVs. We filtered out CNVs with a copy number (CN) ≤ 2.5, which were considered insignificant amplifications, and analyzed both chromosome-level and focal-level CNVs.
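A minimal Python sketch of the CN filter follows, assuming segments exported from Nexus Express have been parsed into simple records (the `Segment` fields and `call` labels are hypothetical). It applies the ≤ 2.5 cut-off only to gain calls, which is one reading of the rule above, and adds a chromosome-level versus focal split based on a hypothetical 50% chromosome-length cut-off; the study's actual criteria for the two levels are not stated here.

```python
from dataclasses import dataclass

CN_THRESHOLD = 2.5          # gains at or below this CN are treated as insignificant
CHROM_LEVEL_FRACTION = 0.5  # hypothetical: spans >= 50% of a chromosome count as chromosome-level


@dataclass
class Segment:
    chrom: str
    start: int
    end: int
    copy_number: float
    call: str  # "gain", "loss", or "LOH" (hypothetical labels)


def significant_segments(segments: list[Segment]) -> list[Segment]:
    """Drop gain segments with CN <= 2.5; losses and LOH pass through unchanged."""
    return [s for s in segments
            if not (s.call == "gain" and s.copy_number <= CN_THRESHOLD)]


def cnv_level(seg: Segment, chrom_lengths: dict[str, int]) -> str:
    """Label a segment by the fraction of its chromosome that it spans."""
    span = seg.end - seg.start
    if span >= CHROM_LEVEL_FRACTION * chrom_lengths[seg.chrom]:
        return "chromosome-level"
    return "focal"


# Toy coordinates and chromosome lengths, for illustration only.
segments = [
    Segment("8", 0, 800, 3.4, "gain"),
    Segment("12", 100, 130, 2.3, "gain"),  # filtered out: CN <= 2.5
    Segment("9", 100, 150, 0.0, "loss"),
]
lengths = {"8": 1000, "9": 1000, "12": 1000}
for s in significant_segments(segments):
    print(s.chrom, s.call, cnv_level(s, lengths))
# 8 gain chromosome-level
# 9 loss focal
```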
Analysis of CNVs by VarScan2 for AYA02
To analyze CNVs in AYA02, we used VarScan2 as an alternative to OncoScan™. After processing the data up to the ‘Realignment/Recalibration’ step of the WES workflow (Additional file 1: Figure S1), we followed the developer’s recommendations. We generated mpileup data from the recalibrated tumor and normal BAM files using SAMtools and used the ‘Copy caller’ module of VarScan2 to generate the relative copy number change (C), which was determined as follows:
$$C = \log_2\left(\frac{D_T}{D_N} \cdot \frac{I_N}{I_T}\right),$$
where D stands for the average depth, I for the number of uniquely mapped bases, and the subscripts N and T denote normal and tumor, respectively [16, 17]. After filtering out reads with mapping quality < 15, we adjusted the relative copy number using the re-centering option in ‘Copy caller’ and segmented copy number regions using the circular binary segmentation algorithm. After merging, the results were visualized in IGV (Integrative Genomics Viewer) [18]. Because the VarScan2 results covered only exonic regions rather than the whole genome, we analyzed only chromosome-level CNVs, which were comparable to those obtained with OncoScan™, and did not analyze focal-level CNVs. To plot relative copy number changes, we used the re-centered relative copy number changes on a log2 scale.
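The relative copy number formula, together with a re-centering step, can be written as a short Python sketch. Multiplying by I_N/I_T normalizes for differences in total sequencing output, so a tumor library sequenced twice as deeply overall does not appear amplified. The function names are hypothetical, and the median-based shift in `recenter` is an illustrative assumption; VarScan2's own re-centering options may compute the shift differently.

```python
import math
from statistics import median


def relative_copy_number(depth_tumor: float, depth_normal: float,
                         bases_tumor: float, bases_normal: float) -> float:
    """C = log2((D_T / D_N) * (I_N / I_T)).

    depth_*: average read depth of the region (D)
    bases_*: number of uniquely mapped bases in the whole sample (I)
    """
    if min(depth_tumor, depth_normal, bases_tumor, bases_normal) <= 0:
        raise ValueError("depths and mapped-base counts must be positive")
    return math.log2((depth_tumor / depth_normal) * (bases_normal / bases_tumor))


def recenter(log2_ratios: list[float]) -> list[float]:
    """Shift log2 ratios so that their median is 0 (illustrative rule)."""
    shift = median(log2_ratios)
    return [c - shift for c in log2_ratios]


# Tumor library twice as large as the normal one: doubled depth alone gives C = 0.
print(relative_copy_number(60, 30, 2e9, 1e9))  # 0.0
# Equal library sizes with doubled depth: C = 1, a two-fold relative gain.
print(relative_copy_number(60, 30, 1e9, 1e9))  # 1.0
print(recenter([1.0, 2.0, 4.0]))               # [-1.0, 0.0, 2.0]
```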
Analysis of mutation frequency and mutation spectrum
The mutation frequency was analyzed by counting the variants from WES data that ANNOVAR annotated as nonsynonymous SNVs, synonymous SNVs, nonsense mutations, stop-loss mutations, splicing mutations, frameshift insertions/deletions (indels), in-frame indels, or noncoding RNA variants in exonic regions. The same mutation categories were described in published data from 12 major cancer studies [19]. To analyze the mutation spectrum, we used SNVs called by MuTect in all sequenced regions, not only coding regions.
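Both tallies can be sketched in Python. The ANNOVAR labels used below (`exonic`, `splicing`, `ncRNA_exonic`, `stopgain`, `nonframeshift deletion`, and so on) are refGene-style annotations, but the exact strings can differ between ANNOVAR versions and databases, so treat the mapping as an assumption. The spectrum function collapses each SNV onto the six pyrimidine-referenced substitution classes (C>A, C>G, C>T, T>A, T>C, T>G), a common convention that the text does not explicitly specify; all function names are hypothetical.

```python
from collections import Counter

# ANNOVAR ExonicFunc.refGene labels mapped to the categories counted above
# (label strings assumed; they can vary with the ANNOVAR version).
EXONIC_CATEGORY = {
    "nonsynonymous SNV": "nonsynonymous SNV",
    "synonymous SNV": "synonymous SNV",
    "stopgain": "nonsense",
    "stoploss": "stop-loss",
    "frameshift insertion": "frameshift indel",
    "frameshift deletion": "frameshift indel",
    "nonframeshift insertion": "in-frame indel",
    "nonframeshift deletion": "in-frame indel",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mutation_frequency(annotations):
    """Count variants per category from (Func.refGene, ExonicFunc.refGene) pairs."""
    counts = Counter()
    for func, exonic_func in annotations:
        if func == "splicing":
            counts["splicing"] += 1
        elif func == "ncRNA_exonic":
            counts["ncRNA exonic"] += 1
        elif func == "exonic" and exonic_func in EXONIC_CATEGORY:
            counts[EXONIC_CATEGORY[exonic_func]] += 1
        # Anything else (intronic, intergenic, UTR, ...) is not counted.
    return counts


def substitution_class(ref: str, alt: str) -> str:
    """Collapse an SNV onto a pyrimidine-referenced class, e.g. G>A becomes C>T."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref not in "ACGT" or alt not in "ACGT" or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Purine reference: report the change on the complementary strand.
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{ref}>{alt}"


def mutation_spectrum(snvs):
    """Count (ref, alt) SNVs per substitution class."""
    return Counter(substitution_class(r, a) for r, a in snvs)


print(dict(mutation_spectrum([("G", "A"), ("C", "T"), ("A", "G")])))
# {'C>T': 2, 'T>C': 1}
```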
Pathway-drug analysis
After assigning levels to the variants by pattern-based heuristic annotation, we investigated the biological pathways of the variants using DAVID or the literature [20]. To analyze druggability, we focused mainly on level-1 (strong) variants using DGIdb [21].
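The level-1 druggability step can be sketched as a join between variant levels and a table of drug-gene interactions, for example one exported from DGIdb. This is a hypothetical illustration: it does not call the DGIdb API, and the input format, the function name `druggable_level1`, and the toy gene and drug names are placeholders rather than results of this study.

```python
from collections import defaultdict


def druggable_level1(variants, interactions):
    """Map each level-1 gene to the drugs it interacts with.

    variants: iterable of (gene, level) pairs, where level 1 means "strong".
    interactions: iterable of (gene, drug) pairs from a drug-gene table.
    Returns {gene: sorted drugs} for level-1 genes with at least one interaction.
    """
    drugs_by_gene = defaultdict(set)
    for gene, drug in interactions:
        drugs_by_gene[gene].add(drug)
    level1_genes = {gene for gene, level in variants if level == 1}
    return {g: sorted(drugs_by_gene[g])
            for g in sorted(level1_genes) if g in drugs_by_gene}


variants = [("GENE_A", 1), ("GENE_B", 2), ("GENE_C", 1)]
interactions = [("GENE_A", "DRUG_X"), ("GENE_A", "DRUG_Y"), ("GENE_B", "DRUG_Z")]
print(druggable_level1(variants, interactions))
# {'GENE_A': ['DRUG_X', 'DRUG_Y']}
```

GENE_B is dropped because it is not level 1, and GENE_C because it has no listed interaction.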
Fusion analysis
We analyzed WTS data from four samples (AYA01, 02, 09, and 10) to identify cancer-driving fusions using three fusion detection tools, FusionMap, deFuse, and ChimeraScan [22–24]. From the results, we selected candidate cancer-driving fusions using the fusion gene list archived in COSMIC (downloaded 2015-03-03).
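A minimal Python sketch of the COSMIC matching step follows, assuming each tool's output has been reduced to (5′ partner, 3′ partner) gene pairs. It pools calls from all tools, records which tools support each fusion, and keeps only pairs on the COSMIC list. Whether the study required agreement between tools is not stated, so the pooling rule, the case normalization, the function name, and the gene names in the usage example are assumptions.

```python
from collections import defaultdict


def candidate_driver_fusions(calls_by_tool, cosmic_fusions):
    """Return {(5' gene, 3' gene): sorted supporting tools} for COSMIC-listed fusions.

    calls_by_tool: {tool name: iterable of (5' gene, 3' gene) pairs}
    cosmic_fusions: iterable of (5' gene, 3' gene) pairs from the COSMIC list
    """
    support = defaultdict(set)
    for tool, calls in calls_by_tool.items():
        for five_prime, three_prime in calls:
            # Normalize case so that the same pair from different tools merges.
            support[(five_prime.upper(), three_prime.upper())].add(tool)
    cosmic = {(a.upper(), b.upper()) for a, b in cosmic_fusions}
    return {fusion: sorted(tools)
            for fusion, tools in sorted(support.items()) if fusion in cosmic}


calls = {
    "FusionMap": [("GENEA", "GENEB")],
    "deFuse": [("geneA", "geneB"), ("GENEC", "GENED")],
    "ChimeraScan": [],
}
print(candidate_driver_fusions(calls, {("GENEA", "GENEB")}))
# {('GENEA', 'GENEB'): ['FusionMap', 'deFuse']}
```

Keeping the supporting-tool list alongside each match lets a stricter consensus rule (for example, at least two tools) be applied afterwards without re-running the matching.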
Acknowledgements
This study was supported by grant 03-2014-0290 from the Seoul National University Hospital Research Fund. This research was also supported by the MSIP (The Ministry of Science, ICT and Future Planning), Korea and Microsoft Research, under the ICT/SW Creative research program supervised by the NIPA (National ICT Industry Promotion Agency) (NIPA-2014-ITAH051014011012). We thank Ji-Eun Yoon and Su Jung Huh for collecting the tumor tissue samples and matched normal blood as well as for extracting genetic materials. Additionally, we thank Jiae Koh for revising the draft.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
SC, SHL, and JIK designed the study. SC and SHL drafted the manuscript. SC, JL, and JYS performed the experiments and data analyses. JYK, SHS, BK, TMK, DWK, and DSH participated in the critical review of the study design and data analyses. SC, JL, JYS, JYK, SHS, BK, TMK, DWK, and DSH reviewed and critically revised the manuscript. All authors read and approved the final manuscript.