Frequently Asked Questions (FAQs)

What is CardioGenomics eXchange commons (CardioGX)?

CardioGX is an interactive, easy-to-use web application that enables users to upload, explore, and analyze variant call format (VCF) files generated through whole genome or whole exome sequencing. The platform has a discussion board to foster collaboration between users and, eventually, to build a user community that can help deal with some of the rare, complex genetic diseases of the heart. The application is designed to assist users with the identification of gene-phenotype associations and to help identify disease-causing variants in patients with congenital heart disease.

What file types can I upload?

CardioGX accepts VCF files (both annotated and unannotated) from exome or whole genome sequencing. VCFs must be mapped to the hg19/GRCh37 genome build. Typically, a VCF file contains the following minimal columns: CHROM (chromosome), POS (position), ID (dbSNP rs#), REF (reference allele), ALT (alternate allele), QUAL (quality), FILTER (filter call), INFO (information field; miscellaneous), FORMAT, and SAMPLE.
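
As an illustration of the column layout described above, here is a minimal Python sketch that splits one tab-delimited VCF data line into the ten named columns. It is not part of CardioGX; the function name `parse_vcf_line` and the example line are made up for this sketch, and it assumes a single-sample VCF.

```python
# Column names for a single-sample VCF, in the order listed above.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT",
               "QUAL", "FILTER", "INFO", "FORMAT", "SAMPLE"]


def parse_vcf_line(line):
    """Split one tab-delimited VCF data line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(VCF_COLUMNS):
        raise ValueError(f"expected {len(VCF_COLUMNS)} columns, got {len(fields)}")
    record = dict(zip(VCF_COLUMNS, fields))
    record["POS"] = int(record["POS"])  # positions are 1-based integers
    return record


# Made-up example line, for illustration only.
example = "1\t12345\t.\tA\tT\t50\tPASS\tDP=20\tGT\t0/1"
record = parse_vcf_line(example)
print(record["CHROM"], record["POS"], record["REF"], record["ALT"])  # prints: 1 12345 A T
```

A line with more than ten columns (i.e., more than one sample) raises an error in this sketch, which mirrors the single-sample requirement.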

When a user uploads a VCF to CardioGX, the file is passed through our automated annotation pipeline, which uses ANNOVAR. VCFs that already contain annotation (from programs such as ANNOVAR, SnpEff, or VEP) may still be uploaded to CardioGX, but note that they will be re-annotated through our ANNOVAR pipeline.

If you have questions or issues when trying to upload a VCF file, please contact us at cgx@nationwidechildrens.org.

Can I upload a multi-sample (e.g., trio analysis) VCF file?

No. Currently, our application accepts only single-sample VCF files. You can refer to the following forum for help on how to extract a single sample of interest from a multi-sample VCF: https://gatkforums.broadinstitute.org/gatk/discussion/54/selecting-variants-of-interest-from-a-callset
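
For readers who want to see what the extraction involves, the following is a rough Python sketch that keeps the nine fixed VCF columns plus one chosen sample column. It is an assumption-laden illustration, not the method recommended in the forum post: the function name `extract_sample` and the sample names are hypothetical, and the sketch does not recompute INFO fields or drop sites where the chosen sample carries no variant allele, which dedicated tools normally handle.

```python
def extract_sample(lines, sample_name):
    """Yield VCF lines keeping the 9 fixed columns plus one sample column."""
    sample_index = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Columns 0-8 are CHROM..FORMAT; sample names start at column 9.
            sample_index = fields.index(sample_name, 9)
        elif sample_index is None:
            raise ValueError("data line found before the #CHROM header")
        yield "\t".join(fields[:9] + [fields[sample_index]])


# Made-up trio VCF, for illustration only.
multi = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild\tmother\tfather",
    "1\t12345\t.\tA\tT\t50\tPASS\t.\tGT\t0/1\t0/0\t0/1",
]
single = list(extract_sample(multi, "mother"))
```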

What reference assembly does CardioGX use for genomic coordinates?

hg19/GRCh37

What data sources are used in CardioGX?

When a user uploads a VCF file, the file is annotated using ANNOVAR tool version 2017Jun01 (http://annovar.openbioinformatics.org/en/latest/). ANNOVAR supplies annotation information related to variants that will be displayed on the web app (e.g., variant type, gene name, chromosomal coordinates, amino acid change, molecular consequence, location in gene, and protein predictions).

The following public databases have been integrated into the CardioGX application and will be searched for gene associations when a user enters a phenotype keyword.

Database | URL | Scope | Data Entry
Online Mendelian Inheritance in Man (OMIM) | ncbi.nlm.nih.gov/omim | Human disease and gene-oriented database | Curators
ClinVar Variant | ncbi.nlm.nih.gov/clinvar | Human; variant-centric | Curators and users
ClinVar Gene | ncbi.nlm.nih.gov/clinvar | Human; gene-centric | Curators and users
Gene References into Functions (GeneRIFs) | ncbi.nlm.nih.gov/gene/about-generif | Human and mouse functional annotation (snippets) of genes | Largely users
Mouse Genome Informatics (MGI) | informatics.jax.org | Mouse gene/phenotype/disease database for mouse research | Largely curators

Users may enter phenotype keyword(s) related to the patient in question. When a user enters a phenotype keyword, this keyword is cross-referenced with all the above public databases, and a list of genes that have been reported to be associated with the phenotype keyword(s) is generated. This gene list is then cross-referenced with the input VCF file. If there are any variants present within any of the phenotype-associated genes, they will be displayed on the webpage.

Users may enter any phenotype keywords. Our search matches exact words. For example, if a user enters “deafness,” our application searches all databases (OMIM, ClinVar, etc.) for entries that contain an exact match to “deafness.” If there are any matches, a list of gene(s) associated with “deafness” is generated, and any variants in the VCF within these gene(s) are then displayed on the webpage. Users may search for multiple keywords by adding one at a time and clicking the search icon after each entered term. Multiple keywords are combined as an ‘AND’ search. For example, if a user searches for deafness and blindness, only results that match both “deafness” AND “blindness” are returned.
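
The search flow described above (keyword to gene list to variants in the VCF) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the CardioGX implementation: the database contents and gene names are toy placeholders, matching is assumed to be case-insensitive, and the ‘AND’ is assumed to be applied at the gene level (a gene must be matched by every keyword).

```python
import re

# Hypothetical toy data: database name -> list of (gene symbol, entry text).
DATABASES = {
    "OMIM": [("GENE_A", "Autosomal dominant deafness"),
             ("GENE_B", "Deafness with blindness")],
    "ClinVar Gene": [("GENE_B", "Congenital blindness"),
                     ("GENE_C", "Blindness, early onset")],
}


def genes_for_keyword(keyword, databases):
    """Return genes whose entry text contains the keyword as an exact word."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return {gene
            for entries in databases.values()
            for gene, text in entries
            if pattern.search(text)}


def genes_for_keywords(keywords, databases):
    """Combine keywords with AND: keep genes matched by every keyword."""
    gene_sets = [genes_for_keyword(k, databases) for k in keywords]
    return set.intersection(*gene_sets) if gene_sets else set()


def variants_in_genes(variants, genes):
    """Keep variants (dicts with a 'gene' key) that fall in the gene list."""
    return [v for v in variants if v["gene"] in genes]


# Made-up variants standing in for an annotated VCF.
variants = [{"gene": "GENE_A", "pos": 100}, {"gene": "GENE_B", "pos": 200}]
genes = genes_for_keywords(["deafness", "blindness"], DATABASES)
hits = variants_in_genes(variants, genes)
```

With this toy data, only GENE_B is matched by both keywords, so only the variant in GENE_B is kept.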

Which columns can I filter?

From the “Annotation Filters” dropdown menu, users can filter the following columns:

  • Databases: Select one or more databases to restrict the gene-association search to specific databases, or leave blank to search against all available databases. Two strategies are used to search against ClinVar; please see the detailed description in the following section.
  • Type: Variant types include SNP, INS, DEL, and PNP. More information here.
  • Zygosity: Zygosity of the variant, i.e., homozygous (HOM) or heterozygous (HET).
  • Pathogenicity: Variant pathogenicity based on ClinVar. More information here.
  • Loc in Gene: Property of variant location in the gene. More information here.
  • Gene Symbol: Enter an HGNC gene symbol to restrict your search to variants within a specific gene. If multiple genes are entered, they will be treated as an ‘OR’ search.
  • Molecular Consequence: Molecular Consequences determined by ClinVar. More information here.
  • Frequency: Enter a frequency cutoff, and the application will display all variants with a frequency less than the entered frequency (according to gnomAD). More information here.

In addition to filtering based on annotations, we have included two pre-defined filters:

  • Pharmacogenomics: Pharmacogenomic results refer to genetic variants reported to be associated with differential responses to pharmaceutical medications, which may result in variable rates of medication clearance or metabolism. When this filter option is selected, only variants labeled as “drug response” in the ClinVar database will be displayed.
  • Secondary Finding: The American College of Medical Genetics and Genomics (ACMG) recommends reporting back “secondary findings” from 59 genes: genes that may be unrelated to the primary medical reason for sequencing but are known to cause severe disease if mutated. When this filter option is selected, only variants that are within any of the 59 genes AND labeled as “pathogenic” or “likely pathogenic” in the ClinVar database will be displayed.

    Are there predefined filters on clinical significance?

    While we do not provide predefined filters for clinical significance, users may consider the following sample filter combinations as a starting point. Variants returned by the more restrictive filter sets are more likely to be of higher clinical significance. Users can also sort the variants by In Silico Protein Prediction and treat at least one detrimental prediction as a criterion for clinical significance.

    High clinical significance variants through restrictive filter:
    Pathogenicity: “Pathogenic”; Frequency < 0.01.

    Intermediate clinical significance variants through moderate filter:
    Pathogenicity: “Pathogenic”, “Likely Pathogenic”, “Risk Factor”; Frequency < 0.01; Filter clinical phenotypes through ClinVar, OMIM, GeneRIF Human databases.

    Low clinical significance variants through limited filter:
    Pathogenicity: “Pathogenic”, “Likely Pathogenic”, “Risk Factor”, “Uncertain Significance”, “Association”; Frequency < 0.05; No filter on the database (i.e. select all).
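
The three sample filter combinations above can be expressed as data and applied to a list of variants, as in the Python sketch below. This is a hypothetical illustration rather than CardioGX code: the names `FILTER_SETS` and `apply_filter_set`, the dictionary keys, and the example variants are all assumptions, and the database restriction of the moderate filter is left out for brevity.

```python
# Sample filter sets from the text above (database restrictions omitted).
FILTER_SETS = {
    "restrictive": {
        "pathogenicity": {"Pathogenic"},
        "max_frequency": 0.01,
    },
    "moderate": {
        "pathogenicity": {"Pathogenic", "Likely Pathogenic", "Risk Factor"},
        "max_frequency": 0.01,
    },
    "limited": {
        "pathogenicity": {"Pathogenic", "Likely Pathogenic", "Risk Factor",
                          "Uncertain Significance", "Association"},
        "max_frequency": 0.05,
    },
}


def apply_filter_set(variants, name):
    """Keep variants with an allowed label and a frequency below the cutoff."""
    spec = FILTER_SETS[name]
    return [v for v in variants
            if v["pathogenicity"] in spec["pathogenicity"]
            and v["frequency"] < spec["max_frequency"]]


# Made-up variants, for illustration only.
variants = [
    {"id": "v1", "pathogenicity": "Pathogenic", "frequency": 0.0},
    {"id": "v2", "pathogenicity": "Likely Pathogenic", "frequency": 0.005},
    {"id": "v3", "pathogenicity": "Uncertain Significance", "frequency": 0.03},
    {"id": "v4", "pathogenicity": "Pathogenic", "frequency": 0.2},
]
for name in FILTER_SETS:
    print(name, [v["id"] for v in apply_filter_set(variants, name)])
```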

    What are the “ClinVar Gene” and “ClinVar Variant” options in the Database filter?

    ClinVar Variant: Search terms are matched against keywords in the ClinVar database, and the variants reported to be responsible for those terms are used to filter the VCF file. Since your VCF file may not contain many of the variants reported in ClinVar, the ClinVar Variant search tends to identify few variants.
    ClinVar Gene: Search terms are matched against keywords in the ClinVar database, and the genes reported to be responsible for those terms are used to filter the VCF file. Since the match is made at the gene level, non-relevant variants in those genes may also appear in the results.

    Can I restrict my search to a specific gene?

    Yes. Click the “Annotation Filters” dropdown and type the gene symbol in the “Gene Symbol” box. Use the official gene symbol from the HUGO Gene Nomenclature Committee (HGNC). If you believe a gene symbol in your VCF annotations is incorrect, please contact us at cgx@nationwidechildrens.org to let us know.

    Can I search my VCF for a specific variant, for example using chromosomal coordinates or amino acid change?

    No. Currently, a user can filter variants using the filters listed in the “Annotation Filters” dropdown menu. Other columns, however, can be sorted (alphabetically or numerically) by clicking on the arrows next to the column header.

    Why do some columns disappear when I search?

    When a user searches for phenotype keywords, a new column “Database Results” appears and displays database matches to the entered keyword(s). As a result, some other columns may collapse, but the hidden column information can be viewed by clicking on the “+” on the left side of the gene name.

    What are the possible values in the “Type” column?

    We use ANNOVAR (version 2017Jun01) to generate annotations from your uploaded VCF. ANNOVAR documentation for gene-based annotation can be found here: http://annovar.openbioinformatics.org/en/latest/user-guide/gene/ We summarize below:

  • DEL: a deletion of one or more nucleotides; in the corresponding “Nucleotide” column, the reference nucleotide(s) are displayed, followed by a hyphen (“> -”) to indicate the deletion
  • INS: an insertion of one or more nucleotides; in the corresponding “Nucleotide” column, a hyphen indicates an insertion (“- >”) and is followed by the alternate nucleotide(s) inserted at this position
  • SNP: a single nucleotide polymorphic change (e.g., A > T)
  • PNP: a poly-nucleotide polymorphic change (e.g., GA > CC or ACA > GCT)
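
The four type values can be told apart from the reference and alternate alleles alone. The Python sketch below shows one way to do that, assuming ANNOVAR-style alleles in which a hyphen stands for the missing side of an insertion or deletion; the function name `variant_type` is hypothetical and this is not necessarily how CardioGX assigns the label.

```python
def variant_type(ref, alt):
    """Classify a variant as SNP, INS, DEL, or PNP from ANNOVAR-style alleles."""
    if ref == "-":
        return "INS"  # nothing in the reference, nucleotides inserted
    if alt == "-":
        return "DEL"  # reference nucleotides removed
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"  # single nucleotide change, e.g. A > T
    return "PNP"      # multi-nucleotide change, e.g. GA > CC


for ref, alt in [("A", "T"), ("GA", "CC"), ("ACA", "GCT"), ("-", "G"), ("TT", "-")]:
    print(ref, ">", alt, variant_type(ref, alt))
```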

    What are the possible values in the “Loc In Gene” column?

    We use ANNOVAR (version 2017Jun01) to generate annotations from your uploaded VCF. ANNOVAR documentation for gene-based annotation can be found here: http://annovar.openbioinformatics.org/en/latest/user-guide/gene/ We summarize below:

  • Exonic: variant within a protein-coding exon region
  • Splicing: variant within 2-bp of a splicing junction
  • ncRNA: variant within RNA without coding annotation
  • UTR5: variant within 5’ untranslated region
  • UTR3: variant within 3’ untranslated region
  • Intronic: variant within intron
  • Upstream: variant within 1-kb region upstream of a transcription start site
  • Downstream: variant within 1-kb region downstream of a transcription end site
  • Intergenic: variant within a region between two genes

    According to ANNOVAR, if a variant fits multiple categories, the following precedence is used to decide which annotation to print out: exonic = splicing > ncRNA > UTR5/UTR3 > intron > upstream/downstream > intergenic
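
The precedence rule can be illustrated with a small Python sketch that ranks each category and reports the highest-ranked one(s). The numeric ranks, the lowercase category spellings, and the function name `reported_locations` are assumptions made for this example; categories that share a rank (such as exonic and splicing) are both returned.

```python
# Lower rank = higher precedence; categories joined by "=" or "/" share a rank.
PRECEDENCE = {
    "exonic": 1, "splicing": 1,
    "ncRNA": 2,
    "UTR5": 3, "UTR3": 3,
    "intronic": 4,
    "upstream": 5, "downstream": 5,
    "intergenic": 6,
}


def reported_locations(candidates):
    """Return the candidate categories that share the highest precedence."""
    best = min(PRECEDENCE[c] for c in candidates)
    return [c for c in candidates if PRECEDENCE[c] == best]


print(reported_locations(["intronic", "UTR3", "upstream"]))  # prints ['UTR3']
print(reported_locations(["splicing", "exonic", "intronic"]))  # prints ['splicing', 'exonic']
```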

    What are the possible values in the “Molecular Consequence” column?

    We use ANNOVAR (version 2017Jun01) to generate annotations from your uploaded VCF. ANNOVAR documentation for gene-based annotation can be found here: http://annovar.openbioinformatics.org/en/latest/user-guide/gene/ We summarize below:

  • Frameshift insertion: an insertion of one or more nucleotides that causes a frameshift change in the protein-coding sequence
  • Frameshift deletion: a deletion of one or more nucleotides that causes a frameshift change in the protein-coding sequence
  • Frameshift block substitution: a block substitution of one or more nucleotides that causes a frameshift change in the protein-coding sequence
  • Stopgain: a nonsynonymous SNV, frameshift insertion/deletion, nonframeshift insertion/deletion, or block substitution that leads to the immediate creation of a stop codon at the variant site. NOTE: For frameshift mutations, the creation of a stop codon downstream of the variant will not be counted as "stopgain"
  • Stoploss: a nonsynonymous SNV, frameshift insertion/deletion, nonframeshift insertion/deletion, or block substitution that leads to the immediate elimination of a stop codon at the variant site
  • Nonframeshift insertion: an insertion of 3 or a multiple of 3 nucleotides that does not cause a frameshift change in the protein-coding sequence
  • Nonframeshift deletion: a deletion of 3 or a multiple of 3 nucleotides that does not cause a frameshift change in the protein-coding sequence
  • Nonframeshift block substitution: a block substitution of one or more nucleotides that does not cause a frameshift change in the protein-coding sequence
  • Nonsynonymous SNV: a single nucleotide change that causes an amino acid change
  • Synonymous SNV: a single nucleotide change that does not cause an amino acid change
  • Unknown: unknown function (due to various errors in the gene structure definition in the database file)

    According to ANNOVAR, if a variant fits multiple categories, the following precedence is used to decide which annotation to print out: frameshift insertion > frameshift deletion > frameshift block substitution > stopgain > stoploss > nonframeshift insertion > nonframeshift deletion > nonframeshift block substitution > nonsynonymous SNV > synonymous SNV > unknown
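
The frameshift versus nonframeshift distinction comes down to whether the inserted or deleted length is a multiple of 3. The following Python sketch shows that arithmetic only. It is a simplification under stated assumptions: the function name `indel_consequence` is hypothetical, alleles use the ANNOVAR-style hyphen, the variant is assumed to lie entirely within coding sequence, and stopgain/stoploss effects are ignored.

```python
def indel_consequence(ref, alt):
    """Label an insertion or deletion as frameshift or nonframeshift by length."""
    if ref == "-":
        kind, length = "insertion", len(alt)
    elif alt == "-":
        kind, length = "deletion", len(ref)
    else:
        raise ValueError("not a simple insertion or deletion")
    # A length that is a multiple of 3 keeps the reading frame intact.
    prefix = "nonframeshift" if length % 3 == 0 else "frameshift"
    return f"{prefix} {kind}"


print(indel_consequence("-", "GCA"))  # prints: nonframeshift insertion
print(indel_consequence("AT", "-"))   # prints: frameshift deletion
```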

    What is the “Pathogenicity” filter?

    The Pathogenicity filter uses the “Clinical Significance” label from ClinVar, as outlined in https://www.ncbi.nlm.nih.gov/clinvar/docs/clinsig/, and includes the following possible values:

  • Five tiers as recommended by ACMG (Pathogenic, Likely Pathogenic, Uncertain Significance, Likely Benign, Benign)
  • Association (variants identified in GWAS or similar studies)
  • Risk factor (variants that contribute to disease risk but are not causal)
  • Protective (variants that decrease risk of disease)

    Which transcript is used for the “Amino Acid Change” column?

    As a result of alternative splicing, many different transcripts can be produced from a single gene. The transcript ID displayed here is that of the longest transcript for that gene. This is a common approach, as the longest transcript is often treated as the “canonical” transcript.

    How is the “Frequency- gnomAD” column derived?

    The frequency displayed for each variant is the “total” frequency across all gnomAD genomes (i.e., the aggregate frequency of all populations; N = 15,496 genomes). See http://gnomad.broadinstitute.org/faq

    A frequency of “0.0” indicates the variant was not observed in the gnomAD genomes database.

    What is the “In Silico Protein Prediction” column?

    We use ANNOVAR (version 2017Jun01) to generate annotations from your uploaded VCF. ANNOVAR documentation for filter-based annotation can be found here: http://annovar.openbioinformatics.org/en/latest/user-guide/filter/

    The in silico protein prediction column gives information about how variants are computationally predicted to affect protein function. This column records six prediction scores (from ANNOVAR) that measure cross-species conservation (GERP++, PhyloP, SiPhy) or variant effects on protein function (SIFT, PolyPhen2, FATHMM). The fraction of algorithms that predict the variant to have a damaging effect on the protein product is displayed (e.g., 5/6). Clicking on the “+” gives more detailed information about the algorithms and their corresponding assigned scores.
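
The displayed fraction can be reproduced with a few lines of Python once each algorithm's score has been turned into a damaging / not damaging call. The sketch below starts from those calls; the score cutoffs that produce them are not given here, so the boolean values and the function name `damaging_fraction` are purely illustrative assumptions.

```python
def damaging_fraction(predictions):
    """Return 'k/n', where k of the n algorithms predict a damaging effect."""
    damaging = sum(1 for is_damaging in predictions.values() if is_damaging)
    return f"{damaging}/{len(predictions)}"


# Made-up calls for one variant: True means "predicted damaging".
calls = {
    "GERP++": True, "PhyloP": True, "SiPhy": True,
    "SIFT": True, "PolyPhen2": True, "FATHMM": False,
}
print(damaging_fraction(calls))  # prints: 5/6
```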

    I think I see an error in my variant annotations. What should I do?

    Contact us at cgx@nationwidechildrens.org and specifically describe the variant, corresponding annotations, and any perceived errors. We encourage you to include screenshots when you contact us.

    Why doesn’t CardioGX include links to UCSC, dbSNP, VarSome, or other popular databases?

    CardioGX is a research project with ongoing development. If you have suggestions to improve CardioGX, please contact us at cgx@nationwidechildrens.org.

    Can I save specific searches?

    Yes. The URL can be copied and retains all search terms.

    Can I upload more than one VCF?

    Yes, you can upload as many single-sample VCFs as you want. You can toggle between VCF files using the dropdown menu in the top left corner of the search utility.

    How do I share a VCF with others?

    You may share a VCF with another individual by navigating to Dashboard > VCF files; then click on the arrow next to the file and enter your collaborator’s name and e-mail. If your collaborator is not a registered user of CardioGX, they will first be prompted to create an account before they can view the VCF.

    Furthermore, you can choose to make your VCF public and share with all registered CardioGX users. To do this, navigate to Dashboard > VCF files and click on the lock button to toggle between private (locked icon) vs. public (unlocked icon).

    What are the Terms & Conditions when using CardioGX?

    Please review the Terms & Conditions here.

    How is CardioGX funded?

    This project is made possible with funding support from the American Heart Association Institute for Precision Cardiovascular Medicine (grant 17IG33630060) and by Nationwide Children’s Hospital strategic funding.

    How should I cite CardioGX?

    CardioGenomics eXchange commons (CardioGX) (YYYY). http://CardioGX.nationwidechildrens.org [Accessed DD Month abbreviation. YYYY].

    For example: CardioGenomics eXchange commons (CardioGX) (2018). http://cardiogx.org [Accessed 12 Feb 2018].

    How can I contact CardioGX developers?

    Email: cgx@nationwidechildrens.org