
AMP PD Release Notes - October 2019

Data Summary

Data Composition

Clinical Data
Participant records were compiled from the BioFIND, HBS, PDBP, and PPMI cohorts into a harmonized dataset. These records were then paired with RNA and WGS samples, and a record was excluded if no matching sample data was available. The one exception is 9 participants whose WGS samples were excluded solely because they duplicated samples elsewhere in the AMP PD cohorts; these participants were retained.

[Figure: October 2019 AMP PD data diagram]

RNA Data
RNA sample data was sequenced and processed for BioFIND, PDBP, and PPMI cohort participants. RNA samples were excluded during QC rounds when there was no corresponding clinical data.

WGS Data
DNA sample data was sequenced and processed for BioFIND, HBS, PDBP, and PPMI cohort participants. Unlike RNA samples, WGS samples were not excluded during QC rounds when there was no corresponding clinical data, or when the available clinical records required further investigation before exclusion could be warranted. In release 2019_v1release_1015, all HBS samples were excluded from the joint genotyping run because data-sharing consents were missing for one or more subjects; they will be included in the next joint genotyping run once HBS participant consents are fully reconciled.
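The QC rules stated for RNA and WGS samples can be summarized as a small filter. This is a minimal sketch, not the AMP PD pipeline: the `Sample` record, its field names, and the cohort label strings are hypothetical, and only the rules stated in these notes are modeled (RNA samples need a clinical record; WGS samples do not; HBS samples are held out of the 2019_v1release_1015 joint genotyping run).

```python
from dataclasses import dataclass


# Hypothetical sample record; field names are illustrative, not the AMP PD schema.
@dataclass(frozen=True)
class Sample:
    sample_id: str
    participant_id: str
    cohort: str    # e.g. "BioFIND", "HBS", "PDBP", "PPMI"
    modality: str  # "RNA" or "WGS"


def passes_qc(sample: Sample, clinical_ids: set[str]) -> bool:
    """RNA samples require a clinical record; WGS samples are kept either way."""
    if sample.modality == "RNA":
        return sample.participant_id in clinical_ids
    return True


def joint_genotyping_inputs(samples: list[Sample], clinical_ids: set[str]) -> list[Sample]:
    """WGS samples that pass QC, minus HBS (held out pending consent reconciliation)."""
    return [
        s for s in samples
        if s.modality == "WGS" and passes_qc(s, clinical_ids) and s.cohort != "HBS"
    ]


samples = [
    Sample("R1", "P1", "PPMI", "RNA"),
    Sample("R2", "P9", "PDBP", "RNA"),  # no clinical record -> excluded
    Sample("W1", "P1", "PPMI", "WGS"),
    Sample("W2", "P9", "PDBP", "WGS"),  # no clinical record -> still kept
    Sample("W3", "P5", "HBS", "WGS"),   # kept as a sample, but not jointly genotyped
]
clinical = {"P1", "P5"}

kept = [s.sample_id for s in samples if passes_qc(s, clinical)]
jg = [s.sample_id for s in joint_genotyping_inputs(samples, clinical)]
print(kept)  # ['R1', 'W1', 'W2', 'W3']
print(jg)    # ['W1', 'W2']
```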

Integrated Data
This release includes 2877 subjects with fully integrated clinical records, WGS samples, and RNA samples. For an additional 348 participants, this release includes RNA samples with corresponding clinical records but no WGS data yet. Similarly, there are 1064 WGS samples with clinical records for which RNA sample data is not yet available. No participant has RNA and WGS data without a clinical record, because RNA QC required that clinical records exist.
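The integration categories are set intersections and differences over participant IDs. The sketch below groups hypothetical toy IDs the same way and then checks that the release counts add up to the table sizes listed under Google Cloud Storage (2877 + 348 = 3225 transcriptomics participants; 2877 + 1064 = 3941 WGS samples). The function and group names are illustrative only.

```python
def categorize(clinical: set[str], wgs: set[str], rna: set[str]) -> dict[str, set[str]]:
    """Group participant IDs by which release data types they have."""
    return {
        "clinical+WGS+RNA": clinical & wgs & rna,
        "clinical+RNA only": (clinical & rna) - wgs,
        "clinical+WGS only": (clinical & wgs) - rna,
        # RNA QC required a clinical record, so this group should always be empty.
        "WGS+RNA without clinical": (wgs & rna) - clinical,
    }


# Toy IDs (hypothetical): "E" has WGS but no clinical record, which WGS QC allows.
groups = categorize(clinical={"A", "B", "C", "D"}, wgs={"A", "B", "E"}, rna={"A", "C"})
print({name: sorted(ids) for name, ids in groups.items()})

# Consistency check against the counts stated in this release.
BOTH, RNA_ONLY, WGS_ONLY = 2877, 348, 1064
assert BOTH + RNA_ONLY == 3225  # participants in the transcriptomics sample table
assert BOTH + WGS_ONLY == 3941  # WGS sample table
```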
 

Composition by Cohort

BioFIND Data
Of 213 participants whose clinical records met AMP PD minimum clinical data criteria, 167 participants have corresponding samples in all three release data categories.

HBS Data
Of 876 participants whose clinical records met AMP PD minimum clinical data criteria, all 876 have corresponding WGS sample data. HBS is not represented in the joint genotyping data pending reconciliation of participant consents. HBS samples are not fully integrated because no HBS RNA sequence data was processed for this release.
    
PDBP Data
Of 1599 participants whose clinical records met AMP PD minimum clinical data criteria, 1315 participants have corresponding samples in all three release data categories, 1469 have corresponding WGS samples, and 1445 have corresponding RNA samples.

PPMI Data
Of 1610 participants whose clinical records met AMP PD minimum clinical data criteria, 1395 participants have corresponding samples in all three release data categories, 1433 have corresponding WGS samples, and 1572 have corresponding RNA samples.
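For PDBP and PPMI the per-modality counts determine the rest of each cohort's breakdown by inclusion-exclusion, assuming (as the wording implies) that the WGS and RNA counts are both taken within the participants whose clinical records qualified, so "all three" equals the WGS ∩ RNA count. BioFIND is omitted because its per-modality counts are not stated; HBS is trivially 876 WGS-only. The function name is illustrative.

```python
def cohort_breakdown(clinical: int, wgs: int, rna: int, all_three: int) -> dict[str, int]:
    """Split a cohort's clinical participants into sample-availability groups.

    Assumes WGS and RNA counts are both within the clinical participants,
    so all_three is the size of the WGS/RNA intersection.
    """
    with_any_sample = wgs + rna - all_three  # inclusion-exclusion: size of WGS ∪ RNA
    return {
        "all_three": all_three,
        "wgs_only": wgs - all_three,
        "rna_only": rna - all_three,
        "clinical_only": clinical - with_any_sample,
    }


pdbp = cohort_breakdown(clinical=1599, wgs=1469, rna=1445, all_three=1315)
ppmi = cohort_breakdown(clinical=1610, wgs=1433, rna=1572, all_three=1395)
print(pdbp)  # {'all_three': 1315, 'wgs_only': 154, 'rna_only': 130, 'clinical_only': 0}
print(ppmi)  # {'all_three': 1395, 'wgs_only': 38, 'rna_only': 177, 'clinical_only': 0}
```

In both cohorts the clinical-only group comes out to 0, so every PDBP and PPMI participant with qualifying clinical records has at least one sample type in this release.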

Google Cloud Storage

Participant Data Products

  • Table of all participants in all release data (n=4298)
  • Table of all participants with minimum diagnosis information
  • Table of all participant whole-genome sequencing samples in release data (n=3941)
  • Table of all transcriptomics samples (n=8356) from 3225 participants in release data
  • Table of all participants who were included in more than one study and therefore appear twice in clinical and transcriptomics data (n=24). For these genetically identical samples, the WGS sample with lower mean coverage was removed.
  • Harmonized clinical data
    • harmonized clinical data for 27 clinical forms as csv
    • harmonized clinical per-form dictionary files as csv
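The duplicate-subject rule (keep the WGS sample with the higher mean coverage) amounts to a per-individual arg-max. In this sketch, `subject_key` is a hypothetical field that is shared by both study-specific IDs of one individual; in the release that linkage comes from the duplicate subjects table, whose schema is not described here. The coverage values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WgsSample:
    sample_id: str
    subject_key: str      # hypothetical: same value for both study IDs of one individual
    mean_coverage: float


def drop_lower_coverage_duplicates(samples: list[WgsSample]) -> list[WgsSample]:
    """Keep one WGS sample per individual: the one with the highest mean coverage.

    Ties keep the first sample seen; output order follows first appearance of each individual.
    """
    best: dict[str, WgsSample] = {}
    for s in samples:
        kept = best.get(s.subject_key)
        if kept is None or s.mean_coverage > kept.mean_coverage:
            best[s.subject_key] = s
    return list(best.values())


samples = [
    WgsSample("S1", "X", 30.5),
    WgsSample("S2", "X", 35.0),  # same individual enrolled in a second study
    WgsSample("S3", "Y", 32.0),
]
print([s.sample_id for s in drop_lower_coverage_duplicates(samples)])  # ['S2', 'S3']
```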

 

WGS Data Products

  • Table of all participant samples (n=3941) and processed file locations
  • Single sample processed data: CRAM, gVCF, and GATK processing metrics (n=3941)
  • Joint genotyping processed data: annotated variant vcf data (n=3074)
  • Plink files: aggregated plink bfiles from all processed vcf data (n=3074)
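The aggregated PLINK bfiles follow PLINK's binary fileset convention: a `.bed` genotype file, a `.bim` variant file, and a `.fam` sample file sharing one prefix. The sketch below locates the three files for a prefix and parses the six-column `.fam` table (family ID, individual ID, father ID, mother ID, sex, phenotype). The prefix and sample IDs used here are hypothetical, not actual AMP PD file names.

```python
from pathlib import Path
from typing import NamedTuple


class FamRecord(NamedTuple):
    family_id: str
    individual_id: str
    father_id: str
    mother_id: str
    sex: int        # 1 = male, 2 = female, 0 = unknown
    phenotype: str  # kept as text: 1/2 case-control, -9 or 0 missing, or a quantitative value


def bfile_paths(prefix: str) -> tuple[Path, Path, Path]:
    """A PLINK binary fileset is three files that share one prefix."""
    bed, bim, fam = (Path(prefix + ext) for ext in (".bed", ".bim", ".fam"))
    return bed, bim, fam


def parse_fam(text: str) -> list[FamRecord]:
    """Parse whitespace-delimited .fam content, skipping blank lines."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        fid, iid, father, mother, sex, pheno = fields
        records.append(FamRecord(fid, iid, father, mother, int(sex), pheno))
    return records


records = parse_fam("FAM1 SAMPLE1 0 0 1 2\nFAM2 SAMPLE2 0 0 2 -9\n")
print([r.individual_id for r in records])  # ['SAMPLE1', 'SAMPLE2']
```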

 

RNA Data Products

  • Table of all samples (n=8356) from 3225 participants, with processed file locations
  • Processed RNA sample data
    • picard metrics: Aggregated per-sample alignment summary metrics, insert size metrics, and rna seq metrics. (n=8356)
    • salmon quantification: Aggregated per-sample quantification estimates of the expression of transcripts and genes. Also available in matrix form. (n=8356)
    • star align-reads: Aggregated per-sample Log.final.out outputs. (n=8356)
    • feature counts: Aggregated per-sample featureCounts.tsv outputs. Also available in matrix form. (n=8356)
    • plink genomes: Pairwise comparison of participants' RNA and WGS samples to detect sample contamination, swaps, and relatedness. (n=8356)
    • multiqc reports: An html file containing visualizations from multiqc. Other multiqc artifacts are also available. (n=8356)
    • sequencing metrics: Metrics from the sequencing provider. (n=8356)
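Several RNA products (salmon quantification, feature counts) are provided per sample and also "in matrix form". The reshape between the two is a pivot from long (sample, feature, value) rows to a feature × sample matrix. This is a sketch under assumptions: the triple layout, the sorted ordering, and filling absent entries with 0.0 are illustrative choices, not the layout of the released files.

```python
def to_matrix(records: list[tuple[str, str, float]], fill: float = 0.0):
    """Pivot (sample_id, feature_id, value) rows into a feature x sample matrix.

    Returns (features, samples, matrix) where matrix[i][j] is the value of
    features[i] in samples[j]; pairs with no record get `fill`.
    """
    samples = sorted({s for s, _, _ in records})
    features = sorted({f for _, f, _ in records})
    col = {s: j for j, s in enumerate(samples)}
    row = {f: i for i, f in enumerate(features)}
    matrix = [[fill] * len(samples) for _ in features]
    for s, f, v in records:
        matrix[row[f]][col[s]] = v
    return features, samples, matrix


# Hypothetical sample and gene IDs
features, samples, matrix = to_matrix([("S2", "G1", 5.0), ("S1", "G1", 3.0), ("S1", "G2", 1.5)])
print(features, samples)  # ['G1', 'G2'] ['S1', 'S2']
print(matrix)             # [[3.0, 5.0], [1.5, 0.0]]
```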

 

Google BigQuery

BigQuery Datasets

Participant Clinical Access BigQuery Dataset:

AMP PD Metadata Tables
amp_pd_participants
amp_pd_case_control
wgs_sample_inventory
rna_sample_inventory
duplicate_subjects

Clinical Participant Tables
Demographics, PD_Medical_History, Enrollment, Caffeine_history, Family_History_PD, Smoking_and_alcohol_history

Clinical Assessments Tables
Epworth_Sleepiness_Scale, MDS_UPDRS_Part_I, MDS_UPDRS_Part_II, MDS_UPDRS_Part_III, MDS_UPDRS_Part_IV, MMSE, MOCA, Modified_Schwab___England_ADL, PDQ_39, REM_Sleep_Behavior_Disorder_Questionnaire_Mayo, REM_Sleep_Behavior_Disorder_Questionnaire_Stiasny_Kolster, UPDRS, UPSIT

Clinical Bio Tables
Biospecimen_analyses_CSF_abeta_tau_ptau, Biospecimen_analyses_CSF_beta_glucocerebrosidase, Biospecimen_analyses_other, Biospecimen_analyses_SomaLogic_plasma, DaTSCAN_SBR, DaTSCAN_visual_interpretation, MRI, DTI
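The three clinical table groups in this dataset contain 6 + 13 + 8 = 27 tables, matching the count of 27 harmonized clinical forms listed under Google Cloud Storage. The sketch below records the table names as listed and builds backtick-quoted BigQuery standard-SQL table references. The project and dataset IDs are placeholders, not the real AMP PD identifiers.

```python
# Clinical tables in the Participant Clinical Access dataset, as listed above.
CLINICAL_TABLES = {
    "participant": [
        "Demographics", "PD_Medical_History", "Enrollment", "Caffeine_history",
        "Family_History_PD", "Smoking_and_alcohol_history",
    ],
    "assessment": [
        "Epworth_Sleepiness_Scale", "MDS_UPDRS_Part_I", "MDS_UPDRS_Part_II",
        "MDS_UPDRS_Part_III", "MDS_UPDRS_Part_IV", "MMSE", "MOCA",
        "Modified_Schwab___England_ADL", "PDQ_39",
        "REM_Sleep_Behavior_Disorder_Questionnaire_Mayo",
        "REM_Sleep_Behavior_Disorder_Questionnaire_Stiasny_Kolster", "UPDRS", "UPSIT",
    ],
    "bio": [
        "Biospecimen_analyses_CSF_abeta_tau_ptau",
        "Biospecimen_analyses_CSF_beta_glucocerebrosidase",
        "Biospecimen_analyses_other", "Biospecimen_analyses_SomaLogic_plasma",
        "DaTSCAN_SBR", "DaTSCAN_visual_interpretation", "MRI", "DTI",
    ],
}


def qualified(project: str, dataset: str, table: str) -> str:
    """Fully qualified BigQuery table reference in standard SQL (backtick-quoted)."""
    return f"`{project}.{dataset}.{table}`"


total = sum(len(tables) for tables in CLINICAL_TABLES.values())
print(total)  # 27
print(qualified("example-project", "example_clinical_dataset", "Demographics"))
```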
 

Participant Tier 2 BigQuery Dataset:

AMP PD Metadata Tables
amp_pd_participant_mutations

Clinical Participant Tables
Clinically_Reported_Genetic_Status
 

WGS BigQuery Dataset:

WGS Joint Genotyping Tables (n=3,074)
gatk_passing_variants, gatk_all_variants

WGS Joint Genotyping Metrics Tables (n=3,074)
gatk_variant_calling_detail_metrics

WGS Single Sample Variant Metrics Tables (n=3,941)
gatk_variant_calling_summary_metrics

WGS Single Sample Alignment Metrics Tables (n=3,941)
raw_wgs_metrics, wgs_metrics, preBqsr_selfSM

WGS Sample Metadata Tables (n=3,941)
wgs_samples
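A typical use of the joint genotyping tables is a region query. The sketch below builds a BigQuery standard-SQL string with named query parameters (`@chrom`, `@start`, `@end`). The column names (`reference_name`, `start_position`, `end_position`, `reference_bases`) are an assumption based on a Variant Transforms-style schema and are not confirmed by these notes; the project and dataset IDs are placeholders.

```python
ALLOWED_VARIANT_TABLES = {"gatk_passing_variants", "gatk_all_variants"}


def region_query(project: str, dataset: str, table: str = "gatk_passing_variants") -> str:
    """Build a region query over a joint genotyping table.

    Table names cannot be passed as query parameters, so they are checked against
    an allow-list; the region bounds are left as named parameters.
    Column names follow an ASSUMED Variant Transforms-style schema.
    """
    if table not in ALLOWED_VARIANT_TABLES:
        raise ValueError(f"unknown variant table: {table}")
    return (
        "SELECT reference_name, start_position, end_position, reference_bases\n"
        f"FROM `{project}.{dataset}.{table}`\n"
        "WHERE reference_name = @chrom\n"
        "  AND start_position >= @start\n"
        "  AND end_position <= @end"
    )


print(region_query("example-project", "example_wgs_dataset"))
```

With the google-cloud-bigquery client, the parameter values would be supplied separately as `bigquery.ScalarQueryParameter` objects in a `bigquery.QueryJobConfig(query_parameters=[...])` passed to `client.query(...)`, which keeps user input out of the SQL text.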
 

RNA BigQuery Dataset:  

RNA Sample Metadata Tables
rna_seq_samples

Picard Tables
alignment_summary_metrics, insert_size_metrics, rna_seq_metrics

Salmon Tables
quantification_genes, quantification_transcripts

Star Tables
star_metrics

FeatureCounts Tables
feature_counts

Plink Tables
genome_check_HW_MAF

Sequencing Tables
rna_quality_metrics
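The plink genomes product and the genome_check_HW_MAF table compare each participant's RNA and WGS samples. PLINK's `--genome` report gives, for each sample pair, IBD-sharing estimates Z0, Z1, Z2 and PI_HAT = P(IBD=2) + 0.5 · P(IBD=1). The sketch below shows how such a value could flag swaps and relatedness, assuming the table carries PI_HAT (its schema is not listed here). The thresholds (0.9 for "same genome", 0.185 for "related") are illustrative assumptions, not AMP PD's QC cutoffs.

```python
def pi_hat(z1: float, z2: float) -> float:
    """PLINK's PI_HAT: proportion IBD, P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1


def flag_pair(same_participant: bool, pihat: float,
              dup_min: float = 0.9, related_min: float = 0.185) -> str:
    """Classify an RNA/WGS sample pair; both thresholds are illustrative assumptions."""
    if same_participant:
        # RNA and WGS from one person should look genetically identical.
        return "ok" if pihat >= dup_min else "possible swap or contamination"
    if pihat >= dup_min:
        return "unexpected duplicate"  # different participants, identical genotypes
    if pihat >= related_min:
        return "related"
    return "ok"


print(flag_pair(True, pi_hat(z1=0.04, z2=0.95)))  # ok
print(flag_pair(True, pi_hat(z1=0.0, z2=0.02)))   # possible swap or contamination
print(flag_pair(False, 0.5))                      # related
```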