AMP PD Cloud Architecture

All AMP PD data is stored on Google Cloud Platform (GCP) and can be accessed by approved researchers using both the AMP PD Researcher Workbench and GCP-native tools.

Data is available as files in Google Cloud Storage and as tables in Google BigQuery.

Google Cloud Storage

Google Cloud Storage is a general-purpose object store. It is commonly thought of as the place to store "data files", much like a file system. Cloud Storage lets you create "buckets" that hold files (objects). Objects can be stored, listed, and accessed using directory-style paths.

Access to Cloud Storage is through a standard command-line tool, a web interface, and/or an authenticated REST API with client libraries for many programming languages. 
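As an illustration of the REST route, the sketch below splits a gs:// location into its bucket and object path and builds the Cloud Storage JSON API URL that lists the objects under it. The function names are hypothetical helpers, not part of any AMP PD or Google tool, and the sketch only builds the URL. An actual request would also need an OAuth 2.0 access token for an approved account.

```python
from urllib.parse import quote, urlencode


def split_gs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_path)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    bucket, _, object_path = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return bucket, object_path


def list_objects_url(uri):
    """Build the JSON API URL that lists objects under a gs:// location."""
    bucket, object_path = split_gs_uri(uri)
    # Treat the path as a directory-style prefix, so make it end with '/'.
    if object_path and not object_path.endswith("/"):
        object_path += "/"
    # 'delimiter=/' asks for one directory level rather than a recursive listing.
    query = urlencode({"prefix": object_path, "delimiter": "/"})
    return f"https://storage.googleapis.com/storage/v1/b/{quote(bucket, safe='')}/o?{query}"


if __name__ == "__main__":
    print(list_objects_url("gs://amp-pd-genomics/releases/2019_v1beta_0220"))
```

The command-line tool and the client libraries accept gs:// locations directly, so a helper like this is only needed when calling the REST API by hand.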

Data files such as FASTQ, CRAM, and VCF files are stored in Google Cloud Storage:
 

Data Type        Location                                               Example Tables
Clinical Data    gs://amp-pd-data/releases/2019_v1beta_040              Demographics
Transcriptomics  gs://amp-pd-transcriptomics/releases/2019_v1beta_0220  aggregated.genes.tsv, aggregated.transcripts.tsv, aggregated.featureCounts.tsv
Genomics         gs://amp-pd-genomics/releases/2019_v1beta_0220         VCF (joint genotyped), Plink

Google BigQuery

Google BigQuery helps organizations structure genomic variant data to help teams achieve operational and scientific excellence through cloud computing.

Google Genomics helps the life science community organize the world’s genomic information and make it accessible and useful. Big genomic data is here today, with petabytes rapidly growing toward exabytes. With Google BigQuery, you can run SQL-style queries on billions of rows and get results back in seconds.

Through our extensions to Google Cloud Platform, you can apply the same technologies that power Google Search and Maps to securely store, process, explore, and share large, complex datasets.

All AMP PD data that is tabular in nature is available for SQL access in Google BigQuery. Each of the major data types is stored in a separate Google BigQuery dataset:
 

Data Type        Location                                     Example Tables
Clinical Data    amp-pd-research:2019_v1beta                  Demographics
Transcriptomics  amp-pd-research:2019_v1beta_transcriptomics  feature_counts, quantification_genes, quantification_transcripts
Genomics         amp-pd-research:2019_v1beta_genomics         passing_variants, wgs_metrics, wgs_samples
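The locations above use the project:dataset form, while BigQuery standard SQL expects a backtick-quoted project.dataset.table reference. The sketch below shows one way to build such a reference and a small preview query. The helper names and the billing project argument are hypothetical. `run_query` assumes the third-party google-cloud-bigquery package and valid credentials, and is not exercised here.

```python
import re


def to_standard_sql_table(dataset_location, table):
    """Turn 'project:dataset' plus a table name into `project.dataset.table`."""
    project, sep, dataset = dataset_location.partition(":")
    if not sep or not project or not dataset:
        raise ValueError(f"expected 'project:dataset', got {dataset_location!r}")
    # Accept only plain identifiers so the name is safe to place in SQL text.
    if not re.fullmatch(r"[A-Za-z0-9_]+", table):
        raise ValueError(f"unexpected table name: {table!r}")
    return f"`{project}.{dataset}.{table}`"


def preview_query(dataset_location, table, limit=10):
    """Build a standard SQL query that returns the first few rows of a table."""
    table_ref = to_standard_sql_table(dataset_location, table)
    return f"SELECT * FROM {table_ref} LIMIT {int(limit)}"


def run_query(sql, billing_project):
    """Run sql and return rows as dicts (needs google-cloud-bigquery and credentials)."""
    from google.cloud import bigquery  # third-party, imported only when called

    # Query costs are charged to the project the client is created with.
    client = bigquery.Client(project=billing_project)
    return [dict(row.items()) for row in client.query(sql).result()]


if __name__ == "__main__":
    print(preview_query("amp-pd-research:2019_v1beta", "Demographics"))
```

A call such as `run_query(sql, "my-billing-project")` would charge the query to that placeholder project, which is why a billing project has to be set up before querying.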

Billing on Google

Learn More: How to Set up Billing Projects & Google Billing Accounts