Clinical Assessment Data

AMP PD harmonizes, or standardizes, similar data collected across BioFIND, HBS, LBD, LCC, PDBP, PPMI and STEADY-PD3. This data curation and transformation process simplifies cross-cohort analysis. More specifically, variable names from AMP PD studies are aligned to a global mapping file, and the final curation is reviewed by AMP PD; this Harmonized Dictionary, based on CDISC terminology, is available for further background. Harmonized cohort data is made available in AMP PD through BigQuery.

Data Curation Workflow

Data from four different Parkinson’s Disease studies were harmonized to the same standard, curated, and consolidated into one dataset using automated and manual approaches. To harmonize and standardize metadata for the AMP PD project, a global mapping file (Harmonized Dictionary) aligning variables between datasets was created first. CDISC terminology was used for harmonized variable names and descriptions when possible. A coding file was then created to decode numeric coded variables; clean up and standardize medication names, diagnoses, levels of education, etc.; and align visit names between cohorts. After the mapping and coding files were generated, an automated tool was applied to transform the data files and integrate the four datasets into one set of curated files. Manual inspection of the transformed files followed each phase of automatic transformation. The content of each transformed file was approved by a curator, and all needed adjustments were performed manually. Finally, mapping files (dictionaries) for uploading data into BigQuery tables were produced by processing the content of the curated dataset with an additional R script.

The curation workflow consists of three main steps:

1. Data acquisition and review
2. Data harmonization
3. Data transformation/curation and QC

Data Acquisition and Review

Based on the priority assigned by the AMP PD Clinical Data Harmonization (CDH) group, the data was split into two batches: Subset 1 and Subset 2. Considerations and approach for determining clinical data scope and analysis:

  • Key variables critical for interpreting biological data (e.g. demographics)
  • Variables to increase ease of use of biological data (e.g. genotype)
  • Relevance and importance to Parkinson's disease
  • Data complementary to biologic data generated through AMP PD
  • Identified as the highest priority based on collective input from research experts in the PD community


Data Harmonization

Metadata variables were harmonized based on data compatibility, following the suggestions, decisions, and final approval of the Clinical Data Harmonization (CDH) group. CDISC terminology was used for Title and Description where available. Values of harmonized variables from different studies were standardized and included in the coding file. The coding file contains decodes for numeric coded variables; cleans up and standardizes medication names, diagnoses, levels of education, etc.; and aligns visit names between cohorts.
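The role of the mapping file and the coding file can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the study labels, source variable names, codes, and the harmonize_record helper are hypothetical and do not reflect the actual Harmonized Dictionary or the SmartConverter tool.

```python
# Hypothetical mapping file: (study, source variable) -> harmonized variable.
MAPPING = {
    ("STUDY_A", "GENDER"): "sex",
    ("STUDY_B", "sex_cd"): "sex",
    ("STUDY_A", "VISIT"): "visit_name",
    ("STUDY_B", "event_id"): "visit_name",
}

# Hypothetical coding file:
# (study, harmonized variable, raw value) -> standardized value.
CODING = {
    ("STUDY_A", "sex", "1"): "Male",
    ("STUDY_A", "sex", "2"): "Female",
    ("STUDY_B", "sex", "M"): "Male",
    ("STUDY_B", "sex", "F"): "Female",
    ("STUDY_A", "visit_name", "BL"): "M0",
    ("STUDY_B", "visit_name", "Baseline"): "M0",
}


def harmonize_record(study, record):
    """Rename variables via MAPPING and decode values via CODING.

    Variables without a mapping entry are dropped; values without a
    coding entry pass through unchanged.
    """
    harmonized = {}
    for source_name, raw_value in record.items():
        target = MAPPING.get((study, source_name))
        if target is None:
            continue  # not part of the (hypothetical) harmonized dictionary
        key = (study, target, str(raw_value))
        harmonized[target] = CODING.get(key, raw_value)
    return harmonized


record_a = harmonize_record("STUDY_A", {"GENDER": 1, "VISIT": "BL", "SITE": "07"})
record_b = harmonize_record("STUDY_B", {"sex_cd": "M", "event_id": "Baseline"})
print(record_a)
print(record_b)
```

In this sketch, two records that used different variable names, codes, and visit labels end up with identical harmonized content, which is the property that makes cross-cohort analysis simpler.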

Data Transformation/Curation and QC

Both automated (custom SmartConverter tool) and manual approaches were used to perform data transformations. The original data files were inspected for extended ASCII characters, the number of patients, visit types, and the availability of codes and their decodes in supporting study documents. Transformation templates and the coding file were prepared based on the harmonized dictionary and curation decisions to perform three rounds of transformation/consolidation using SmartConverter. After each round, output files were inspected, and additional manual transformations were performed before the next round of automated transformation and after the final curation. Subset 1 and Subset 2 were curated separately using the same approach, described below:

Step 1: Transform Raw Data

  1. Prepare vocabularies and add to primary code file
  2. Organize data-files by study
  3. Create coding file and transformation template
  4. Run SmartConverter Round 1 and perform QC

Step 2: Transform & Consolidate

  1. Organize curated files into distinct study folders
  2. Modify transformation template
  3. Consolidate subset 1 and subset 2 categories
  4. Run SmartConverter Round 2 and perform QC

Step 3: Transform & Finalize

  1. Add and consolidate clinical data (e.g. missing diagnosis inputs)
  2. Remove and substitute fields
  3. Run SmartConverter Round 3 and perform QC
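The inspection of original data files mentioned above (extended ASCII characters, number of patients, visit types) could be approximated as follows. This is a sketch under stated assumptions: the column names, the sample data, and the inspect_raw_file helper are hypothetical, and the actual inspection procedure is not documented here.

```python
import csv
import io


def inspect_raw_file(text, id_column, visit_column):
    """Summarize a comma-delimited file before transformation.

    Reports characters outside the 7-bit ASCII range, the number of
    distinct participants, and the distinct visit labels.
    """
    non_ascii = []  # (line number, character) pairs
    for line_no, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ord(ch) > 127:
                non_ascii.append((line_no, ch))

    rows = list(csv.DictReader(io.StringIO(text)))
    return {
        "non_ascii": non_ascii,
        "n_participants": len({row[id_column] for row in rows}),
        "visit_types": sorted({row[visit_column] for row in rows}),
    }


# Hypothetical three-row extract with one accented character.
sample = "participant_id,visit,note\nP1,BL,ok\nP1,V01,caf\u00e9\nP2,BL,ok\n"
report = inspect_raw_file(sample, "participant_id", "visit")
print(report)
```

Running the same summary before and after each transformation round is one way to notice records or visits that were dropped or altered unintentionally.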

Clinical Data Validation Plan

The AMP PD Clinical Data Harmonization (CDH) team crafted a plan to further validate the results of the harmonization process. The purpose of the validation plan was to: 

  1. Ensure no new errors were introduced into the clinical data as a result of the data harmonization process
  2. Facilitate identification of records that should be excluded from the public release
  3. Identify a set of tests that can be run to validate additional data submission from the current AMP PD cohorts as well as future data submissions from new cohorts

The CDH constructed 42 individual cohort tests, identified 23 unique tests to run against harmonized data from all four cohorts, and identified 19 tests that were not valid against harmonized data because of excluded or modified data points, or changes to data structures.

The following key decisions and outputs were made as a result of executing the validation plan: 

  1. Alignment of SmartConverter data outputs against program and cohort specific tests

  2. Final inclusion/exclusion release criteria for clinical data

  3. Secondary dataset(s) for further analysis and curation for potential future release

  4. Confirmed AMP PD Subject Master List

  5. Final AMP PD clinical dataset for public release


Cohort & Across Cohort Business Rules

AMP PD received cohort specific business rules from BioFIND, HBS, PDBP, and PPMI. These rules were applied by the cohorts to the raw data inputs prior to the clinical data harmonization process. As part of the QC process, these business rules were re-checked after the harmonization process to ensure the rules were still valid. 

Business Rule Check

  • Discordant Sex Check:    Reported sex should be same across multiple visits and studies. HBS Cohort specific business rule.
  • REM Sleep Behavior Disorder Questionnaire Check:    Check RBD checklist score does not exceed 13. HBS Cohort specific business rule.
  • UPDRS Total Score Check:    Check total score does not exceed 199. HBS Cohort specific business rule.
  • UPDRS Sub-scale Score Check:    Check UPDRS subscale scores do not exceed the following: Section I: 16 points; Section II: 52; Section III: 108; and Section IV: 23. HBS Cohort specific business rule.
  • MMSE Outlier Check:    Check UPDRS subscale scores do not exceed the following: Section I: 16 points; Section II: 52; Section III: 108; and Section IV: 23. HBS Cohort specific business rule.
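The score-limit and discordant-sex rules above lend themselves to simple automated checks. The sketch below uses the upper bounds quoted in the rules (13 for the RBD checklist, 199 for the UPDRS total, and 16, 52, 108, and 23 for UPDRS Sections I to IV); the field names and helper functions are hypothetical and are not the names used in the AMP PD tables.

```python
# Upper bounds quoted in the HBS business rules; field names are hypothetical.
MAX_SCORES = {
    "rbd_total": 13,
    "updrs_total": 199,
    "updrs_part1": 16,
    "updrs_part2": 52,
    "updrs_part3": 108,
    "updrs_part4": 23,
}


def range_violations(record):
    """Return the scored fields in a record that exceed their maximum."""
    return sorted(
        field
        for field, limit in MAX_SCORES.items()
        if record.get(field) is not None and record[field] > limit
    )


def discordant_sex(records):
    """Return participant IDs reported with more than one sex value."""
    seen = {}
    for rec in records:
        seen.setdefault(rec["participant_id"], set()).add(rec["sex"])
    return sorted(pid for pid, values in seen.items() if len(values) > 1)


visits = [
    {"participant_id": "P1", "sex": "Male", "updrs_total": 42, "rbd_total": 5},
    {"participant_id": "P1", "sex": "Female", "updrs_total": 45, "rbd_total": 14},
    {"participant_id": "P2", "sex": "Female", "updrs_part3": 110},
]
flagged = [range_violations(v) for v in visits]
print(flagged)
print(discordant_sex(visits))
```

Note that the four section maxima sum to 199, so a record that passes every section check also passes the total score check when the total is derived from those sections.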

HBS Cohort Specific Data Checks


LBD Cohort Specific Data Checks

  • Age consistency:    Check consistency of age, adjusting for time, across multiple visits and studies
  • Ethnicity consistency:    Ethnicity should be the same across multiple visits and studies
  • Race consistency:    Race should be the same across multiple visits and studies
  • Sex consistency check:    Gender should be the same for the same GUID across PDBP cohorts
  • Missing form check:    A required clinical assessment was not filled in or not submitted
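The consistency checks listed above can be sketched in a few lines. The sketch assumes that each visit carries the number of months since baseline and a reported age, and that a one-year tolerance is acceptable; the tolerance, the field names, the form names, and the helper functions are all assumptions made for illustration, not documented AMP PD rules.

```python
def age_inconsistent(visits, tolerance_years=1.0):
    """Check age consistency, adjusting for time.

    visits: list of (months_since_baseline, reported_age) pairs.
    Each reported age is projected back to baseline; the ages are
    inconsistent if the projections differ by more than the tolerance.
    """
    baseline_ages = [age - months / 12.0 for months, age in visits]
    return max(baseline_ages) - min(baseline_ages) > tolerance_years


def inconsistent_fields(visits, fields=("ethnicity", "race", "sex")):
    """Return the fields that take more than one value across visits."""
    return [f for f in fields if len({v[f] for v in visits}) > 1]


def missing_forms(submitted, required):
    """Return required assessment forms that were not submitted."""
    return sorted(set(required) - set(submitted))


consistent = age_inconsistent([(0, 60), (12, 61), (24, 62)])
jump = age_inconsistent([(0, 60), (12, 65)])
changed = inconsistent_fields([
    {"ethnicity": "Not Hispanic or Latino", "race": "White", "sex": "Male"},
    {"ethnicity": "Not Hispanic or Latino", "race": "Asian", "sex": "Male"},
])
missing = missing_forms(["MDS-UPDRS", "MoCA"], ["MDS-UPDRS", "MoCA", "UPSIT"])
print(consistent, jump, changed, missing)
```

Projecting every reported age back to baseline keeps the comparison independent of visit spacing, which is one simple reading of "adjusting for time".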

PDBP Cohort Specific Data Checks also applied to STEADY-PD3


PPMI Cohort Specific Data Checks also applied to BioFIND and LCC


Harmonized Assessment & Variable Matrix

The following variables are harmonized across a breadth of standard assessments from two or more AMP PD cohorts. Click a variable to view additional details such as its definition, values, schema, and curation notes. If you want to download a version of the full AMP PD Data Dictionary, click one of the buttons below for a specific format. 


Harmonized Variables

Medical History

Environment Risk Factors

Clinical Assessments

H&Y (see MDS-UPDRS Part III)

Biospecimen Analyses Sphingolipids (plasma & CSF)