Clinical Informatics
Predictors and Mechanisms of Conversion to Psychosis
Description
Many types of data based upon medical assays include N patients (with case/control status, age of onset, survival rates, etc.) and p predictors (blood proteins, leukocytic microRNAs, clinical interview scores, genomic reads, etc.). However, conventional analyses generally fail to provide causal insight that would enable rational diagnoses or treatments. For example, genomic analyses often select ~100 predictors from many thousands using individual p-values, a hopelessly simplistic approach because it ignores how predictors act together. RENCI has developed a completely different algorithm to identify collectively informative predictors, somewhat like choosing track team members with diverse abilities, not just the very fastest sprinters.
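The contrast between individual and collective selection can be illustrated with a small synthetic example. This is a minimal sketch on simulated data, not RENCI's algorithm; the names `signal`, `nuisance`, `x1`, and `x2` and the noise scale are assumptions chosen purely for illustration. Predictor `x2` carries no information about case status on its own, yet subtracting it from `x1` removes shared noise and yields a far stronger predictor than either one alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

signal = rng.normal(size=n)            # hypothetical hidden disease-related quantity
nuisance = 3.0 * rng.normal(size=n)    # large shared noise unrelated to disease
status = (signal > 0).astype(float)    # 1 = case, 0 = control

x1 = signal + nuisance   # only weakly informative when screened alone
x2 = nuisance            # uninformative when screened alone


def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


r_x1 = pearson(x1, status)         # modest correlation with case status
r_x2 = pearson(x2, status)         # close to zero
r_pair = pearson(x1 - x2, status)  # the pair together recovers the signal

print(f"x1 alone: {r_x1:.2f}, x2 alone: {r_x2:.2f}, x1 - x2: {r_pair:.2f}")
```

A screen based on individual p-values would likely rank `x2` near the bottom, even though it is essential to the best two-predictor model in this toy setting.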
In our experiments with real data with N < p, N << p, and N > p, the new method appears to have superior predictive power vs. LASSO, neural nets, random forests, etc. It selects fewer predictors, assigns them simpler weights, and yields superior metrics (p-value, AUC, correlation, mean squared error). The output helps translate selected predictors into hypotheses about disease causation, sometimes enabling backtracking to distinguish a potential trigger. Since 2015, four publications have applied the method to different data types in schizophrenia research. A broad paper with diverse examples, mathematical underpinnings, and open-source R and Python programs was published in Scientific Reports in 2022.
Causal network theory (e.g., Turing Award-winning works of J Pearl et al.) is currently plagued by its prohibition of feedback loops. RENCI’s work on correlation networks (sets of blood analytes or brain regions (fMRI data) that gain or lose correlation in disease) is again completely different. Selecting a threshold that admits only the strongest correlation changes identifies pairs of predictors that are unexpectedly altered. (We discard, for example, pairs such as a pro- and anti-coagulation pair of proteins, which are expected and observed to be almost always highly correlated.) Again, this work enables backtracking to stronger or weaker effects of common triggers, thus suggesting therapies. A paper on correlation analysis will soon be submitted to a scientific journal.
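The idea of thresholding correlation changes can be sketched as follows. This is an illustrative sketch under stated assumptions, not RENCI's implementation: the function name `differential_correlation_pairs`, the use of a plain difference of Pearson correlations, and the threshold value are all hypothetical, and the step of discarding pairs that are expected to be highly correlated is omitted.

```python
import numpy as np


def differential_correlation_pairs(controls, cases, threshold):
    """Return (i, j, delta) for predictor pairs whose Pearson correlation
    differs between cases and controls by at least `threshold` in absolute
    value, sorted by the size of the change. Rows are patients and columns
    are predictors."""
    r_ctrl = np.corrcoef(controls, rowvar=False)
    r_case = np.corrcoef(cases, rowvar=False)
    delta = r_case - r_ctrl
    rows, cols = np.triu_indices(delta.shape[0], k=1)  # each pair once
    pairs = [
        (int(i), int(j), float(delta[i, j]))
        for i, j in zip(rows, cols)
        if abs(delta[i, j]) >= threshold
    ]
    pairs.sort(key=lambda t: -abs(t[2]))
    return pairs


# Simulated data: predictors 0 and 1 are tightly coupled in controls
# but decoupled in cases; all other pairs are independent in both groups.
rng = np.random.default_rng(0)
n = 200
controls = rng.normal(size=(n, 4))
controls[:, 1] = controls[:, 0] + 0.3 * rng.normal(size=n)
cases = rng.normal(size=(n, 4))

pairs = differential_correlation_pairs(controls, cases, threshold=0.5)
print(pairs)
```

In this toy data set only the pair (0, 1) loses correlation in cases, so it is the only pair that survives the threshold.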
All of the above work descends from diverse psychiatric and neurological data analyses. However, far broader applications of the algorithm outside medical research are anticipated.
RENCI's Role
This carryover from a large NIMH project is led by Clark Jeffries (RENCI) and Diana Perkins (UNC Psychiatry). Models use fMRI, blood proteins, or other signals for various prognoses. All data analysts are familiar with measures of model performance such as Pearson correlation, Welch p-value, and AUC, but many are not aware of mathematical pitfalls that are not covered in standard references or other papers. RENCI has developed workarounds that determine whether modeling conclusions are or are not reasonably attributable to chance. Additionally, RENCI researchers Chris Bizon, Jeff Tilson, Darius Bost, and Dayne Filer contributed to the research and paper preparation.
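One generic way to ask whether a performance metric is attributable to chance is a label-permutation test, sketched below for AUC. This is a standard textbook technique shown for orientation only; it is not claimed to be RENCI's workaround, and the function names `auc` and `permutation_p_value`, the number of permutations, and the simulated data are assumptions. The sketch also assumes the scores were not fitted to the labels being permuted; if predictors were selected using those labels, the selection would need to be repeated inside each permutation.

```python
import numpy as np


def auc(scores, labels):
    """Area under the ROC curve: the fraction of (case, control) pairs in
    which the case has the higher score, counting ties as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def permutation_p_value(scores, labels, n_permutations=2000, seed=0):
    """Estimate how often shuffled labels give an AUC at least as large as
    the observed one. Returns (observed_auc, p_value)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    observed = auc(scores, labels)
    hits = 0
    for _ in range(n_permutations):
        if auc(scores, rng.permutation(labels)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Simulated example: 20 controls and 20 cases, with case scores shifted upward.
rng = np.random.default_rng(1)
labels = np.array([0] * 20 + [1] * 20)
scores = rng.normal(size=40) + 2.0 * labels

observed_auc, p_value = permutation_p_value(scores, labels)
print(f"AUC = {observed_auc:.2f}, permutation p-value = {p_value:.4f}")
```

The add-one correction means the smallest reportable p-value is 1 / (n_permutations + 1), so the number of permutations sets the resolution of the test.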