Mental and behavioral disorders are the leading cause of disability worldwide (ref. 1). Yet progress in treating these disorders has largely stalled: few new molecular targets have been identified since the discoveries of lithium, antidepressants and antipsychotic drugs in the 1950s and 1960s, and the mortality and morbidity of mental disorders have not declined since 1990 (ref. 2). Psychiatry needs a major revamp, and, as discussed widely at the annual meeting of the American College of Neuropsychopharmacology in December 2017, this is an exciting time, when comprehensive analysis of genotype, phenotype and environmental information combined with advanced computational tools could transform mental health research.

The current stagnation reflects poor understanding of the pathophysiology of these illnesses. For a long time, psychiatry has lacked quantifiable, objective phenotypes that are deeply rooted in biology. In oncology, tumors are imaged and sequenced; in cardiology, the structure and function of the heart are examined. Such objective measurements guide diagnosis and treatment in virtually all specialties of medicine, but do so to a much lesser extent in psychiatry, which is built upon self-reported symptoms. Symptoms are often subjective in nature and are not typically mapped directly onto altered biological processes in the brain. To advance mental health research, we need to focus on biology-based measurements that span from genes and molecules to cells, brain circuits and, ultimately, observable behaviors. This is the principle behind the US National Institute of Mental Health's Research Domain Criteria (RDoC) initiative, launched in 2010, which aims to establish a biologically valid framework for understanding mental disorders.

In recent years, technological advances have made it faster and more cost-effective to collect a variety of molecular (e.g., genome, transcriptome, proteome, metabolome and microbiome) and imaging (e.g., MRI and PET) data. A huge body of high-quality data is emerging from large-scale collaborative efforts, such as the Psychiatric Genomics Consortium, the Human Connectome Project, the IMAGEN project and the ABCD study, among others. In the clinic, many healthcare providers have adopted electronic medical records, which contain data on a massive number of patients. At the same time, smartphones and wearable devices are becoming ubiquitous, producing continuous data on a person's physiology and surroundings. Together, these technologies present an unprecedented opportunity for comprehensive knowledge about the biological, behavioral, social and environmental factors that contribute to health and disease. The amount and complexity of the data generated by these technologies pose major technical challenges for analytics, but computational tools are being optimized to integrate and interpret the massive amounts of information.

Although still in its early days, big data has the potential to address a few key challenges currently facing psychiatry. Disease heterogeneity is one. In psychiatry, disease entities are fuzzy: symptoms can both vary substantially within the same diagnosis and overlap considerably between different diagnoses. For example, according to the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), a diagnosis of major depressive disorder can be made when a patient has at least five of nine major symptoms. This allows for hundreds of possible combinations of symptoms, all under the same diagnosis. Heterogeneity is a major roadblock for research and clinical trials, and the hope is that big data can help stratify patients into more biologically valid, homogeneous subgroups. As an example, in a study published in our pages recently, the authors found that patients with depression can be subdivided into four biotypes characterized by distinct brain connectivity patterns; these biotypes were associated with different clinical symptom profiles and responded differently to transcranial magnetic stimulation therapy (ref. 3).
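The "hundreds of possible combinations" figure for the five-of-nine rule can be checked with a short count. The sketch below is illustrative only; the function names are hypothetical. The second function adds an assumption that is not stated in the text above: that at least one of two core symptoms (depressed mood or loss of interest) must be among those present.

```python
from math import comb


def symptom_combinations(total=9, minimum=5):
    """Count symptom sets containing at least `minimum` of `total` symptoms."""
    return sum(comb(total, k) for k in range(minimum, total + 1))


def combinations_with_core(total=9, minimum=5, core=2):
    """Same count, but requiring at least one of `core` designated symptoms.

    Assumption (not from the text above): `core` of the `total` symptoms
    are core symptoms, and at least one of them must be present.
    """
    non_core = total - core
    # Sets that meet the minimum using non-core symptoms only are excluded.
    without_core = sum(comb(non_core, k) for k in range(minimum, non_core + 1))
    return symptom_combinations(total, minimum) - without_core


print(symptom_combinations())    # 256
print(combinations_with_core())  # 227
```

Either way, the count runs to a few hundred distinct symptom profiles under a single diagnostic label, which is the heterogeneity problem in miniature.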

Big data could also help improve precision in clinical practice. Currently, finding the right medication occurs through trial and error. As most treatments work in only a subset of patients, and considering that many have substantial side effects, it is imperative to adopt a precision medicine approach. For example, a group recently mined patient data from a large clinical trial and trained a machine learning algorithm to predict treatment response to an antidepressant, citalopram (ref. 4). The sensitivity and specificity of the algorithm need further improvement; nevertheless, this points to the tantalizing possibility that big data combined with machine learning may help tailor treatment to individual patients, bringing much-needed precision into mental healthcare. Such an approach may also help deliver healthcare further upstream, before a patient enters the clinic. Companies such as Verily and Mindstrong are looking into whether people with mental health conditions exhibit 'digital phenotypes', that is, aspects of physical activity, social dynamics, voice or patterns of human–computer interaction that can be captured by smartphones, and, if so, whether mobile interventions (e.g., telephone psychotherapy or text messages) could help.

There are major challenges ahead. The tension between data sharing and patient privacy is one. Another challenge is that large datasets inevitably turn up spurious correlations, so any results must be carefully interpreted and rigorously validated.
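A small simulation illustrates why large datasets turn up spurious correlations. It is a sketch under stated assumptions, not an analysis of any real dataset: every variable is pure random noise, the function names and parameters are hypothetical, and the cutoff of 0.28 is an approximate two-sided 5% threshold for a correlation computed on 50 subjects.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_subjects=50, n_features=1000, r_cutoff=0.28, seed=0):
    """Count noise features whose correlation with a noise outcome passes the cutoff.

    All data are simulated and independent, so every hit is a false positive.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_subjects)]
        if abs(pearson(feature, outcome)) > r_cutoff:
            hits += 1
    return hits


# Roughly 5% of the 1,000 noise features are expected to pass by chance alone.
print(count_spurious())
```

With no true signal anywhere in the data, dozens of features still look "significant", which is why findings from large exploratory analyses need correction for multiple comparisons and validation in independent samples.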

The success of big data projects requires seamless collaboration among researchers, data scientists, clinicians, engineers, patients and others. Traditional disciplinary barriers need to be broken down. Only then will we be truly close to our shared goal: a better understanding and improved treatment of the mental illnesses that afflict millions.