Researchers at the University of Pittsburgh, UPMC and the Pittsburgh Supercomputing Center have teamed up to create a software resource to help investigators wade through a colossal amount of genomic cancer data in search of better methods of prevention, diagnosis and treatment. The open-source, freely available software, called TCGA Expedition, processes data generated by The Cancer Genome Atlas (TCGA) project and is described today in the journal PLOS ONE.
“Starting with TCGA, our goal is to make large data sets available to the average researcher who would not otherwise be able to access this information,” said lead author Rebecca Jacobson, M.D., M.S., professor of biomedical informatics at Pitt’s School of Medicine and chief information officer of the Institute for Personalized Medicine (IPM). “There’s a growing understanding that further advances in health care are going to require a previously unseen level of data-sharing, which will require new tools. That’s particularly true in cancer research, as recognized by the major focus on data-sharing in Vice President Joseph Biden’s recently announced Cancer Moonshot initiative.”
Funding for the new software was provided by IPM and the University of Pittsburgh Cancer Institute (UPCI), a partner with UPMC CancerCenter.
“This work is about enabling and speeding up science,” said Adrian Lee, Ph.D., director of IPM and of UPCI’s Women’s Cancer Research Center, and a co-author on the new paper. “Resources such as this will be key in our move to precision cancer genomic medicine.”
Fundamentally, all cancers are caused by an overgrowth of cells due to an error in DNA. Examining a cancer’s complete set of DNA, or genome, can provide insights into many aspects of tumor biology. The goal of TCGA, a collaborative effort of the National Cancer Institute and the National Human Genome Research Institute, is to collect and share genomic data from cancers with poor prognoses and the greatest impacts on public health. To date, the project has profiled 33 different cancers from more than 11,000 patients, and the resulting data has been used in more than 1,000 cancer studies.
“These very large data sets are incredibly hard to work with because they are enormous, not only in terms of the amount of digital storage space they need, but also in terms of the complexity of software and computational processing power that they require,” Dr. Jacobson said. “Right now, our institutions are choking on data.”
The new software continuously downloads, processes and manages the TCGA data, allowing researchers to take the tools that they need and apply them to making cancer discoveries.
The team then put the new software to work, creating an information technology framework called the Pittsburgh Genome Resource Repository (PGRR) to allow approved Pitt researchers to use the TCGA data much more effectively. While initially designed for TCGA data, the new software can also be used with other large data sets, and is already a key part of several other big data projects that PGRR supports, such as the National Institutes of Health’s Big Data to Knowledge initiative and Pennsylvania’s Commonwealth Universal Research Enhancement program.
“One of the unique things about the University of Pittsburgh is that it’s had such a distinguished history of being part of major data-sharing initiatives, such as the Shared Pathology Informatics Network, the National Patient-Centered Clinical Research Network, ACT Network and the TIES Cancer Research Network,” Dr. Jacobson said. “Our new software is a continuation of that legacy.”
The hope is that the benefits of TCGA Expedition will extend well beyond Pittsburgh.
“The fact that we made our software open source and freely available demonstrates our commitment to taking the advances in using big data sets and data-sharing that we make here and helping other institutions make their own advances,” Dr. Jacobson said.
Additional collaborators on the project included Uma Chandran, Ph.D., M.S.I.S., Olga Medvedeva, M.S., M. Michael Barmada, Ph.D., Anish Chakka, M.S., Soumya Luthra, M.S., Antonio Ferreira, Ph.D., Kim Wong, Ph.D., Jeremy Berg, Ph.D., and Annerose Berndt, Ph.D., D.V.M., all of Pitt; and Philip Blood, Ph.D., Zhihui Zhang, Ph.D., Robert Budden, B.S., and J. Ray Scott, B.A., all of Carnegie Mellon University in Pittsburgh.