Emory Report
January 20, 2009
Volume 61, Number 16

Genome’s ‘dark matter’

By Carol Clark

James Taylor’s office in the Rollins Research Center is clean and minimalist, with no papers cluttering his desk or shelves. “My work is almost completely computerized, and computers are really a general-purpose instrument,” says Taylor, an assistant professor whose appointment spans two departments: Biology, and Mathematics and Computer Science.

Genome Technology magazine recently named Taylor a top young investigator, featuring him in a special issue profiling 30 rising stars in genomics research.

“The information needed to build a complex organism like a human being is largely encoded in the genome. My lab is interested in trying to understand how that information is encoded,” Taylor says. “In a sense, we are trying to reverse engineer the genomic basis of the developmental program for a living organism.”

Taylor began his career as a software engineer, working for two firms that developed computer solutions for businesses. “I enjoyed finding ways to solve a problem,” he says, “but once the problem was solved, the projects moved into maintenance mode and things got boring.”

By the time the first rough draft of the human genome was completed in 2000, Taylor knew that he wanted to return to school for a Ph.D. and shift his focus to science. “The sequencing of the genome revealed all sorts of problems that could only be solved with computational skills,” he says. “It opened up a path to take something that I was good at and use it in ways that were interesting and fulfilling.”

Scientists now estimate that only a tiny percentage of the genome codes for proteins, the activity associated with genes. The function of the rest of the genome is still largely mysterious, and it has been likened to the “dark matter” of the cosmos.

Taylor is focused on exploring this dark matter. As he points out, the genes of a human being and a fruit fly are not that drastically different. “What’s really interesting is how and where and when those genes are expressed to create dramatically different organisms, and how the genes are encoded in the 95 percent of the genome that isn’t genes. That’s the grand challenge,” he says.

The genetic code is written in letters, with each letter standing for a molecule called a base. Strings of letters, like lines of text, are an intuitive way for humans to think about information.

But the information within most of the genome is not like text. “The different ways that it’s encoded may not be so easy for the human mind to understand, the signals may be too subtle,” Taylor theorizes. “Every way that information could possibly be encoded may be used in some way in the genome, because the chemistry and the biology is so random and evolution is opportunistic.”

In addition to his genomic research, Taylor’s lab is addressing the need to make high-throughput data analysis reproducible and easily shared among experimental biologists. In collaboration with Anton Nekrutenko at Penn State, Taylor developed Galaxy (http://galaxy.psu.edu), an open-source software system that allows anyone with an ordinary laptop to analyze genomic data. Thousands of analyses are performed on the Galaxy Web site daily, and the application can also be downloaded free of charge and installed in labs with only modest informatics support. The system is designed to handle multiple datasets and collaborative workflows, and it automatically tracks and logs every step of an analysis.

“Galaxy provides an infrastructure for analytical methods that are accessible, understandable and reusable,” says Taylor, who is continuing to expand and refine Galaxy in his role as principal programmer. “As computer tools become more sophisticated, it’s critical to provide every detail of how an analysis is done, in ways that are verifiable. If you can’t reproduce the results, you can’t really trust them.”

Taylor joined Emory last fall and is currently recruiting both graduate and undergraduate students interested in the two research areas of his lab. “We are just on the cusp of introducing more computer technology into science and more science into computers,” he says. “We are now developing students who more fully understand both biology and computers and statistics. They are going to drive the next set of data.”