Spotlight on Bioinformatics

Biology goes digital

A new species of biologist is beginning to thrive in the niche created by recent genomic and computational advances.

There are two paths to careers in bioinformatics, both of which require learning a new language. Computer scientists must become fluent in the life science terminology of genetics, genomics and cellular biology. Biologists must pick up skills in data analysis, including statistics, logic and programming. When the field was developing, fledgling bioinformaticians often taught themselves. Now, more institutions are offering formal training, and the field is maturing rapidly.

The skill set needed by a bioinformatician continues to evolve. In the early days of the human genome project, it was sufficient for scientists to find homologous genes of one organism in the genome of another. Now, bioinformaticians routinely compare multiple genomes, analyse regions of DNA that don't code for proteins, and incorporate a host of proteomic information in their analyses. Both the type and the amount of information continue to expand as biological techniques improve.

As a result, the proficiency bar in bioinformatics continues to rise, along with the demand for talented bioinformaticians (see new mobility: a case study). A few decades ago the ability to scour databases to find a single gene provided at least a plank in the platform for a career in bioinformatics. Now, that skill is a basic part of a molecular biologist's toolkit, as essential as fundamental wet lab techniques. In response, bioinformaticians need to keep improving their skill set. And to really make a mark, they need to develop new tools that others in the field consider valuable.

“The learning curve is both bigger and steeper now,” says George Asimenos, director of strategic projects at DNAnexus in Mountain View, California. DNA sequencing was relatively slow and expensive when he started graduate school 13 years ago. As speed has gone up and costs have come down, demands on bioinformaticians have grown. They need to be comfortable with mining much larger data sets, and looking for relationships between them.

Asimenos first had to get comfortable with the language of biology before he could dive into the data. He remembers hearing terms like “3 prime” and “downstream” and thinking “What do these things even mean? How do I find out?”

He gave himself a crash course by reading textbooks, going to conferences and hanging out with biologists. “I had to overcome the vocabulary barrier,” he says. During his undergrad, Asimenos had taken courses in statistics, engineering and computer science. Acquiring those skills earlier gave him time during grad school to bone up on his biology, literally, with an anatomy class that included human dissection; and figuratively, with stints in molecular biology wet labs.

Still, he had some fraught moments. His advisor tapped him to teach a course on algorithms for biology, in which Asimenos explained genetics, genomics and biology to a class of computer scientists. That experience in the biological deep end has helped him, because DNAnexus makes software for biologists. He needs to know the language of the company's clients so that he can help create software that meets their needs. Bridging the gap between biology and computer science remains one of his biggest challenges. “That vocabulary is really deeply rooted in every single discussion,” he says. And as the technology improves, even more skills will be needed: the next generation of bioinformaticians might need knowledge of machine learning and artificial intelligence, Asimenos says.

After graduate school at Lund University, Sweden, Jean-Baptiste Cazier translated his knowledge of applied mathematics into fluency in statistical genetics at deCode Genetics in Iceland. He focused on how statistics can be used to find regions of the human genome that contribute to increased risk of disease. Now, as director of the Centre for Computational Biology at the University of Birmingham, he has been tasked with teaching bioinformatics to scientists and clinicians in the UK's National Health Service (NHS). His centre was the first of eight to start this educational programme, in October last year, as part of the country's 100,000 Genomes Project, which aims to sequence and understand the genomes of 100,000 patients.

Bioinformatics often involves some form of programming. Many people are initially intimidated by the prospect of learning programming languages, but Cazier reassures them: his first lesson is to instil confidence. “Researchers (biologists, clinicians, whatever) can learn mathematics and programming,” Cazier says. To put researchers at ease, he uses data from a few patients to show how mutations are identified, then asks whether the mutations are new and whether the information is statistically reliable. “I am talking to their research brains, so it works,” he says.

That breaks the ice for basic programming — especially when he demonstrates how code can help them ask and answer scientific questions. He has been surprised at the response. “I was quite worried about the teaching course, but they embraced it and asked for more,” says Cazier.
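Cazier's actual teaching material isn't described here, so what follows is only a hypothetical sketch, in Python, of the kind of exercise involved. It assumes patient reads that are already aligned to a reference with no gaps, a single assumed sequencing error rate, and a crude one-sided binomial test of whether the number of mismatching reads at a position could plausibly come from errors alone; a small made-up set of known variants stands in for asking whether a mutation is new. All names, data and thresholds are invented for illustration, and real variant-calling pipelines use far richer models.

```python
from collections import Counter
from math import comb

# Hypothetical toy data: a short reference sequence and patient reads that are
# assumed to be already aligned to it, starting at position 0 with no gaps.
REFERENCE = "ACGTACGTAC"
READS = [
    "ACGTTCGTAC",
    "ACGTTCGTAC",
    "ACGTTCGTAC",
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACGTACGAAC",
]
ERROR_RATE = 0.01  # assumed per-base sequencing error rate
KNOWN_VARIANTS = {(7, "T", "A")}  # hypothetical catalogue of reported variants


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_variants(reference, reads, error_rate, alpha=0.001):
    """Flag positions where the most common non-reference base is seen more
    often than sequencing error alone would plausibly explain."""
    calls = []
    for pos, ref_base in enumerate(reference):
        # Count the bases observed at this position across all reads.
        column = Counter(read[pos] for read in reads if pos < len(read))
        depth = sum(column.values())
        alt_counts = {base: c for base, c in column.items() if base != ref_base}
        if not alt_counts:
            continue
        alt_base, alt_count = max(alt_counts.items(), key=lambda item: item[1])
        # One-sided test: how likely are this many mismatches from errors alone?
        p_value = binomial_tail(alt_count, depth, error_rate)
        if p_value < alpha:
            calls.append((pos, ref_base, alt_base, alt_count, depth, p_value))
    return calls


if __name__ == "__main__":
    for pos, ref, alt, k, n, p in call_variants(REFERENCE, READS, ERROR_RATE):
        status = "known" if (pos, ref, alt) in KNOWN_VARIANTS else "new"
        print(f"position {pos}: {ref}>{alt} in {k}/{n} reads, p = {p:.1e} ({status})")
```

With this toy data, only position 4 (a T seen in four of six reads) is flagged and reported as new; the single mismatch at position 7 is consistent with sequencing error at the assumed rate, so it is not called.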

Basic bioinformatics skills can empower biologists to make use of their own data: after all, they have the best understanding of the biological processes involved. However, because the field is advancing so quickly, they need to stay in touch with the “hardcore” bioinformaticians if they are to keep abreast of the latest developments.

Vicky Schneider, associate professor and deputy director of the EMBL Australia Bioinformatics Resource, at the University of Melbourne, Australia, says one way each side can learn the language of the other is by having more conversations. “You have to have a minimal common vocabulary,” Schneider says.

More dialogue between users and developers results in better tools, she says. For instance, a computer science-trained developer might create a powerful tool. But if someone with a biology background doesn't know how to use it, that tool is useless. “The two sides need to work together to develop user interfaces,” she says.

There are formal efforts in place to achieve this. For instance, the Global Organisation for Bioinformatics Learning, Education & Training (GOBLET) helps each side learn the other's science. But even that can only go so far: it is unrealistic to expect specialists on either side to be completely fluent in the other's field. “Each side has to understand their own limitations,” Schneider says.

David Martin, a senior lecturer in bioinformatics at the University of Dundee, agrees with Cazier that biologists need more familiarity with bioinformatics. “The core skills for modern data-rich biology are not always there,” he says. If he could, he would teach every biology graduate student enough to do some basic programming: read data from a file, then manipulate and process it. That would be “not enough to be a computer scientist, but enough to have the tools to work with the data,” he says (see skills spectrum). But money and time are always a problem. “These skills take time to develop and craft, much like lab skills take time to develop and craft,” Martin says.
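Martin doesn't name a particular language or type of data. As a hypothetical illustration of that level of skill, the Python sketch below reads DNA sequences from a FASTA file (a common plain-text format in which each record starts with a '>' header line followed by sequence lines), tidies them and computes a simple summary for each record: its length and GC content. The file contents and function names are made up for the example.

```python
import tempfile
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a dict mapping record ID to sequence."""
    records = {}
    current_id = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Save the previous record before starting a new one.
                if current_id is not None:
                    records[current_id] = "".join(chunks)
                current_id = line[1:].split()[0]  # ID is the first word
                chunks = []
            else:
                chunks.append(line.upper())  # normalise case
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical example data, written to a temporary file below.
EXAMPLE_FASTA = """>seq1 hypothetical example
ATGCGC
GCAT
>seq2
ATATATAT
"""

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.fasta"
        path.write_text(EXAMPLE_FASTA)
        for name, seq in read_fasta(path).items():
            print(f"{name}\tlength={len(seq)}\tGC={gc_content(seq):.2f}")
```

Running it prints one line per record; with this made-up file, seq1 (split over two lines in the file) is joined into a 10-base sequence with a GC content of 0.60, and seq2 has a GC content of 0.00.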

However, it can be done if one is willing to put in the work, says Joseph Mullen, a PhD student at Newcastle University in the United Kingdom. After an undergraduate degree in biology and with little computational experience, he decided that the career opportunities bioinformatics would open up would be worth the effort. “It was an incredibly steep learning curve,” Mullen says. “I jumped into it with everything I had.” Indeed, between coursework, working three part-time jobs to fund his education and putting in the hours to learn multiple programming languages, he had time for little else. Mullen estimates he averaged about five hours of sleep a night during his MSc year.

The sacrifice paid off, though. The Engineering and Physical Sciences Research Council (EPSRC) and GlaxoSmithKline are funding his PhD research. In return, Mullen contributes to drug discovery work for the company. He has already been offered a government-funded joint postdoc position with Newcastle and Prozomix — a biotech company based in Northumberland, United Kingdom — even though he hasn't yet written up his dissertation.

Federico Abascal was similarly computationally illiterate when he completed his undergraduate degree in 1998. Now he works as a bioinformatician at the Wellcome Trust Sanger Institute in the United Kingdom — one of the world's most renowned bioinformatics hubs. When he finished his undergrad work, he remained interested in biology, but knew he didn't want to perform experiments.

He took a course in the programming language C, then went on to graduate school at the Spanish National Biotechnology Centre in Madrid. His drive to solve problems in evolutionary biology led him to learn more programming languages. “Once you know one language, it is easier and easier to learn others,” he says.

He advises would-be bioinformaticians to get out of the lab as much as they can to avoid losing perspective. “In my case, the best ideas never come in front of a computer,” Abascal says. “It's important to enjoy your job and be motivated: the best ideas come whilst I'm in bed, or walking my dog, or having a coffee with colleagues.”

The next generation of bioinformaticians may well find the lab and the computer indistinguishable. Dual training in both fields, starting as early as undergraduate education, could become the norm, says Atul Butte, director of the University of California, San Francisco's Institute for Computational Health Sciences.

Butte is a pioneer in training for bioinformatics. In high school, he was fascinated by National Geographic covers displaying MRI and CT scan images. He thought combining computers and medicine would prepare him for a career in radiology.

He pursued that career by enrolling in an eight-year programme at Brown University, studying medicine and computer science. Towards the end of his studies, gene expression microarray chips were invented, the human genome project took off and the era of big data in biology was born. He emerged trained in both worlds.

He may have been the exception then, but Butte sees dual training as the new norm. “More and more people come up with both,” he says.

In fact, it is now harder to get into top bioinformatics graduate programmes if you are conversant in only biology or only computer science. “You have to demonstrate you know more than a little of both,” Butte says.

But knowing enough to use the software may not be enough to excel, Butte says. “The point of being in this field is to develop new tools, new methods — you have to innovate. You have to write new code.” Focusing too much on one technique or one problem could be career limiting, he says.

He advises constant learning — the amount of data keeps growing and the nature of it keeps changing. “Treat the field with respect,” he says. “If you want to thrive in biomedical informatics it can't be a casual thing. You have to be here to stay.”