Speakers: Sofia Robb, PhD

Sofia Robb, PhD

Genomic Scientist
Stowers Institute for Medical Research


Sofia Robb graduated with a BS degree in biology from the University of Maryland, Baltimore in 1999. She began integrating scripting and the use of databases with her experiments while working as a technician in the laboratory of Alejandro Sánchez Alvarado, PhD, in the Department of Embryology at the Carnegie Institute in Baltimore.

Robb remained with the Sánchez laboratory after its move to the University of Utah for her doctoral work, where she studied histone-modifying enzymes and their role in stem cells and regeneration in the planarian flatworm Schmidtea mediterranea. She also constructed numerous genomic tools for this emerging non-model organism and continued integrating the genome with bioinformatic tools as a postdoctoral associate at the University of California, Riverside, with Jason Stajich, PhD, and Susan Wessler, PhD, studying active transposable elements in rice. At the Stowers Institute, Robb is currently involved in several genomics initiatives, including establishing a collection of tools for analyzing genomes and genome-wide data of research organisms, as well as ontology development and use.

In addition, Robb is one of the coordinators and instructors for the Cold Spring Harbor Laboratory Course, Programming for Biology. She has been involved with the course since she attended as a student in 1999.


Using Open-Source Software to Share Genomic Data

SIMRbase is a genome database and browser that houses genomic and transcriptomic data for a variety of organisms used in the research programs at the Stowers Institute for Medical Research (SIMR). It gives researchers and collaborators intuitive tools for interacting with their genomic data, and its expandable infrastructure makes it easy to add and maintain new organisms. SIMRbase provides interfaces for genome browsing, gene searches, management of manual gene annotation efforts, and sequence similarity searches using BLAST. It is built from a collection of open-source tools: Tripal, CHADO, JBrowse, and Apollo. Tripal, a Drupal module, is used to access data stored in CHADO, a relational database schema that stores genes and associated data such as genomic location, homology, publications, sequence, and GO terms. From this related data, Tripal creates customizable gene pages. Tripal extension modules load MAKER gene annotations and precomputed BLAST output into CHADO, provide an NCBI-BLAST interface for searches against SIMRbase sequence databases, and supply gene keyword search tools. SIMRbase employs JBrowse for its genome browsers and Apollo for manual gene curation. Currently, SIMRbase holds 80 genomes and is maintained by one bioinformatician.
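
To make the idea of a relational schema that stores genes with their genomic locations more concrete, here is a minimal sketch. It is not SIMRbase code. It assumes a heavily simplified subset of three CHADO tables (feature, featureloc, cvterm) and uses an in-memory SQLite database as a stand-in for the PostgreSQL database a real installation would use. The function names build_demo_db and genes_in_region, and all records such as contig_1 and demo_gene_A, are hypothetical and exist only for illustration.

```python
import sqlite3

# Simplified stand-in for three CHADO tables. The real schema runs on
# PostgreSQL and has many more columns, tables, and constraints.
SCHEMA = """
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id    INTEGER NOT NULL REFERENCES feature(feature_id),
    srcfeature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    fmin          INTEGER NOT NULL,
    fmax          INTEGER NOT NULL,
    strand        INTEGER
);
"""


def build_demo_db():
    """Create an in-memory database holding one contig and two made-up genes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                     [(1, "gene"), (2, "contig")])
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                     [(10, "contig_1", 2),
                      (11, "demo_gene_A", 1),
                      (12, "demo_gene_B", 1)])
    # Coordinates are interbase (0-based, half-open), as in CHADO's featureloc.
    conn.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?, ?, ?)",
                     [(100, 11, 10, 499, 1500, 1),
                      (101, 12, 10, 3000, 4200, -1)])
    conn.commit()
    return conn


def genes_in_region(conn, contig, start, end):
    """Return (uniquename, fmin, fmax, strand) for genes overlapping [start, end)."""
    query = """
        SELECT g.uniquename, loc.fmin, loc.fmax, loc.strand
        FROM feature g
        JOIN cvterm t       ON t.cvterm_id = g.type_id
        JOIN featureloc loc ON loc.feature_id = g.feature_id
        JOIN feature src    ON src.feature_id = loc.srcfeature_id
        WHERE t.name = 'gene'
          AND src.uniquename = ?
          AND loc.fmin < ?
          AND loc.fmax > ?
        ORDER BY loc.fmin
    """
    return conn.execute(query, (contig, end, start)).fetchall()
```

Calling genes_in_region(build_demo_db(), "contig_1", 0, 2000) returns only the record for demo_gene_A, since demo_gene_B lies outside that window. The sketch shows the general pattern: the feature type comes from a controlled-vocabulary term, and the location is a separate row that points back to the sequence the gene sits on. A gene page or a genome browser track could be assembled from joins of this kind.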