Students Explore Personal Genomics

Students from the local Kannapolis middle and high schools had the unique opportunity to explore the human genome and learn about bioinformatics – the application of computer technology to biological information.


Nowlan working with a student to find out what genetic risks he has inherited.

In 2012, three generations of my family and I had our genetic markers commercially sequenced. The students used this DNA data to work out who was related to whom, identify which diseases I was most at risk for, and make new discoveries about my genetic inheritance.


Ivory helping a student answer questions about genetics and inheritance.

The purpose of the workshop was to drive home how new genetic technologies are increasingly being used, as well as to give the students experience using genomics software – Integrated Genome Browser. Thanks to Tanner Deal and Ivory Blakley for helping design and lead the workshop, and to Doug Vernon for organizing the “Scientist for a Day” program. For more information about the “Scientist for a Day” program, check out the story in the Independent Tribune.

Plant & Animal Genome Conference

The Loraine lab had a strong showing at the 2016 Plant and Animal Genome Conference. Dr. Loraine gave a talk on the draft blueberry genome and on using the Integrated Genome Browser. I gave a talk on using ProtAnnot – an app for IGB to visualize protein function. Check out the workshop page for more information.

With over 150 talks and more than 3,000 attendees, this year’s PAG had a lot to offer, and everyone who attended from the Loraine Lab had a great time. And of course we also enjoyed the local San Diego attractions.

Panda in the early morning at the San Diego Zoo

 

2015 Plant Biology Meeting

The Loraine lab traveled to the Minneapolis Convention Center for the 2015 American Society of Plant Biologists meeting. There were many great talks on recent advances in plant biology and crop sciences. Our own April Estrada gave a talk on the role of the gene SR45a in stress response in plants. I gave a talk on using IGB as a resource for teaching, as well as a workshop introducing visual analysis of RNA-seq data.

April giving her talk on SR45a


Everyone had a great time at the various talks, workshops, and exhibits. It was also a great chance to network with other researchers. Of course, we also made sure to take some time to visit the Twin Cities.

Enjoying Minneapolis


2015 Society for Developmental Biology

I want to thank the Society for Developmental Biology for inviting me to their annual meeting in Snowbird, Utah. I had the opportunity to give a talk on the work that April Estrada and I have done on the role of SR45a in alternative splicing in stress response. I also led a workshop on using Integrated Genome Browser to visually analyze high-throughput sequence data. We had a great turnout, as many of the attendees were very interested in using IGB in their work.


SDB attendees finding out more about IGB.

Snowbird is a ski resort located in the mountains near Salt Lake City. I was able to take the tram to the top of the mountain and take some photos. It was a great location for a conference.


View from the top of Snowbird, looking out over Salt Lake City.

 

IGB workshop at 2015 SESDB

IGB made an appearance at the 2015 Southeast Regional Society for Developmental Biology (SESDB) conference at Clemson University. I led a workshop on visual analysis of sequencing data using IGB. There was a good turnout of conference attendees, as well as several students from Clemson University.

Following the workshop there were many exciting talks on current research in developmental biology. Some of my favorite talks were on regenerating hearts and spinal cords in fish, how hair cells develop, and how to use sharks to better understand brain development.

As a graduate of Clemson University, it was a lot of fun to return to Clemson, lead a workshop, and enjoy the company of many brilliant scientists.


Everyone enjoying the final night’s food and festivities, including the band FNKY music.

4th Annual Catalyst Symposium

The Loraine lab had a strong showing at the 4th annual Catalyst Symposium, with Ivory, April, and me presenting posters on our current research.

The title of the symposium was “Progress in NCRC”. The theme was to highlight the highly diverse and interdisciplinary research being conducted across the North Carolina Research Campus. There were nine talks, eighteen posters, and over a hundred attendees. The talks and posters were very good, covering topics such as the role of obesity in promoting cancer and identifying which genes control the taste of fruits and vegetables.

Ivory and April presented posters on their work in rice and Arabidopsis, respectively, while I presented the latest features in IGB. It was a lot of work to prepare for the symposium, but everyone had a great time and learned a lot.


IGB at Lenoir-Rhyne University

In spring of 2015, Mason and I visited Dr. Scott Schaefer’s genetics class at Lenoir-Rhyne University in Hickory, North Carolina.

The goal of our visit was to teach students about genomics and genetics using Integrated Genome Browser. We also hoped to gain fresh insight into how new users respond to the IGB interface.


Lecture on genes and development.

To start, I gave a talk on developmental genetics, describing how a single mutation in a gene can have huge consequences for a developing embryo. I explained how advances in technology have made sequencing genomes more affordable than ever, allowing researchers to quickly identify disease-causing mutations.


Hands-on training with IGB.

Students then worked hands-on with genomic data using IGB. Their first task was to find genetic mutations by exploring whole genome sequencing data. They then used the data to build coverage graphs, allowing them to find deletions. The final task was to explore my own personal 23andMe data using IGB to see if I was at risk for any genetic diseases.
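To give a feel for the coverage-graph step, here is a minimal sketch in Python. It assumes reads are given as simple (start, end) intervals on a toy region, and the function names are made up for illustration. It only covers the simplest case, where a deletion on both copies of a chromosome shows up as a stretch with no reads at all; a deletion on one copy would instead show up as roughly half the usual depth.

```python
def coverage(reads, region_length):
    """Per-base read depth for a region, given half-open (start, end) read intervals."""
    depth = [0] * region_length
    for start, end in reads:
        # Clip each read to the region so out-of-range reads are ignored.
        for pos in range(max(start, 0), min(end, region_length)):
            depth[pos] += 1
    return depth


def zero_coverage_runs(depth, min_length=1):
    """Return half-open (start, end) intervals where the depth is zero."""
    runs = []
    run_start = None
    for pos, d in enumerate(depth):
        if d == 0 and run_start is None:
            run_start = pos                      # a gap begins here
        elif d != 0 and run_start is not None:
            if pos - run_start >= min_length:    # keep only gaps that are long enough
                runs.append((run_start, pos))
            run_start = None
    # Handle a gap that runs to the end of the region.
    if run_start is not None and len(depth) - run_start >= min_length:
        runs.append((run_start, len(depth)))
    return runs


# Toy example: three reads with a gap between positions 15 and 25.
toy_reads = [(0, 10), (5, 15), (25, 40)]
candidate_deletions = zero_coverage_runs(coverage(toy_reads, 40), min_length=5)
```

In IGB the same idea is visual: the coverage graph drops to zero across the deleted region.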

After a quick demonstration, the students had no trouble visualizing the data in IGB. We were also happy to see that no one had any trouble installing IGB on their personal laptops.


Mason and Dr. Schaefer answering questions.

This was the second time the IGB team has visited Lenoir-Rhyne. In 2013, Alyssa Gulledge visited another of Dr. Schaefer’s classes, which was using IGB to annotate the newly assembled blueberry genome. At the time, Mason was still attending school there, and that visit was his first exposure to genome visualization tools. IGB really stuck in Mason’s mind, and after graduating he joined the Loraine Lab as an intern, eventually taking over the role of lead tester on the IGB project.

We hope that this year’s visit will inspire other students to pursue careers in science and technology.


We hope everyone had fun learning about developmental genetics and IGB.

Using Table Browser at UCSC to get a data set for IGB

The UCSC Genome Bioinformatics site offers a wealth of data from many genomes, ranging from tiny (but deadly) genomes like Ebola virus to much larger genomes like our own human genome. Indeed, the reference genome sequence and gene model data sets for many of the genomes the IGB team hosts on the IGBQuickLoad.org data repository site are originally from UCSC. Other projects – like Galaxy – also rely on UCSC for core data sets. If you have used Galaxy, you may have noticed that Galaxy has built-in genomes for many animal species; these data were imported from the UCSC ecosystem.

When you visit a UCSC-supported genome in IGB, you’ll see a folder named “UCSC (DAS)” in the Available Data Sets section of the Data Access Panel. If you open the folder, you’ll see many data sets with seemingly cryptic titles – like “nestedRepeats” and “altLocations.” These names correspond to tables in the UCSC database. If you select these data sets and click Load Data in IGB, these data will flow from the UCSC system’s Distributed Annotation Server into IGB.

However, for technical reasons, the UCSC DAS site can only support some of their data sets. To view UCSC data that are not supported in their DAS site, you can use the UCSC Table Browser to download them onto your computer and open them in IGB.

In this post, I’ll explain how you can use the UCSC Table Browser to obtain and then open data sets in Integrated Genome Browser. I’ll also explain how you can use tabix and bgzip to compress and index files that are too large to load into IGB all at once.
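As a rough preview of the tabix step: tabix expects a file whose records are grouped by chromosome and ordered by start position, and compressed with bgzip rather than plain gzip. Here is a minimal sketch in Python, assuming bgzip and tabix are installed and on your PATH. The function names are made up for illustration, and because the sort happens in memory, a command-line sort is probably a better fit for very large files.

```python
import gzip
import subprocess


def write_sorted_bed(in_gz, out_bed):
    """Write an uncompressed BED file sorted by chromosome, then by start position."""
    records = []
    with gzip.open(in_gz, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Skip blank lines and optional track/browser/comment header lines.
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue
            fields = line.split("\t")
            records.append((fields[0], int(fields[1]), line))
    # tabix needs records grouped by sequence name and ordered by start.
    records.sort(key=lambda rec: (rec[0], rec[1]))
    with open(out_bed, "w") as out:
        for _chrom, _start, line in records:
            out.write(line + "\n")


def compress_and_index(bed_path):
    """Run bgzip and then tabix on a sorted BED file (both tools must be on PATH)."""
    # bgzip replaces bed_path with bed_path + ".gz"; -f overwrites an existing file.
    subprocess.run(["bgzip", "-f", bed_path], check=True)
    # tabix writes the index next to the compressed file as bed_path + ".gz.tbi".
    subprocess.run(["tabix", "-f", "-p", "bed", bed_path + ".gz"], check=True)
```

For example, calling write_sorted_bed on a downloaded .bed.gz file and then compress_and_index on the result should leave a .gz file and a matching .tbi index side by side.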

Part I: Getting data from the UCSC Web site

In this example, I’ll show you how to get human variation data (SNPs) from the UCSC Web site.

  1. Go to http://genome.ucsc.edu
  2. Click Tables (top of page)

Configure the browser to access the latest (as of this writing) human genome assembly:

  1. clade Mammal
  2. genome Human
  3. assembly Dec 2013 (GRCh38/hg38) (the latest)
  4. group Variation
  5. track Common SNPs(141)
  6. table snp141Common
  7. region genome
  8. output format BED – browser extensible data
  9. file type returned gzip compressed

Enter a name for the output file – e.g., snp141Common.hg38.bed.gz. It should end with “.gz” to ensure your computer will recognize it as a gzip-compressed file.

Tip: Click the button Describe Table Schema to see what data are in the table.
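Once the download finishes, a quick sanity check is to count how many records landed on each chromosome. Here is a minimal sketch in Python, assuming the file is tab-delimited BED with the chromosome name in the first column; the function name is made up for illustration.

```python
import gzip
from collections import Counter


def count_features_per_chromosome(path):
    """Count BED records per chromosome in a gzip-compressed BED file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and optional track/browser/comment header lines.
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue
            chrom = line.split("\t", 1)[0]
            counts[chrom] += 1
    return counts
```

Calling count_features_per_chromosome("snp141Common.hg38.bed.gz") returns a Counter keyed by chromosome name; its most_common() method lists the chromosomes with the most records first.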


to be continued

Moving genomes source code to BitBucket (by Ann)

Today I’m moving source code (mainly Python) from the genomes subversion repository over to a new repository on BitBucket.

The genomes subversion repository contains version-controlled data files for many different genomes that we’ve collected from many sources, such as the UCSC Genome downloads page, the UCSC Table Browser, Phytozome (for plants), and model organism databases like DictyBase. We launched the genomes repository back in 2008 as part of an Arabidopsis 2010 project grant that supported IGB from 2008 to 2012. Our original idea was to use version control systems (cvs or svn) to track big data files from Arabidopsis and other species. The repository would not only serve as a useful data archive, it would also provide a way to feed data into IGB.

However, the genomes repo also contains source code we wrote for converting data files, setting up QuickLoad sites, and other tasks. The source code was stored in a directory named “src” (for “source”). You can browse it here: https://svn.transvar.org/repos/genomes/trunk/pub/src/.

Today, I’m moving all that source code over to a new repo hosted on BitBucket: https://bitbucket.org/lorainelab/genomes_src.

I’m doing this because the BitBucket UI for browsing code, reviewing changes, and managing the project is about a thousand times better than what we are currently using on our subversion server. Since BitBucket offers free source code hosting for reasonably sized repositories, there’s no point in hosting any of our source code ourselves, and during the last year, I’ve been happily migrating all our lab code onto BitBucket. Also, I’d like to start using git for version control, at least for all our source code. Setting up a git repository on our server would probably not be difficult, but then we would have yet another service to maintain. Rather than try to host everything we do on our own machines (which consume electricity and sysadmin time), I’ve realized we should off-load as much as we can onto public, free resources like BitBucket. And at least for now, BitBucket seems like the best option because it’s run by Atlassian, which seems like it has staying power, a mature management structure, and excellent support.

More about the genomes repository:

The genomes repo contains a lot of data, not just code. In fact, most of it is data – more than 11 GB worth of sequence files, annotation files, and some meta-data files describing what’s there.

I’d love to migrate the data onto BitBucket and start using git and not svn for managing the data. I’d love to make it possible for people to fork the repo, improve the annotations, and then issue pull requests back to the main site. The problem is: I don’t know if BitBucket can handle a repository as big as ours – it’s 11 GB. Also, most of our files are in binary formats (.2bit, .tbi, .gz, etc.) and I’m not sure how git will handle those files when I or other people make a change. Will it store multiple copies of an enormous file if I modify a small part of it? How would diffs work? It would be terrific if git could somehow handle our binary formats in a smart way.
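Part of the answer lies in how git names what it stores. Git identifies each version of a file's contents by a hash of those contents, so changing even one byte yields a new object. Here is a minimal sketch in Python that reproduces the object ID git computes for a file in a default (SHA-1) repository; the function name and the toy data are made up for illustration.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Compute the SHA-1 object ID that git assigns to file contents (a "blob")."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) followed by the contents.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


original = b"\x00" * 1_000_000        # stand-in for a large binary file
modified = b"\x01" + original[1:]     # the same file with a single byte changed

# Different contents always get a different ID, so git treats them as two objects.
same_object = git_blob_id(original) == git_blob_id(modified)  # False
```

Git can later pack similar objects together and store one as a delta against the other, but files that are already compressed (like .gz) tend to look very different after a small change to the underlying data, so I would not count on the savings being large.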

So for now, I’ll continue to use our subversion server to host and version control the data sets, but I’ll use BitBucket for source code, which is probably what the team at Atlassian would prefer.

One last comment: For now, I’m leaving the “src” directory untouched in the original genomes repo, but I’ll add a notice letting users know that this part of the repository is moving to BitBucket.