Today I am blogging from day 3 of the UAB short course on sequencing.

Speaker 1: Xiangqin Cui – Experimental design

Additive model to correct for length and dinucleotide bias in RNA-seq: Zheng 2011
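To make the additive structure concrete, here is a minimal Python sketch of backfitting with bin-mean smoothers, fitting log count ≈ α + f(length) + g(dinucleotide bias). It is not the model from Zheng 2011. It assumes each gene has already been assigned a length bin and a dinucleotide-bias bin, and `backfit_additive` is a hypothetical name used only for illustration.

```python
from collections import defaultdict


def _bin_means(values, bins):
    """Mean of `values` within each bin label."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, b in zip(values, bins):
        sums[b] += v
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sums}


def backfit_additive(log_counts, length_bins, dinuc_bins, n_iter=20):
    """Fit y = alpha + f(length_bin) + g(dinuc_bin) by backfitting; return y - f - g.

    Assumption: the bins are hypothetical, precomputed labels per gene.
    """
    n = len(log_counts)
    alpha = sum(log_counts) / n
    f = [0.0] * n  # length effect per gene
    g = [0.0] * n  # dinucleotide effect per gene
    for _ in range(n_iter):
        # Update the length effect from partial residuals y - alpha - g.
        means = _bin_means([y - alpha - gi for y, gi in zip(log_counts, g)], length_bins)
        f = [means[b] for b in length_bins]
        center = sum(f) / n
        f = [v - center for v in f]  # center so alpha stays identifiable
        # Update the dinucleotide effect from partial residuals y - alpha - f.
        means = _bin_means([y - alpha - fi for y, fi in zip(log_counts, f)], dinuc_bins)
        g = [means[b] for b in dinuc_bins]
        center = sum(g) / n
        g = [v - center for v in g]
    # Corrected values keep the overall level alpha plus the residual.
    return [y - fi - gi for y, fi, gi in zip(log_counts, f, g)]
```

In practice the input would be something like log(count + 1) per gene (the pseudocount is also an assumption); a real analysis would use smooth fitted curves rather than bin means.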

Poisson model when there are no biological replicates: for gene i, the count in condition 1 follows Poisson(λ_i1) and the count in condition 2 follows Poisson(λ_i2); use a Wald test or a likelihood-ratio test (LRT) to test whether λ_i1 = λ_i2.
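A minimal Python sketch of both tests for a single gene with one count per condition. The library-size factors `n1` and `n2` are an assumption added here (set both to 1 for equal sequencing depth), and the function names are hypothetical. The LRT statistic is compared to a chi-square with 1 degree of freedom and the Wald statistic to a standard normal.

```python
import math


def poisson_lrt(x1, x2, n1=1.0, n2=1.0):
    """LRT of H0: x1 ~ Poisson(n1*theta), x2 ~ Poisson(n2*theta) vs separate rates.

    Returns (statistic, p_value). The -lambda terms of the log-likelihoods cancel
    because fitted means sum to x1 + x2 under both hypotheses.
    """
    theta = (x1 + x2) / (n1 + n2)  # pooled rate under H0
    stat = 0.0
    for x, n in ((x1, n1), (x2, n2)):
        if x > 0:  # x * log(x / expected) -> 0 as x -> 0
            stat += 2.0 * x * math.log(x / (n * theta))
    stat = max(stat, 0.0)  # guard against tiny negative rounding
    # Chi-square(1) survival function: P(X > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def poisson_wald(x1, x2, n1=1.0, n2=1.0):
    """Wald z-test on the difference of rates, using Var(x) estimated by x."""
    se = math.sqrt(x1 / n1 ** 2 + x2 / n2 ** 2)
    if se == 0.0:
        return 0.0, 1.0
    z = (x1 / n1 - x2 / n2) / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
```

For example, `poisson_lrt(100, 10)` gives a very small p-value, while `poisson_lrt(20, 40, n1=1.0, n2=2.0)` gives a statistic of 0 because the counts are proportional to depth.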

Because real count data are overdispersed (the variance exceeds the mean), a negative binomial model can be used instead (Robinson & Smyth).
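Under the common mean/dispersion parameterization, a negative binomial count has variance μ + φμ², which reduces to the Poisson variance when φ = 0. The Python sketch below shows that relationship and a crude method-of-moments estimate of φ from biological replicates of one gene. This moment estimator is a simple stand-in chosen for illustration, not the estimator used by Robinson & Smyth.

```python
def nb_variance(mu, phi):
    """Negative binomial variance: mu + phi * mu^2 (phi = 0 gives Poisson)."""
    return mu + phi * mu ** 2


def moment_dispersion(counts):
    """Method-of-moments estimate of phi from replicate counts of one gene.

    Solves s^2 = m + phi * m^2 for phi and clips at 0 (no underdispersion).
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = sum(counts) / n
    if mean == 0:
        return 0.0
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)  # sample variance
    return max((var - mean) / mean ** 2, 0.0)
```

Note that estimating φ requires replicates, which is why the Poisson model is the fallback when none are available.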

Speaker 2: David Crossman – Analysis of NGS data using Galaxy

Galaxy is a free, GUI-based platform for analyzing next-generation sequencing data, aimed at people who don’t use the command line or write code.

Speaker 3: Shili Lin – 3D regulation of the genome

Hypothesis: proteins may regulate a gene by binding at a site that is distant in the 1D genomic sequence but close to the gene in the 3D organization of the DNA.

The imprinted Igf2/H19 locus is one well-characterized example [Court 2011].

Hi-C protocol by Lieberman-Aiden 2009

  • Cross-link DNA with formaldehyde. Pieces of DNA that are near each other in 3D get cross-linked.
  • Cut with a restriction enzyme (HindIII in the original protocol).
  • Fill in the ends, marking them with biotin.
  • Ligate the cross-linked fragment ends to each other.
  • Purify and shear the DNA, then pull down the biotin-marked ligation junctions.
  • Sequence with paired-end reads to see which pieces of DNA got ligated to one another and therefore were near each other.
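Downstream, the paired-end reads are typically summarized as a contact matrix: the genome is split into fixed-size bins and each read pair adds one count to the pair of bins its two ends fall in. A minimal Python sketch, assuming the read pairs have already been aligned and reduced to hypothetical `(chrom1, pos1, chrom2, pos2)` tuples; real pipelines also filter artifacts and normalize the matrix.

```python
from collections import Counter


def contact_counts(read_pairs, bin_size):
    """Count read pairs per unordered pair of genomic bins.

    read_pairs: iterable of (chrom1, pos1, chrom2, pos2) tuples (assumed format).
    Returns a Counter keyed by ((chrom, bin), (chrom, bin)) with the smaller bin first.
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in read_pairs:
        a = (chrom1, pos1 // bin_size)
        b = (chrom2, pos2 // bin_size)
        key = (a, b) if a <= b else (b, a)  # contacts are symmetric
        counts[key] += 1
    return counts
```

Storing each contact under an ordered key keeps the matrix symmetric without double counting.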
ChIA-PET protocol by Fullwood 2009
  • Similar to Hi-C but adds a chromatin immunoprecipitation step, so only interactions involving a protein of interest are captured
Uses:
  • Tumor/normal comparison – does chromatin structure change in cancer?
  • Global long-range regulation mechanisms
Statistical approaches for calling significant interactions:
  • IFCalculator
  • HG (hypergeometric test) – too many false positives
  • Bayesian analysis (BASIC) to filter out random collisions
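For context on the hypergeometric approach, one generic formulation is: out of `total` paired-end tags, `a` touch anchor A and `b` touch anchor B; if tags paired at random, the number linking A and B would be hypergeometric, and the p-value is the upper tail P(X ≥ k). The Python sketch below assumes that formulation; the exact setup used in the talk may differ, and `hypergeom_sf` is a hypothetical name.

```python
from math import comb


def hypergeom_sf(k, total, a, b):
    """P(X >= k) for X ~ Hypergeometric(population=total, successes=a, draws=b).

    Assumed ChIA-PET reading: total PETs, a touching anchor A, b touching anchor B,
    k observed PETs linking A and B.
    """
    if not (0 <= a <= total and 0 <= b <= total):
        raise ValueError("a and b must lie between 0 and total")
    denom = comb(total, b)
    upper = min(a, b)
    # Sum the pmf C(a, x) * C(total - a, b - x) / C(total, b) over the upper tail.
    tail = sum(comb(a, x) * comb(total - a, b - x) for x in range(k, upper + 1))
    return tail / denom
```

Because deep libraries make even weak enrichment highly significant under this null, it is easy to see how a plain hypergeometric test can yield many false positives.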

Speaker 4: Jonas Almeida – cloud computing and the semantic web

Resources for cloud computing

  • imagejs.org – cloud-based ImageJ tools
  • altjs.org – for those who want to use web browsers to run MapReduce but don’t want to write JavaScript.

Resources for the semantic web