Bioinformatics: Part IV

[Figure: biogenesis of miRNA]

Today we covered proteins, miRNA, cancer mutations, and more databases. X-ray crystallography and nuclear magnetic resonance (NMR) are common methods for determining protein structure. NCBI, PDB, and COSMIC-3D (which adds druggability and mutation-cluster features) are useful protein databases. miRNAs target the 3′ UTR of mRNAs and repress protein expression. They can also regulate alternative splicing and mRNA methylation. shRNAs fold into a double-stranded hairpin and are research tools used for gene knockdown. The biogenesis of miRNA is depicted in the diagram above. TargetScan and miRTarBase are miRNA databases where you can view predicted and experimentally validated miRNA targets, respectively. PhosphoSitePlus, PROSITE, Pfam, CDART, and Eukaryotic Linear Motif (ELM) are additional protein databases with information on domains and post-translational modifications. DepMap and Project Achilles report how cancer cell proliferation changes when particular genes are knocked out by CRISPR.
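To make the idea of a miRNA "targeting" a 3′ UTR more concrete, here is a minimal sketch of seed matching: it looks for stretches of the UTR that are complementary to nucleotides 2-8 of the miRNA. This is only the simplest ingredient of target prediction; tools like TargetScan also weigh conservation and site context, which this sketch ignores. The sequences and function names below are made up for illustration.

```python
# Watson-Crick pairing for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna):
    """Return the mRNA site complementary to miRNA nucleotides 2-8 (the seed)."""
    seed = mirna.upper().replace("T", "U")[1:8]
    # The miRNA pairs antiparallel to the mRNA, so reverse before complementing.
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna, utr):
    """Return 0-based start positions of perfect seed matches in a 3' UTR."""
    utr = utr.upper().replace("T", "U")
    site = seed_site(mirna)
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


# Toy inputs (hypothetical, chosen only to demonstrate the functions).
toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCAGGGUUUCUACCUCUU"
print(seed_site(toy_mirna), find_seed_matches(toy_mirna, toy_utr))
```

A UTR with several seed matches is only a candidate target; the databases above are where you would check whether a prediction is conserved or experimentally supported.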

I also presented research on my gene, c3orf80, which encodes a single-pass transmembrane protein with glycosylation at amino acid 57. From analysis of microarrays, protein motifs, conserved domains, tissue expression, predicted transcription factors, knockouts in cancer cell lines, mutation distribution, post-translational modifications, and more, I deduced that it’s an oncogene involved in some kinds of pancreatic cancer and in cervical cancer (HeLa S3 cells can grow in suspension when this gene is upregulated). It may also be involved in cell differentiation (the transcription factor STAT4 is involved in T helper cell differentiation, along with the related STAT1) and could possibly function in the Notch pathway, since microarray data show JAG1 and c3orf80 upregulated together. Its highest relative tissue expression is in the cerebral cortex, in areas like the primary visual center; I’m not sure exactly what it’s doing there. Interestingly, it has a conserved domain from sodium-potassium pumps; however, it can’t function as one, as it only crosses the membrane once. Of course, these are all hypotheses based on research with online bioinformatics tools, and lab work would be needed to confirm or debunk them. Nonetheless, bioinformatics is a powerful tool to guide scientists’ exploration of uncharted waters, and I certainly enjoyed the challenge!

 

Bioinformatics: Part III

I learned about experimental design and continued researching the mystery gene. It’s okay to experiment without a hypothesis! Let the data tell you what to follow rather than clinging to your hypothesis, since that invites confirmation bias. A solid experimental design will also attack the problem from several angles.

We also discussed other powerful databases like ALGGEN, which can predict transcription factors from sequences, and the Human Protein Atlas, which has many useful features, including clinical data, that can go a long way toward understanding an unknown gene. The epigenetics talk was extremely interesting: epigenetics could be one of the keys to cancer, and it at least partially accounts for the differences between twins. Methylation and acetylation are prime examples of epigenetics at work; they can be modified by what we eat and how we live, potentially impacting our progeny.

Some key genomic resources included UC Davis KOMP, which reports knockouts in mice; the UCSC Genome Browser; and COSMIC, for exploring cancer mutations. DAVID, KEGG (pathways), STRING, and BioGRID (protein interactions) are more ways to layer your research.

In my case, some databases didn’t yield anything, and others pointed me in quite different directions. Does my gene play a role in cotransport, cell differentiation, neural disorders like ALS, pancreatic cancer, the mammalian Notch pathway, something else, or several of these things in different parts of the body? It’s a mystery! I’ll continue researching the potential transcription factors, cancer mutations, potential protein interactions, and possible pathways, hoping to focus on one thing that’s backed by several layers of evidence. Bioinformatics is often a great way to start a project that can then be tested in the lab; while I might not take this project to the lab, I’m certainly enjoying researching this gene and fighting through these hurdles.

Bioinformatics: Part II

I learned about several more useful bioinformatics tools like NCBI and UniProt. I found DNA microarrays fascinating, as they reveal the expression of 15-20k genes in a day; all that data seems impossible to analyze! The principles behind RNA-seq are much more complicated, although it’s more useful because it also reveals mutations, SNPs, etc. NCBI houses tons of DNA microarray datasets. BioGPS is another website for looking at gene expression across groups of tissues.

We also talked about transcription factors. The Eukaryotic Promoter Database can help you find the sequence of a gene’s promoter, and MotifMap is dedicated to helping you find transcription factor binding sites for your gene. This is crucial, as transcription factors can tell you quite a bit about how a gene is regulated and perhaps something about its function.
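As a rough illustration of how a binding site can be looked up in a promoter sequence, here is a minimal sketch that scans a sequence for a consensus motif written in IUPAC codes. It is not how any particular database works internally (real tools typically score position weight matrices), and the promoter below is a made-up toy sequence. TATAWAW is used as a commonly cited TATA-box consensus.

```python
import re

# IUPAC nucleotide codes translated into regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def motif_hits(promoter, consensus):
    """Return 0-based start positions where the consensus motif occurs."""
    pattern = "".join(IUPAC[code] for code in consensus.upper())
    # A lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", promoter.upper())]


# Toy promoter (hypothetical sequence, for illustration only).
toy_promoter = "GGCCTATAAAAGGCC"
print(motif_hits(toy_promoter, "TATAWAW"))
```

A hit only says the sequence looks like a binding site; whether the factor actually binds there in a given tissue is a separate question.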

Moreover, genome-wide association studies (GWAS) are a big deal, as they reveal SNPs in the population that are associated with traits or diseases. SNPs serve as convenient markers for further investigation, significantly narrowing down the possibilities from the billions of base pairs in the genome to specific sites where changes in DNA at those or nearby sites may play a role in a disease. PheGenI (Phenotype-Genotype Integrator) is another resource for finding SNPs and other variants related to the gene of interest.

I used quite a bit of this information today and made some exciting discoveries. The gene I’m researching is possibly involved in cell differentiation and could let cervical cancer cells grow in suspension! My gene is expressed quite a bit in the nervous system and in adipocytes (fat cells). One SNP associated with Behcet’s syndrome (a rare disorder that causes blood vessel inflammation) may be near my gene (within 1000 base pairs); I will look into that further tomorrow. Moreover, I found a site that can take promoter sequences and churn out potential transcription factors, which I will also investigate later. As a side note from last time, this gene has orthologs in many animals, including chimpanzees, and may have been present in the ancestor of chordates, as I concluded from my BLAST results.
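The "within 1000 base pairs" check is simple arithmetic on coordinates, sketched below. The positions used here are invented placeholders, not the real coordinates of the SNP or of my gene, and both must be on the same chromosome and genome assembly for the comparison to mean anything.

```python
def distance_to_gene(snp_pos, gene_start, gene_end):
    """Return the distance in base pairs from a SNP to a gene (0 if inside it)."""
    if gene_start > gene_end:
        raise ValueError("gene_start must not be greater than gene_end")
    if snp_pos < gene_start:
        return gene_start - snp_pos
    if snp_pos > gene_end:
        return snp_pos - gene_end
    return 0


def is_near_gene(snp_pos, gene_start, gene_end, window=1000):
    """Return True if the SNP lies inside the gene or within `window` bp of it."""
    return distance_to_gene(snp_pos, gene_start, gene_end) <= window


# Hypothetical coordinates, for illustration only.
print(distance_to_gene(99_400, 100_000, 105_000))
print(is_near_gene(99_400, 100_000, 105_000))
```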

Bioinformatics: Part I

I participated in a molecular biology crash course, listened to an introduction to cancer, and played around with NCBI and its databases, like PubMed, and other sites, like GeneCards, to find out more about the gene I’m researching (its function is unknown).

I learned that it is a membrane protein with an alpha helix spanning the hydrophobic portion of the membrane. Additionally, I found that part of it is a conserved domain linked to the sodium-potassium transport pump ATPase subunit gamma. My initial hypotheses are that it’s a cotransport protein or that it’s involved in some sort of signal transduction pathway, functioning as a receptor for a hydrophilic molecule that stimulates it.
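One classic way to spot a possible membrane-spanning helix in a protein sequence is a hydropathy plot: average the Kyte-Doolittle hydropathy values over a sliding window and look for strongly hydrophobic stretches. The sketch below is my own illustration, not the method the databases I used necessarily apply; the 19-residue window and 1.6 cutoff are the commonly quoted defaults, and the sequence is a toy one, not my gene.

```python
# Kyte-Doolittle hydropathy scale (positive values are hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every window along the sequence."""
    sequence = sequence.upper()
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in sequence[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Return start positions of windows hydrophobic enough to span a membrane."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, score in enumerate(profile) if score > threshold]


# Toy protein: charged flanks around a 20-residue hydrophobic core (hypothetical).
toy_protein = "MKRSDEQNKR" + "LLIVALLIVALLIVALLIVA" + "KRDESQNKRD"
print(candidate_tm_windows(toy_protein))
```

A single hydrophobic stretch like this fits a single-pass membrane protein, though a real prediction would need a dedicated transmembrane topology tool.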