We have made significant updates to the JupyterHub since our last post. QUICKSTART: CLICK HERE to use our JupyterHub to go from reprocessed omic data to publication figures in < 2 minutes. We now offer an FTP site for bulk download of the reprocessed SkyMap data. We now offer a graphical user interface so that even … Continue reading [Update on SkyMap JHub] Go from querying >400,000 RNAseq profiles to analysis without coding
Author: Brian Tsui
Design rationale for SkyMap JupyterHub: How can a Jupyter notebook extract the expression levels or allelic read counts from > 400,000 sequencing runs in seconds?
This blog post covers some of the rationale behind my design of SkyMap, a project that makes >400,000 sequencing runs accessible to everyone. It may be useful when you design your next Big Data application in bioinformatics. I have listed some of the problems that I faced and …
Computer vs. Human: From a computer nerd who went into bioinformatics
A lot of people ask me how I went from computer science to bioinformatics. Actually, the two fields aren’t that different. Computers store long-term data on disk as 0’s and 1’s, while cells store long-term data in DNA as A, C, G, and T. Computers store transient data in the cache or RAM, while …
Getting results from Big Data without the Big Infrastructure problem: Cloud + Docker + Kubernetes
Imagine what would happen if, every day at lunchtime, you had to consciously coordinate all of the steps in digestion: breaking down the food in your stomach, pushing the food through your intestines, and telling yourself to stop feeling hungry after eating. You would spend the entire day coordinating your digestive system! Eating is …
The PhD versus Online Dating
A Ph.D. is very much like a marriage with your advisor, as suggested by this PHD Comics post. After all, the term “Ph.D.” stands for Doctor of Philosophy, where the word “philosophy” is composed of the Greek roots philo- (love) and -sophia (wisdom). So maybe there are some skills transferable from your love life to …
8 ways machine learning techniques can teach us about effective human learning
There have been many articles covering how the entire human race is going to be replaced by AI and machine learning. That I don’t know. However, machine learning is in many ways simply mimicking human learning, and I believe we can apply effective machine learning techniques to improve our own learning and education. Be open …
Buying computing infrastructure vs adopting the Cloud
A recurring question in data-intensive workplaces is which computing infrastructure to use. In the past four years as a bioinformatics Ph.D. student, I have both received and offered solicited and unsolicited advice regarding computing infrastructure, drawing on my prior experience in a high-performance computing lab and my current expertise in data analytics. This blog post …
Preview of the SkyMap project: extracting allelic read counts and expression profiles of >400,000 sequencing runs into simple omic matrices
GitHub link: https://github.com/brianyiktaktsui/Skymap#quick-start-10min Motivation: Pooling pre-processed data from public studies sucks! It takes time and way too much brain energy. When I first started in bioinformatics a couple of years ago, I spent much of my time doing two things: 1.) cleaning -omics data matrices, e.g. mapping between gene IDs (HGNC, Ensembl, UCSC, etc.) for pre-processed …
Brian Y Tsui’s blog
This blog is about some of my technical experiences in bioinformatics and data analysis. I hope that you find something useful. Better yet, leave comments or send me an email (btsui@eng.ucsd.edu) so that I can learn from you too, and I will try my best to give a satisfactory reply within days. I am …