Angela at Rice

Friday, December 30, 2011

Reading the Annals of Xia (夏本纪)

皋陶曰：‘都，亦行有九德，亦言其人有德，乃言曰：宽而栗、柔而立、愿而恭、乱而敬、扰而毅、直而温、简而廉、刚而塞、彊而义、彰厥有常，吉哉！’

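In modern terms, Gao Yao said: "Conduct calls for nine virtues. Starting from how one handles affairs, one should be broad-minded yet rigorous, gentle yet decisive, sincere yet respectful, capable yet careful, amenable yet resolute, upright yet warm, forthright yet principled, firm yet substantial, and strong-willed yet righteous. Cultivate these and hold to them over the long term, and affairs will go well."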

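Reading the Annals of the Five Emperors (五帝本纪)

The Three Sovereigns: Suiren, Fuxi, Shennong

The Five Emperors: the Yellow Emperor (Huangdi), Zhuanxu, Emperor Ku, Yao of Tang, Shun of Yu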

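Suiren is the legendary inventor of making fire by drilling wood; this is already recorded in pre-Qin texts.

Fuxi, also called Baoxi (包牺氏) or Paoxi (庖羲), is said to have been a great inventor who contributed much to the people. "Baoxi first made the Eight Trigrams, to communicate with the virtue of the spirits and to classify the natures of all things." He is also credited with inventing "knotting cords into nets for fishing", to the benefit of the people.

Shennong is the legendary Yan Emperor. The Yan Emperor is China's sun god; he is also said to be the god of agriculture, who taught the people to farm, and the god of medicine. According to tradition, Shennong tasted the hundred herbs and founded medicine, and died from a poisonous herb he was testing.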

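The Yellow Emperor is the legendary ancestor of the Huaxia people. He is said to have been born by the Ji River, hence the surname Ji; he lived at the hill of Xuanyuan, hence the name Xuanyuan; and he founded his state at Youxiong, hence also the name Youxiong. The Yellow Emperor lived at the hill of Xuanyuan and married a woman of Xiling, Leizu, who was his principal consort. She bore two sons, whose descendants both came to rule the realm: the first was Xuanxiao, also known as Qingyang, who settled by the Jiang River; the second was Changyi, who settled by the Ruo River. Changyi married a woman of the Shushan clan named Changpu, who bore Gaoyang, a man of sagely virtue. When the Yellow Emperor died, he was buried at Mount Qiao. His grandson Gaoyang, the son of Changyi, succeeded him as Emperor Zhuanxu.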

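Zhuanxu, surname Ji, was the grandson of the Yellow Emperor and the son of Changyi. He was born by the Ruo River (around present-day Dukou, Sichuan) and lived at Qiongsang. His mother, Nüshu, is said to have conceived him after being moved by the star Yaoguang. At ten he assisted Shaohao, and at twenty he ascended the throne. He was first enfeoffed at Gaoyang (east of present-day Gaoyang County, Hebei) and had his capital at Diqiu (southwest of present-day Puyang County). He reigned for 78 years and lived to 98. Known as Gaoyang, he is counted among the Five Emperors and is remembered as a ruler accomplished in civil governance.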

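Emperor Ku (kù), surname Ji, was the great-grandson of the Yellow Emperor and the grandson of Xuanxiao; his father was Jiaoji. Several of Emperor Ku's sons are also famous in Chinese history. His principal consort, Jiang Yuan, bore Qi (弃, also known as Hou Ji), the ancestor of the Zhou. His secondary consort Jiandi bore Xie (契), the ancestor of the Shang. His secondary consort Qingdu bore Yao, a famous sage ruler and one of the Five Emperors. His secondary consort Changyi bore Zhi, who inherited Ku's throne and abdicated to Emperor Yao nine years later.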

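Yao of Tang, surname Yiqi (伊祁), was styled Fangxun. Because he was enfeoffed at Tang, he is called Yao of Tang. His mother was Qingdu, a daughter of the Chenfeng clan.

Shun of Yu, surname Ji, was named Chonghua. Chonghua's father was Gusou, Gusou's father was Qiaoniu, Qiaoniu's father was Gouwang, Gouwang's father was Jingkang, Jingkang's father was Qiongchan, Qiongchan's father was Emperor Zhuanxu, and Zhuanxu's father was Changyi: seven generations down to Shun. From Qiongchan down to Emperor Shun, all were obscure commoners.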

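Yu founded the Xia, surname Si. Yu's father was Gun, who failed to control the floods.

Xie is the ancestor of the Shang, surname Zi.

Qi is the ancestor of the Zhou, surname Ji.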

Wednesday, December 14, 2011

InParanoid focuses on pairwise ortholog relationships. OrthoDB recognizes that orthology is defined relative to a particular speciation point, and so provides a hierarchy of orthologs along the species tree. Other databases that provide eukaryotic orthologs include OrthoMaM for mammals, and OrthologID and GreenPhylDB for plants. OrthoMCL covers bacteria, but its data are old and incomplete.

Tree-based phylogenetic approaches aim to distinguish speciation from gene duplication events by comparing gene trees with species trees, as implemented in resources such as TreeFam and LOFT. A third category of hybrid approaches uses both heuristic and phylogenetic methods to construct clusters and determine trees, for example Ortholuge, EnsemblCompara GeneTrees and HomoloGene.

Orthology and paralogy, as originally defined by Fitch, are both evolutionary concepts. That is, orthologous genes are homologous sequences that started to diverge through a speciation event (likewise, paralogs started to diverge through a duplication event). Consequently, the better you can approximate the evolution of such sequences, the better your orthology predictions will be.
In this respect, phylogenetic reconstruction is expected to provide you with the best evolutionary view. Therefore, by analyzing the phylogenetic trees (i.e. using tree reconciliation algorithms) it is possible to derive a collection of fine-grained predictions of all orthology relationships among sequences.
However, reconstructing gene phylogenies with the most modern and accurate methods is computationally very intensive (and the results are not free of artifacts). As a consequence, this approach is impractical at large scale unless you have enough computational power. Generally speaking, if precomputed predictions for your species of interest are available in a phylogeny-based database, they are worth trying first. Otherwise, you can move to alternative methods based on pairwise sequence comparisons. These methods are faster and can usually cope with larger amounts of data.
There is also a third, independent alternative that consists of inferring the evolution of genes (and therefore their relationships) from genomic features other than their coding sequence. For instance, the YGOB database can be used to obtain orthology and paralogy predictions based on the conservation of gene order among several species. This approach is usually considered very reliable, and it is sometimes used as a gold-standard set for benchmarks.
Phylogeny-based analysis will be the better choice if (among other reasons):

you are trying to predict orthology for a very intricate gene family, including many duplications, gene losses, etc.
you need a fine-grained distinction among many-to-many, one-to-many and one-to-one relationships.
you need orthology and paralogy predictions among many species at the same time.
you want to know about gene losses.

-Note that phylogenetic trees are not perfect. They are not free of artifacts and they can lead (as other methods) to wrong predictions in the case of lineage sorting or horizontal gene transfer.-
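
As a rough illustration of how a gene tree can be turned into orthology and paralogy calls, here is a minimal Python sketch of the species-overlap rule: a node is treated as a duplication if its two child subtrees share any species, and as a speciation otherwise. This is a simplification and not full tree reconciliation against a species tree. The nested-tuple tree format, the "species_gene" leaf naming and the function names are all assumptions made for this example.

```python
from itertools import product


def species_of(leaf):
    # Assumed leaf naming convention: "<species>_<gene id>".
    return leaf.split("_", 1)[0]


def classify_pairs(tree):
    """Split all leaf pairs of a binary gene tree into orthologs and paralogs.

    A leaf is a string; an internal node is a 2-tuple of subtrees.
    """
    orthologs, paralogs = set(), set()

    def visit(node):
        if isinstance(node, str):
            return [node]
        left, right = (visit(child) for child in node)
        overlap = {species_of(g) for g in left} & {species_of(g) for g in right}
        # Shared species on both sides: call the node a duplication.
        target = paralogs if overlap else orthologs
        for a, b in product(left, right):
            target.add(frozenset((a, b)))
        return left + right

    visit(tree)
    return orthologs, paralogs


if __name__ == "__main__":
    # Hypothetical family: one duplication before the human/mouse split.
    gene_tree = (("human_A1", "mouse_A1"), ("human_A2", "mouse_A2"))
    orthologs, paralogs = classify_pairs(gene_tree)
    print(sorted(sorted(pair) for pair in orthologs))
    print(sorted(sorted(pair) for pair in paralogs))
```

Pairs whose last common ancestor is a speciation node come out as orthologs; pairs separated by a duplication node come out as paralogs, which is why trees give the one-to-many and many-to-many distinctions directly.
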
Blast-based methods are much faster and provide good results. There are many tools that you can use to generate your own predictions. You will need to decide among them by considering their limitations and specific scope. For instance,

Do you need a very fast approach to find pairs of orthologs in many species? (Best Reciprocal Hits)
Is it crucial to differentiate one-to-one orthologs from sequences with in-paralogs? (InParanoid, COG, etc.)
Do you need cross relationships among more than two species? (MultiParanoid, OrthoMCL)

-Note that many of these tools also provide precomputed data.-

An incomplete summary of resources:

(with special focus on phylogenetic based predictions)

Phylogeny based methods

MetaPhOrs (precomputed data): It combines predictions from many different databases and provides a consistency score for each orthology relationship. Useful for finding highly reliable predictions. Data can be browsed interactively or downloaded from an FTP server.
EnsemblCompara (precomputed data): Phylogeny-based orthology and paralogy predictions. Ensembl bases its predictions on the analysis of gene family trees reconstructed using TreeBest (PhyML with a fixed evolutionary model, DNA and protein analysis, slightly guided trees for better tree reconciliation).
PhylomeDB (precomputed data): It bases its predictions on a per-gene phylogenetic analysis (PhyML, testing several evolutionary models, with alignment trimming and optimization). Note that, while Ensembl is a general-purpose database, PhylomeDB is organized in "phylomes", which are genome-wide collections of trees whose taxon sampling and analysis design are usually hypothesis driven. According to the MetaPhOrs publication, PhylomeDB uses MetaPhOrs to measure the reliability of its phylome-based predictions.
In general terms, both Ensembl and PhylomeDB tend to benchmark very similarly (with good results), and they provide convenient API access to the DB as well as FTP downloads.
TreeFam (precomputed data): Similar to EnsemblCompara, but it includes a set of manually curated trees. It seems to be discontinued; the latest release dates from Feb 2009.
PHOG: analysis of precomputed phylogenies using a slightly different method.

Blast-based approaches

InParanoid (precomputed data and standalone application): Predictions between pairs of species. It accounts for one-to-many and many-to-many relationships.
EggNOG (~COG) (precomputed data): Comprehensive catalog (630 species, including bacteria and archaea) of functionally annotated ortholog groups. An all-against-all BLAST comparison is used to build the ortholog groups. It accounts for in-paralogs.
OrthoMCL, MultiParanoid: Extensions of the previous methods. They add the possibility of generating predictions for several species at the same time.
Best Reciprocal Hits (BRH): The simplest method. Still very useful when only the best orthologous pairs between two species are required.
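
To make the BRH idea concrete, here is a minimal Python sketch. It assumes the BLAST results have already been reduced to (query, subject, bit score) triples for each search direction; the function names and the toy IDs are made up for this example, and ties are broken simply by keeping the first best hit seen.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, score) triples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


if __name__ == "__main__":
    # Toy scores, not real BLAST output.
    a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0),
              ("a2", "b2", 90.0), ("a3", "b1", 120.0)]
    b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 140.0), ("b2", "a2", 95.0)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Here a2 is dropped because its best hit, b2, prefers a1. This is the case where BRH stays silent and in-paralog-aware tools such as InParanoid try to do better.
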

Some benchmarks (among others)

http://www.plosone.org/article/info:doi/10.1371/journal.pone.0018755
http://genomebiology.com/2007/8/6/R109 (figure 4)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1838432/
http://nar.oxfordjournals.org/content/37/suppl_2/W84.full (Figure 1)
http://nar.oxfordjournals.org/content/early/2010/12/11/nar.gkq953.full (Figure 3)

Some comments
1) Definition of orthologs. Fitch's definition is the most widely accepted. IMO, it is also more precise and evolutionarily meaningful than the several alternatives. If you want to find orthologs, go for databases using such a definition (e.g. Ensembl, TreeFam and InParanoid).
2) In general, I prefer tree-based methods, especially for mammals and perhaps also vertebrates. With a tree you can visually tell if the inference makes sense, which is a huge advantage. Another advantage of tree-based methods over pairwise methods is that tree-based methods produce consistent results across species. For example, say A is a 1:1 ortholog of B and B is a 1:1 ortholog of C. In principle, A is a 1:1 ortholog of C (not true if not 1:1), but a pairwise method cannot always guarantee this.
3) However, tree-based methods are not necessarily better than other methods. Reconstructing trees is very difficult. It is quite possible to come up with a purely heuristic method that achieves better results.
4) For tree-based methods, it is important to build gene trees with the species tree in mind, or to try to fix the tree topology using the species tree as a sort of prior. Blindly building a gene tree (even with the best algorithm) and then doing the standard reconciliation will give very bad inference.
5) Tree-based methods do not work well for bacteria due to the lack of a good species tree and to LGT/HGT. LGT very rarely, if ever, happens in mammals.
6) For mammals, nucleotide trees tend to reflect the true evolution better than protein trees. A paper argues that a protein-guided nucleotide alignment is the best for building trees. This is also my experience. Ensembl/TreeFam are using that.
7) For primates and rodents, EnsemblCompara is probably the best choice. It may not be the most accurate, but it should be good enough for most purposes. I usually do not like to use results obtained by combining predictions. Combining is good for method comparison, but it leads to various artifacts that are hard to understand.
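
The consistency point in comment 2 can be checked mechanically. Below is a minimal Python sketch that looks for broken triples in a set of pairwise 1:1 ortholog calls: A-B and B-C are called, but A-C is not. It assumes each gene comes from a different species and that each call links two distinct genes; the function name and gene IDs are hypothetical.

```python
from itertools import combinations


def inconsistent_triples(pairs):
    """Find (a, b, c) where a-b and b-c are called 1:1 but a-c is not.

    pairs: iterable of (gene_x, gene_y) 1:1 ortholog calls.
    """
    called = {frozenset(pair) for pair in pairs}
    partners = {}
    for pair in called:
        x, y = tuple(pair)
        partners.setdefault(x, set()).add(y)
        partners.setdefault(y, set()).add(x)
    problems = set()
    for b, neighbours in partners.items():
        for a, c in combinations(sorted(neighbours), 2):
            if frozenset((a, c)) not in called:
                problems.add((a, b, c))
    return sorted(problems)


if __name__ == "__main__":
    calls = [("humA", "musA"), ("musA", "ratA")]
    print(inconsistent_triples(calls))
    print(inconsistent_triples(calls + [("humA", "ratA")]))
```

A tree-based method yields all three calls from one tree, so this list is empty by construction; with pairwise methods it is worth running such a check before chaining calls across species.
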

cited from http://biostar.stackexchange.com/questions/7591/what-is-the-best-method-to-find-orthologous-genes-of-a-species

Monday, December 12, 2011

How to password protect a website folder

Create a .htaccess file inside the directory you want protected. You can either edit the file on the server with the vi or pico editors, or ftp it to this directory. If you are new to unix or know little about vi, then I suggest you use the pico editor or ftp the .htaccess file. The command to edit with pico is "pico .htaccess". The .htaccess file should contain the following lines. You will want to change the AuthUserFile path and the AuthName text to match the location of your password file and the name you want shown.

AuthUserFile /z/ric/secret/.htpasswd
AuthGroupFile /dev/null
AuthName "Ric's protected files"
AuthType Basic


require valid-user

The AuthName is what users will see when they are prompted for a password, something to the effect of "Enter the username for Ric's protected files". The AuthUserFile is the location of the password file; for security reasons it should not be reachable through a URL on the server.

First cd to the directory that contains the password file. In this example the password file is called .htpasswd and is in the directory /z/ric/secret/, as indicated by the AuthUserFile entry in the .htaccess file. For every username you want to add to the password file, enter the following (the -c is only required the first time; it indicates that you want to create the .htpasswd file).

cd
   mkdir secret
   cd secret
   htpasswd -c .htpasswd pumpkin
     [ you're prompted for the password for pumpkin]
     [ if you have other users enter the following. Don't use the -c]
   htpasswd .htpasswd user2
   htpasswd .htpasswd user3
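
To show what "AuthType Basic" means on the client side, here is a small Python sketch that builds the request header a browser sends after the user fills in the password prompt. The function name is made up for this example. Note that Basic authentication only base64-encodes the credentials and does not encrypt them, so it is best combined with HTTPS.

```python
import base64


def basic_auth_header(username, password):
    """Build the HTTP header used by Basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # "pumpkin" is the user created above; the password is a placeholder.
    print(basic_auth_header("pumpkin", "example-password"))
```

The server decodes this header, looks the username up in the AuthUserFile, and compares the password against the stored hash.
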

[cited from: http://www.colostate.edu/~ric/htpass.html]

Tuesday, August 30, 2011

How to get Yeast information

(1) From DIP -> file, download ScereCR20041003.tab
(2) Get all the edge information, with nodes identified by their DIP IDs
(3) On UniProt, first convert the DIP IDs to UniProt ACs, then convert those to CYGD IDs
(4) Get the length information for the CYGD IDs from http://www.yeastgenome.org using batch download
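
Steps (2) to (4) can be sketched in Python roughly as follows. This is only an outline under several assumptions: the edge file is tab-delimited with the two node IDs in the first two columns and no header row (the real column layout of the DIP file should be checked), the ID mapping and the lengths have already been downloaded into dictionaries, and all function names and IDs shown are placeholders.

```python
import csv
import io


def read_edges(handle, col_a=0, col_b=1):
    """Read (node_a, node_b) pairs from a tab-delimited file handle."""
    reader = csv.reader(handle, delimiter="\t")
    return [(row[col_a], row[col_b])
            for row in reader if len(row) > max(col_a, col_b)]


def translate_edges(edges, id_map):
    """Rewrite edges with mapped IDs; edges with an unmapped node are set aside."""
    mapped, dropped = [], []
    for a, b in edges:
        if a in id_map and b in id_map:
            mapped.append((id_map[a], id_map[b]))
        else:
            dropped.append((a, b))
    return mapped, dropped


def attach_lengths(edges, lengths):
    """Pair every edge with the lengths of its two proteins (None if unknown)."""
    return [(a, b, lengths.get(a), lengths.get(b)) for a, b in edges]


if __name__ == "__main__":
    # Placeholder IDs, not real DIP or CYGD records.
    demo = io.StringIO("DIP-0001N\tDIP-0002N\nDIP-0002N\tDIP-0003N\n")
    edges = read_edges(demo)
    mapped, dropped = translate_edges(
        edges, {"DIP-0001N": "ORF_A", "DIP-0002N": "ORF_B"})
    print(attach_lengths(mapped, {"ORF_A": 310, "ORF_B": 125}))
    print(dropped)
```

Keeping the dropped edges in a separate list makes it easy to see how much of the network is lost at each ID conversion.
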

Tuesday, August 23, 2011

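Green tea vs. black tea: how their effects compare

Black tea usually contains less tea polyphenol than green tea, but more caffeine.

Black tea is "warming" in nature; it is stronger than green tea at refreshing the mind, relieving fatigue, stopping diarrhea, warming the stomach and aiding digestion.

Green tea contains more vitamin C and folate than black tea, and is "cooling" in nature.

It is better than black tea at preventing and treating illness, improving nutrition, and reducing inflammation and toxins.

Tea is an important source of caffeine. A cup of tea generally has only about half the caffeine of a cup of coffee.

This depends on how the tea is processed. Certain kinds of tea, such as black tea and oolong, contain more caffeine than other teas.

Tea contains a small amount of theobromine, and slightly more theophylline than coffee.

How a tea is made has a large effect on the tea, but the color of the tea says almost nothing about its caffeine content.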

Saturday, August 13, 2011

About Cheese

A while back I got interested in all kinds of cheese and did a little research:

Soft-fresh cheeses:

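cottage: Simple and mild; one of the classic European cheeses, made from the milk left over from butter making. Comes in many styles, is easy to digest and high in protein. Short shelf life: after a few days it grows mold and should not be eaten. Best kept refrigerated. Similar to ricotta (higher fat), pot cheese (drier), fromage blanc (lower fat), buttermilk cheese, yogurt cheese and tofu.
feta: Brined; originally from Greece. Traditionally made from sheep's milk; now often from cow's milk. White, firm but crumbly, with small holes and cracks. Strong flavor, quite salty.
mascarpone: One of the main ingredients of the Italian desserts tiramisu and zabaglione. Very soft and fluffy. Slightly tart, and fairly expensive. Originally from Italy; the name is said to come from Spanish, meaning "better than good".
neufchatel: Tastes and looks a lot like cream cheese, but is made from milk rather than cream. Lower in fat and moister. Spoils easily. Cheesecake made with it cooks faster and cracks more easily.
ricotta: Originally from Italy; made from the whey drained off when making mozzarella, provolone and other cheeses. Sweeter and smoother than cottage cheese, and high in calcium. Can be eaten as is with fruit, but is usually used in pasta dishes and desserts. The American version adds some milk as a stretcher. Low-fat versions exist. Can be used in cheesecake. Spoils easily.
brocciu: A whey cheese made from goat's or sheep's milk; described as a lactose-free substitute for the lactose-rich ricotta. Originally from the island of Corsica.
chevre: Goat cheese, made from goat's milk. Usually sold vacuum-packed; connoisseurs tend to prefer paper-wrapped.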

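queso blanco: A popular Spanish cheese, often used in casseroles or bean dishes. Holds its shape when heated. Also good for frying or grilling, although queso para freir is a little better for that.
queso fresco: Mexicans like to use it in soups, salads, casseroles and bean dishes. Softens when heated but does not melt.
cas: Romanian cheese
urda: Romanian cheese
mizithra: Greek cheese
geitost: Norwegian cheese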

semi-soft cheeses

mozzarella: One of the few cheeses that does not turn rubbery or oily when cooked for a long time or at high heat. A key ingredient for pizza and casseroles. Very stretchy.

hard cheese

romano: Strong-flavored Italian hard cheese, grated as a seasoning
pecorino: Sheep's milk cheese
parmigiano-reggiano: Italian Parmesan cheese
stilton: English blue cheese