

BIOINFORMATICS

Gene expression

ORIGINAL PAPER

Vol. 22 no. 7 2006, pages 795–801, doi:10.1093/bioinformatics/btl011

Incorporating gene functions as priors in model-based clustering of microarray gene expression data

Wei Pan

Division of Biostatistics, MMC 303, School of Public Health, University of Minnesota, Minneapolis, MN 55455-0392, USA

Received on October 25, 2005; revised and accepted on January 16, 2006. Advance Access publication January 24, 2006. Associate Editor: John Quackenbush

ABSTRACT

Motivation: Cluster analysis of gene expression profiles has been widely applied to clustering genes for gene function discovery. Many approaches have been proposed. The rationale is that the genes with the same biological function or involved in the same biological process are more likely to co-express, hence they are more likely to form a cluster with similar gene expression patterns. However, most existing methods, including model-based clustering, ignore known gene functions in clustering.

Results: To take advantage of accumulating gene functional annotations, we propose incorporating known gene functions as prior probabilities in model-based clustering. In contrast to a global mixture model applicable to all the genes in the standard model-based clustering, we use a stratified mixture model: one stratum corresponds to the genes of unknown function while each of the other strata corresponds to the genes sharing the same biological function or pathway; the genes from the same stratum are assumed to have the same prior probability of coming from a cluster while those from different strata are allowed to have different prior probabilities of coming from the same cluster. We derive a simple EM algorithm that can be used to fit the stratified model. A simulation study and an application to gene function prediction demonstrate the advantage of our proposal over the standard method.

Contact: weip@biostat.umn.edu

1 INTRODUCTION

This article concerns clustering genes for gene function discovery using microarray gene expression data. It has been widely observed that genes with a similar function or involved in the same biological process are likely to co-express, hence clustering genes' expression profiles provides a means for gene function discovery; see, e.g. Eisen et al. (1998), Brown et al. (2000), Wu et al. (2002), Xiao and Pan (2005) and references therein. However, most existing approaches ignore known functions of some genes in the process of clustering; a few exceptions in the context of non-model-based clustering include Hanisch et al. (2002), Cheng et al. (2004), Fang et al. (2006) and Huang and Pan (2006). For example, in model-based clustering, all the genes are treated equally a priori; in particular, all the genes are assumed to have an equal prior probability of being in a given cluster (e.g. Li and Hong, 2001; Ghosh and Chinnaiyan, 2002; Pan et al., 2002). As mentioned, if some genes are known to share the same function, it is more likely that they belong to the same cluster. Hence, it seems more plausible to model the genes sharing the same biological function to have an equal prior probability while allowing the genes with different functions to have varying prior probabilities. This provides a more efficient way to account for the association between gene function and co-expression.

In this paper, we propose such an approach that uses gene functional annotations as priors for model-based clustering. Specifically, first, the genome is partitioned into several groups, with one group containing the genes of unknown function and each of the other groups containing the genes sharing the same function. Gene functional annotations are readily available from many existing databases, such as the Gene Ontology (GO) (Ashburner et al., 2000) and MIPS (Mewes et al., 2004). Second, each group is treated as a stratum and a stratified mixture model is used: the genes from the same group are assumed to have the same prior probability of coming from the same cluster while the prior probabilities for different groups are allowed to be unequal. Because of possible heterogeneity in each gene functional group, we do not assume that the genes from the same functional group come from the same cluster. In fact, the genes in the group of unknown function may come from any cluster. With relatively high noise levels of genomic data, it is recognized that incorporating biological knowledge into statistical analysis is a reliable way to maximize statistical efficiency and enhance the interpretability of the analysis results.
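To make the idea of stratum-specific priors concrete, the following is a minimal sketch of an EM fit in which all genes share the same cluster components but each stratum has its own prior probabilities. It is not a reproduction of the algorithm derived in this paper: the diagonal-covariance Gaussian components, the farthest-point initialization, the fixed iteration count, and all function and variable names (`stratified_em`, `farthest_point_init`, `strata`, `pi`, `tau`) are assumptions made for illustration only. The sketch also assumes that every stratum label from 0 to the maximum label occurs at least once.

```python
import numpy as np

def farthest_point_init(X, g):
    """Deterministic start (assumed): first row, then repeatedly the row
    farthest from the rows already chosen."""
    idx = [0]
    for _ in range(g - 1):
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return X[idx].copy()

def stratified_em(X, strata, g, n_iter=50):
    """EM for a g-component mixture with shared diagonal-Gaussian components
    (an assumption) and stratum-specific prior probabilities pi[k, i]."""
    n_strata = int(strata.max()) + 1
    means = farthest_point_init(X, g)
    variances = np.tile(X.var(axis=0) + 1e-6, (g, 1))
    pi = np.full((n_strata, g), 1.0 / g)
    for _ in range(n_iter):
        # E-step: posterior cluster probabilities, using each gene's own stratum prior.
        diff = X[:, None, :] - means[None, :, :]
        log_f = -0.5 * (np.log(2 * np.pi * variances)[None]
                        + diff ** 2 / variances[None]).sum(axis=2)
        log_w = np.log(pi[strata] + 1e-300) + log_f
        log_w -= log_w.max(axis=1, keepdims=True)  # stabilise before exponentiating
        tau = np.exp(log_w)
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: priors are averaged within each stratum; components use all genes.
        for k in range(n_strata):
            pi[k] = tau[strata == k].mean(axis=0)
        weight = tau.sum(axis=0) + 1e-12
        means = (tau.T @ X) / weight[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("ng,ngp->gp", tau, diff ** 2) / weight[:, None] + 1e-6
    return pi, means, variances, tau

# Toy data (hypothetical): two well-separated clusters of 60 "genes" each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 4)), rng.normal(4, 1, (60, 4))])
# Stratum 0 = unknown function (drawn from both clusters);
# stratum 1 = one annotated function (rows 40-59, all from the first cluster).
strata = np.concatenate([np.zeros(40, int), np.ones(20, int), np.zeros(60, int)])
pi, means, variances, tau = stratified_em(X, strata, g=2)
print(np.round(pi, 2))
```

In this toy setting the annotated stratum should place nearly all of its prior mass on one cluster, while the unknown-function stratum spreads its mass over both. With a single stratum, the sketch reduces to an ordinary global mixture fit.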

This article is organized as follows. In Section 2, we first briefly review the standard method of model-based clustering with a global mixture model, then propose our stratified mixture model and associated stratified clustering. We derive a simple EM algorithm to fit our stratified model. In Section 3, we demonstrate the advantage of our proposal using simulated data, and then using real data for gene function prediction. We end the paper with a short discussion.

2 METHODS

2.1 Standard model-based clustering

In model-based clustering, it is assumed that each observation $x$, a $p$-dimensional vector, is drawn from a finite mixture distribution

$$f(x; \Theta) = \sum_{i=1}^{g} \pi_i f_i(x; \theta_i) \qquad (1)$$

with the prior probability $\pi_i$, component-specific distribution $f_i$ and its parameters $\theta_i$. We use $\Theta = \{(\pi_i, \theta_i) : i = 1, \ldots, g\}$ to denote all unknown parameters, with the restriction that $0 \le \pi_i \le 1$ for any $i$ and that $\sum_{i=1}^{g} \pi_i = 1$. Each component of the mixture distribution corresponds to a cluster. The number of clusters, $g$, has to be determined in practice; see Section 2.3.
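As a small illustration of Equation (1), the sketch below evaluates a finite mixture density at a single observation and, by Bayes' rule, the posterior probability that the observation came from each component. The choice of Gaussian components with diagonal covariance is an assumption made here for concreteness, and the function names and numerical values are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a p-dimensional Gaussian with diagonal covariance (assumed form of f_i)."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    log_pdf = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
    return float(np.exp(log_pdf))

def mixture_density(x, priors, means, variances):
    """Return f(x; Theta) = sum_i pi_i * f_i(x; theta_i) and the posterior
    probabilities pi_i * f_i(x; theta_i) / f(x; Theta)."""
    priors = np.asarray(priors, dtype=float)
    # Restrictions on the prior probabilities: 0 <= pi_i <= 1 and sum to 1.
    assert np.all((priors >= 0) & (priors <= 1)) and np.isclose(priors.sum(), 1.0)
    comps = np.array([gaussian_pdf(x, m, v) for m, v in zip(means, variances)])
    density = float(priors @ comps)
    return density, priors * comps / density

# Hypothetical two-cluster example with p = 2.
priors = [0.3, 0.7]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
density, posterior = mixture_density([0.0, 0.0], priors, means, variances)
print(density, posterior)
```

Assigning each observation to the component with the largest posterior probability is the usual way to turn a fitted mixture into a clustering.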

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@http://doc.xuehai.net
