CLUSTERING HIGH DIMENSION, LOW SAMPLE SIZE DATA USING THE MAXIMAL DATA PILING DISTANCE

Cited 17 times in Web of Science; cited 0 times in Scopus
DC Field | Value | Language
dc.contributor.author | Ahn, Jeongyoun | ko
dc.contributor.author | Lee, Myung Hee | ko
dc.contributor.author | Yoon, Young Joo | ko
dc.date.accessioned | 2021-06-02T02:50:33Z | -
dc.date.available | 2021-06-02T02:50:33Z | -
dc.date.created | 2021-06-02 | -
dc.date.issued | 2012-04 | -
dc.identifier.citation | STATISTICA SINICA, v.22, no.2, pp.443 - 464 | -
dc.identifier.issn | 1017-0405 | -
dc.identifier.uri | http://hdl.handle.net/10203/285431 | -
dc.description.abstract | We propose a new hierarchical clustering method for high dimension, low sample size (HDLSS) data. The method exploits the fact that each individual data vector accounts for exactly one dimension of the subspace generated by HDLSS data. The linkage used to measure the distance between clusters is the orthogonal distance between the affine subspaces generated by the clusters. Ideally, one would consider all possible binary splits of the data and choose the one that maximizes the distance between the two resulting groups. Since this is generally not computationally feasible, we use the singular value decomposition to approximate it. We provide theoretical justification of the method by studying high-dimensional asymptotics. We also derive the probability distribution of the distance measure under the null hypothesis of no split, and use it to propose a criterion for determining the number of clusters. Simulations and an analysis of microarray data show that the proposed method has competitive clustering performance. | -
dc.language | English | -
dc.publisher | STATISTICA SINICA | -
dc.title | CLUSTERING HIGH DIMENSION, LOW SAMPLE SIZE DATA USING THE MAXIMAL DATA PILING DISTANCE | -
dc.type | Article | -
dc.identifier.wosid | 000303963100001 | -
dc.identifier.scopusid | 2-s2.0-84863345232 | -
dc.type.rims | ART | -
dc.citation.volume | 22 | -
dc.citation.issue | 2 | -
dc.citation.beginningpage | 443 | -
dc.citation.endingpage | 464 | -
dc.citation.publicationname | STATISTICA SINICA | -
dc.identifier.doi | 10.5705/ss.2010.148 | -
dc.contributor.localauthor | Ahn, Jeongyoun | -
dc.contributor.nonIdAuthor | Lee, Myung Hee | -
dc.contributor.nonIdAuthor | Yoon, Young Joo | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Hierarchical clustering | -
dc.subject.keywordAuthor | high dimension | -
dc.subject.keywordAuthor | low sample size data | -
dc.subject.keywordAuthor | maximal data piling | -
dc.subject.keywordAuthor | singular value decomposition | -
dc.subject.keywordPlus | VALUE DECOMPOSITION ANALYSIS | -
dc.subject.keywordPlus | GEOMETRIC REPRESENTATION | -
dc.subject.keywordPlus | EXPRESSION | -
dc.subject.keywordPlus | CLASSIFICATION | -
dc.subject.keywordPlus | SELECTION | -
dc.subject.keywordPlus | TUMOR | -
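The abstract defines the linkage as the orthogonal distance between the affine subspaces generated by two clusters, and the ideal split as the binary split that maximizes that distance. The Python/NumPy sketch below illustrates these ideas under stated assumptions; it is not the authors' implementation. mdp_distance computes the orthogonal distance by removing from the difference of cluster means its projection onto the span of both clusters' centered observations. best_split_exhaustive enumerates every binary split, which is only feasible for very small samples. svd_split is a hypothetical stand-in for the SVD approximation: it splits on the sign of the leading left singular vector of the column-centered data (a PDDP-style rule), which may differ from the procedure in the paper. The null-distribution criterion for the number of clusters is not reproduced here.

```python
import itertools

import numpy as np


def mdp_distance(x1: np.ndarray, x2: np.ndarray) -> float:
    """Orthogonal distance between the affine subspaces generated by two clusters.

    Rows of x1 (n1 x d) and x2 (n2 x d) are observations. Each affine subspace is
    the cluster mean plus the span of that cluster's centered observations. The
    distance is the norm of the component of the mean difference that is
    orthogonal to both spans.
    """
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    # Columns of w are the centered observations of both clusters (d x (n1 + n2)).
    w = np.vstack([x1 - m1, x2 - m2]).T
    diff = m1 - m2
    # Least squares projects diff onto the column space of w; it tolerates the
    # rank deficiency that centering introduces.
    coef, *_ = np.linalg.lstsq(w, diff, rcond=None)
    residual = diff - w @ coef
    return float(np.linalg.norm(residual))


def best_split_exhaustive(x: np.ndarray) -> tuple[float, np.ndarray]:
    """Search every binary split of the rows of x; only feasible for very small n."""
    n = x.shape[0]
    best_dist, best_mask = -1.0, np.zeros(n, dtype=bool)
    # Subsets of size <= n // 2 cover every split (some twice when n is even).
    for r in range(1, n // 2 + 1):
        for idx in itertools.combinations(range(n), r):
            mask = np.zeros(n, dtype=bool)
            mask[list(idx)] = True
            dist = mdp_distance(x[mask], x[~mask])
            if dist > best_dist:
                best_dist, best_mask = dist, mask
    return best_dist, best_mask


def svd_split(x: np.ndarray) -> np.ndarray:
    """Hypothetical SVD stand-in: split rows by the sign of the leading left
    singular vector of the column-centered data (a PDDP-style rule)."""
    xc = x - x.mean(axis=0)
    u, _, _ = np.linalg.svd(xc, full_matrices=False)
    return u[:, 0] >= 0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two groups of 5 observations in 200 dimensions, means 3 apart per coordinate.
    x = np.vstack([rng.standard_normal((5, 200)),
                   rng.standard_normal((5, 200)) + 3.0])
    mask = svd_split(x)
    print("SVD split:", mask.astype(int))
    print("distance:", mdp_distance(x[mask], x[~mask]))
    print("exhaustive best distance:", best_split_exhaustive(x)[0])
```

On well-separated HDLSS data such as the demo above, the sign split and the exhaustive search should select the same partition; the exhaustive search needs on the order of 2^(n-1) distance evaluations, which is why an approximation is needed for realistic n. On less clearly separated data, the sign heuristic can miss the maximizing split.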
Appears in Collection: IE-Journal Papers
Files in This Item: There are no files associated with this item.