HAL: hal-00565540, version 1

Parameter selection for principal curves
Gérard Biau 1, 2, 3, Aurélie Fischer 2
INRIA project: “CLASSIC”
(11 February 2011)

Principal curves are nonlinear generalizations of the notion of first principal component. Roughly, a principal curve is a parameterized curve in ℝ^d that passes through the “middle” of a data cloud drawn from some unknown probability distribution. Depending on the definition, a principal curve relies on unknown parameters (number of segments, length, turn, ...) that must be chosen carefully so that the curve recovers the shape of the data without interpolating it. In the present paper, we consider the principal curve problem from an empirical risk minimization perspective and address the parameter selection issue from the point of view of model selection via penalization. We establish oracle inequalities and implement the proposed approaches to recover hidden structures in both simulated and real-life data.
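To make the empirical-risk and penalization vocabulary concrete, here is a toy sketch, not the paper's method. It assumes data in ℝ², treats a candidate curve as a polygonal line with k segments, measures the empirical risk as the mean squared distance from the sample points to that line, and selects k by minimizing risk plus a penalty c·k/n. The fitting routine `fit_polyline` (sort by x, use chunk means as vertices), the penalty shape c·k/n and the constant `c` are all hypothetical illustrations. They are not the estimator, penalty or calibration studied by the authors.

```python
import random

Point = tuple[float, float]


def dist2_to_segment(p: Point, a: Point, b: Point) -> float:
    """Squared Euclidean distance from point p to the segment [a, b]."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        t = 0.0  # degenerate segment: distance to the single point a
    else:
        # Project p onto the line through a and b, then clamp to the segment.
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
        t = max(0.0, min(1.0, t))
    qx, qy = a[0] + t * dx, a[1] + t * dy
    return (p[0] - qx) ** 2 + (p[1] - qy) ** 2


def empirical_risk(points: list[Point], vertices: list[Point]) -> float:
    """Mean squared distance from the sample to the polygonal line through vertices."""
    total = 0.0
    for p in points:
        total += min(dist2_to_segment(p, vertices[i], vertices[i + 1])
                     for i in range(len(vertices) - 1))
    return total / len(points)


def fit_polyline(points: list[Point], k: int) -> list[Point]:
    """Hypothetical toy fit with k segments (k + 1 vertices).

    Sorts the points by x, cuts them into k + 1 consecutive chunks and uses
    each chunk's mean as a vertex. Assumes k >= 1 and len(points) >= k + 1.
    """
    pts = sorted(points)
    n = len(pts)
    vertices = []
    for j in range(k + 1):
        chunk = pts[j * n // (k + 1):(j + 1) * n // (k + 1)]
        vertices.append((sum(x for x, _ in chunk) / len(chunk),
                         sum(y for _, y in chunk) / len(chunk)))
    return vertices


def select_num_segments(points: list[Point], candidates, c: float) -> int:
    """Return the k minimizing empirical risk + c * k / n (assumed penalty shape)."""
    n = len(points)

    def criterion(k: int) -> float:
        return empirical_risk(points, fit_polyline(points, k)) + c * k / n

    return min(candidates, key=criterion)


if __name__ == "__main__":
    rng = random.Random(0)
    # Noisy V-shaped cloud: y = |x| plus small Gaussian noise.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(300)]
    data = [(x, abs(x) + rng.gauss(0.0, 0.05)) for x in xs]
    for k in range(1, 7):
        print(k, round(empirical_risk(data, fit_polyline(data, k)), 4))
    print("selected k:", select_num_segments(data, range(1, 7), c=0.5))
```

Raising `c` pushes the choice toward fewer segments, and `c = 0` simply returns the k with the smallest empirical risk, which tends to favor overly complex curves. Choosing the constant in a data-driven way is the calibration problem that the keywords below refer to.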
1: Laboratoire de Probabilités et Modèles Aléatoires (LPMA)
CNRS: UMR7599 – Université Pierre et Marie Curie [UPMC] - Paris VI – Université Paris VII - Paris Diderot
2: Laboratoire de Statistique Théorique et Appliquée (LSTA)
Université Pierre et Marie Curie [UPMC] - Paris VI
3: Département de Mathématiques et Applications (DMA)
CNRS: UMR8553 – Ecole normale supérieure de Paris - ENS Paris
Mathematics/Statistics
Statistics/Theory
Keywords: Principal curves – Parameter selection – Model selection – Oracle inequality – Penalty calibration – Slope heuristics.
Files attached to this document:
Parameter_selection_for_principal_curves.pdf (PDF, 1.5 MB)