learning, regularization, and optimization for computational biology
Transcript
Alessandro Verri, DIBRIS, Università di Genova
Sestri Levante, September 11

Summary
- motivation
- the problem
- the proposed framework
- results
- conclusions

Motivation
- learning through regularization: a fruitful approach
- the case of computational biology
- the need for non-standard penalties for better modeling
- optimization comes to the rescue

The problem
- metastatic neuroblastoma (NB) occurs in pediatric patients at stage 4S or stage 4
- tumors of stage 4 NB contain several structural Copy Number Aberrations (CNAs), less frequent than those in stage 4S
- CNAs of stage 4 tumors are associated with highly aggressive disease
- genomic instability appears to play a critical role in the genesis of these tumors
- goal: use high-resolution array comparative genomic hybridization of 190 metastatic NBs to build a tumorigenesis model

Array-based Comparative Genomic Hybridization (aCGH)
[figure]

Data example
[figure]

aCGH data analysis: example
[figure]

The proposed statistical model: a dictionary based approach

General setting. We are given $S \in \mathbb{N}$ samples $y_1, y_2, \ldots, y_S$ with $y_s \in \mathbb{R}^L$. We seek $J$ simple atoms $(\beta_j)_{1 \le j \le J}$ which give a (possibly complete) representation of all the samples, in the sense that

$$y_s \approx \sum_{j=1}^{J} \theta_{js} \beta_j \qquad \forall s = 1, \ldots, S$$

for some coefficients $\theta_{js} \in \mathbb{R}$.
- $\hat{y}_s = \sum_{j=1}^{J} \theta_{js} \beta_j$ provides a smoothed version of $y_s$ which makes alterations more easily detectable.
- the atoms $(\beta_j)_{1 \le j \le J}$ together with the coefficients $(\theta_s)_{1 \le s \le S}$ possibly indicate principal and shared alterations in the data.
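The dictionary representation above is just a matrix factorization: stacking the atoms as columns of $B$ and the coefficients as columns of $\Theta$ gives $\hat{Y} = B\Theta$. A minimal numpy sketch (all shapes, positions, and values here are illustrative assumptions, not the authors' data):

```python
import numpy as np

# Sketch of the dictionary representation y_s ≈ sum_j theta_js * beta_j.
# Stacking the atoms as columns of B (L x J) and the coefficients as
# columns of Theta (J x S), the smoothed profiles are simply B @ Theta.
rng = np.random.default_rng(0)
L, J, S = 1000, 5, 20

B = np.zeros((L, J))                          # piece-wise constant atoms
for j in range(J):
    start = rng.integers(0, L - 100)          # hypothetical alteration start
    length = rng.integers(5, 100)             # hypothetical alteration length
    B[start:start + length, j] = rng.choice([-2.0, -1.0, 1.0, 2.0])

Theta = rng.uniform(0.0, 1.0, size=(J, S))    # coefficients theta_js in [0, 1]
Y_hat = B @ Theta                             # smoothed versions of the samples
```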
The starting point

$$\min_{\theta_s, \beta_j} \ \frac{1}{2} \sum_{s=1}^{S} \Big\| y_s - \sum_{j=1}^{J} \theta_{js} \beta_j \Big\|^2 + \lambda \sum_{j=1}^{J} \|\beta_j\|_1 + \mu \sum_{j=1}^{J} TV(\beta_j)$$
$$\text{s.t.} \quad \sum_{s=1}^{S} \theta_{js}^2 \le 1, \quad \text{for all } j = 1, \ldots, J,$$

for fixed $\lambda > 0$ and $\mu > 0$.
- This model has been proposed in [?].
- the penalty $\lambda \|\beta_j\|_1 + \mu\, TV(\beta_j)$ is called fused lasso and forces the solution atoms to be sparse and piece-wise constant.
- the hard constraints on the coefficients $\theta_{j\cdot}$ are imposed for consistency and identifiability of the model.
- chromosomes are analyzed one by one. As a consequence, different families of atoms are obtained for different chromosomes.

The proposed model I

$$\min_{\theta_s, \beta_j} \ \frac{1}{2} \sum_{s=1}^{S} \Big\| y_s - \sum_{j=1}^{J} \theta_{js} \beta_j \Big\|^2 + \lambda \sum_{j=1}^{J} \|\beta_j\|_1^2 + \mu \sum_{j=1}^{J} TV_w(\beta_j) + \tau \sum_{s=1}^{S} \|\theta_s\|_1^2$$
$$\text{s.t.} \quad 0 \le \theta_{js} \le 1, \quad \forall j = 1, \ldots, J$$

- weighted total variation $TV_w(\beta_j) = \sum_{l=1}^{L-1} w_l\, |\beta_{l+1,j} - \beta_{l,j}|$;
- coefficients are constrained to be positive: $\theta_{js} \ge 0$;
- structured sparsity along the columns of the matrix of atoms $[\beta_1, \ldots, \beta_J]$ and of the matrix of coefficients $[\theta_1, \ldots, \theta_S]$ too.

The proposed model II: the same minimization problem, with the following improvements.
- $TV_w$ allows relaxing, at some points, the constraint of "small variations" on the atoms.
- the signal of the entire genome can now be analyzed at once, still guaranteeing independence for each chromosome (at the end points of chromosomes the weights are set to zero).
- concomitant alterations occurring on different chromosomes can be identified.
- the coefficients are constrained to be positive.
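The weighted total variation is straightforward to compute; a small sketch (the helper name and the toy atom are my own, only the formula $TV_w(\beta) = \sum_l w_l |\beta_{l+1} - \beta_l|$ comes from the slides) also shows how a zero weight decouples two adjacent positions, which is exactly the trick used at chromosome end points:

```python
import numpy as np

def tv_weighted(beta, w):
    """Weighted total variation TV_w(beta) = sum_l w_l * |beta[l+1] - beta[l]|."""
    return float(np.sum(w * np.abs(np.diff(beta))))

# Toy atom with one up-jump and one down-jump (values are my own example).
beta = np.array([0.0, 0.0, 1.0, 1.0, 0.0])

w_uniform = np.ones(len(beta) - 1)            # plain TV: both jumps penalized
w_break = np.array([1.0, 1.0, 1.0, 0.0])      # zero weight on the last edge

t_uniform = tv_weighted(beta, w_uniform)      # 2.0
t_break = tv_weighted(beta, w_break)          # 1.0: the down-jump is free,
                                              # as at a chromosome end point
```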
Constraining the coefficients to be positive reduces the complexity of the coefficient matrix $[\theta_{js}]$ and forces the atoms $\beta_j$ to be more informative.

General setting (matrix form)

Problem
$$\min_{B, \Theta} \ \frac{1}{2} \|Y - B\Theta\|_F^2 + h(\Theta) + g(B)$$

where
$$g(B) = \lambda \sum_{j=1}^{J} \|B(:, j)\|_1^2 + \mu \sum_{j=1}^{J} TV_w(B(:, j)),$$
$$h(\Theta) = \delta_{\Delta^{J \times S}}(\Theta) + \tau \sum_{s=1}^{S} \|\Theta(:, s)\|_1^2,$$
and $\Delta = [0, 1]$.

Convergence theorem

Problem
$$\min_{B, \Theta} \ \frac{1}{2} \|Y - B\Theta\|_F^2 + h(\Theta) + g(B) \qquad (1)$$

Defining the partial functions
$$\varphi_B(\Theta) = \tfrac{1}{2} \|Y - B\Theta\|_F^2 + h(\Theta), \qquad \psi_\Theta(B) = \tfrac{1}{2} \|Y - B\Theta\|_F^2 + g(B),$$

Theorem (Attouch et al. 2010). Let $\eta_k, \zeta_k \in [\rho_1, \rho_2]$ with $0 < \rho_1 \le \rho_2 < +\infty$, and iterate

$$\Theta_{k+1} \approx_{\varepsilon_k} \operatorname{prox}_{\eta_k \varphi_{B_k}}(\Theta_k) = \operatorname*{argmin}_{\Theta} \ \varphi_{B_k}(\Theta) + \tfrac{1}{2\eta_k} \|\Theta - \Theta_k\|_F^2$$
$$B_{k+1} \approx_{\varepsilon_k} \operatorname{prox}_{\zeta_k \psi_{\Theta_{k+1}}}(B_k) = \operatorname*{argmin}_{B} \ \psi_{\Theta_{k+1}}(B) + \tfrac{1}{2\zeta_k} \|B - B_k\|_F^2$$

If $\varepsilon_k = 0$, then $(B_k, \Theta_k) \to (\hat{B}, \hat{\Theta})$, a stationary point of (1).
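To make the alternating scheme concrete, here is an illustrative toy instance, not the authors' implementation: the penalties $g$ and $h$ are replaced by just the $[0,1]$ box constraint on $\Theta$, and each proximal subproblem is solved approximately by a few projected-gradient steps, mirroring the inexact updates of the theorem:

```python
import numpy as np

# Toy sketch of inexact alternating proximal minimization for
# 1/2 ||Y - B Theta||_F^2 with Theta constrained to [0, 1] (penalties
# g and h are dropped here for simplicity -- this is NOT the full model).

def prox_step_theta(Y, B, Theta, eta, inner=20):
    """Approximate argmin_T 1/2||Y - B T||_F^2 + 1/(2 eta)||T - Theta||_F^2
    over T in [0, 1], via projected gradient with a 1/Lipschitz step."""
    T = Theta.copy()
    step = 1.0 / (np.linalg.norm(B, 2) ** 2 + 1.0 / eta)
    for _ in range(inner):
        grad = B.T @ (B @ T - Y) + (T - Theta) / eta
        T = np.clip(T - step * grad, 0.0, 1.0)
    return T

def prox_step_atoms(Y, B, Theta, zeta, inner=20):
    """Approximate argmin_A 1/2||Y - A Theta||_F^2 + 1/(2 zeta)||A - B||_F^2."""
    A = B.copy()
    step = 1.0 / (np.linalg.norm(Theta, 2) ** 2 + 1.0 / zeta)
    for _ in range(inner):
        grad = (A @ Theta - Y) @ Theta.T + (A - B) / zeta
        A = A - step * grad
    return A

rng = np.random.default_rng(0)
Y = rng.standard_normal((50, 10))
B = rng.standard_normal((50, 3))
Theta = rng.uniform(size=(3, 10))
resid0 = np.linalg.norm(Y - B @ Theta)        # initial residual

for k in range(30):                           # alternate the two prox updates
    Theta = prox_step_theta(Y, B, Theta, eta=1.0)
    B = prox_step_atoms(Y, B, Theta, zeta=1.0)

resid = np.linalg.norm(Y - B @ Theta)         # never larger than resid0
```

Each proximal step decreases the objective (the surrogate equals the objective at the current point and projected gradient descends on it), so the residual is monotonically non-increasing, consistent with convergence to a stationary point in the exact case.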
Numerical experiments: data generation

Signal model
$$y_{ls} = \mu_{ls} + \epsilon_{ls}, \qquad \mu_{ls} = \sum_{m=1}^{M_s} c_{ms}\, \mathbf{1}_{[l_{ms},\, l_{ms}+k_{ms}]}(l), \qquad \epsilon_{ls} \sim N(0, \sigma^2),$$

where
- $\mu_{\cdot s}$ is the mean signal and $\sigma$ is the standard deviation of the noise $\epsilon_{ls}$;
- $M_s$ is the number of segments (regions of CNVs) for sample $s$;
- $c_{ms}$, $l_{ms}$ and $k_{ms}$ are the height, starting position and length, respectively, of each segment.

We choose: $L = 1000$, $S = 20$, $\sigma \in \{1, 2\}$, $M_s \in \{1, 2, 3, 4, 5\}$, $c_{ms} \in \{\pm 1, \pm 2, \pm 3, \pm 4, \pm 5\}$, $l_{ms} \in \{1, \ldots, L - 100\}$ and $k_{ms} \in \{5, 10, 20, 50, 100\}$. Signal model and numbers follow [?].

Synthetic dataset

This dataset is explicitly designed to mimic a real signal composed of different chromosomes. We build two classes of samples, A and B, and a third randomly built class.
[figure: class A and class B profiles over chr1–chr4]

Parameter selection: the Bayesian information criterion

The choice of the parameters $(J, \lambda, \mu, \tau)$ is made according to the Bayesian information criterion (BIC) [?]. The BIC mitigates the problem of overfitting by introducing a penalty term for the complexity of the model. In our case the BIC is written as

$$(SL) \cdot \log\!\Big( \frac{\|Y - B\Theta\|_F^2}{SL} \Big) + k(B)\, \log(SL),$$

where $k(B)$ is computed as the number of jumps in $B$ and ultimately depends on the parameters $(J, \lambda, \mu, \tau)$. After some standard normalizations, we choose: $J \in \{5, 10, 15, 20\}$, $\lambda \in \{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}\}$, $\mu \in \{1, 5, 10\}$, $\tau \in \{10^{-3}, 10^{-2}, 10^{-1}, 1\}$.
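The synthetic signal model and the BIC formula above can both be sketched in a few lines of numpy (a hedged sketch: the function names are mine, and the segment start positions are drawn uniformly as a simplifying assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
L, S, sigma = 1000, 20, 1.0    # slide values (sigma in {1, 2}; 1 used here)

def sample_profile():
    """One synthetic profile y_ls = mu_ls + eps_ls: M_s segments of height
    c_ms, start l_ms and length k_ms, plus N(0, sigma^2) noise."""
    mu = np.zeros(L)
    M = rng.integers(1, 6)                              # M_s in {1,...,5}
    for _ in range(M):
        c = rng.choice([-5, -4, -3, -2, -1, 1, 2, 3, 4, 5])
        l0 = rng.integers(0, L - 100)                   # l_ms
        k = rng.choice([5, 10, 20, 50, 100])            # k_ms
        mu[l0:l0 + k] += c
    return mu + rng.normal(0.0, sigma, size=L)

Y = np.column_stack([sample_profile() for _ in range(S)])   # L x S data

def bic(Y, B, Theta):
    """BIC = SL * log(||Y - B Theta||_F^2 / (SL)) + k(B) * log(SL),
    with k(B) the number of jumps in the atoms B."""
    SL = Y.size
    rss = np.linalg.norm(Y - B @ Theta) ** 2
    k_B = int(np.count_nonzero(np.diff(B, axis=0)))
    return SL * np.log(rss / SL) + k_B * np.log(SL)
```

In model selection, `bic` would be evaluated on the estimated $(B, \Theta)$ for each parameter combination on the grid, and the combination with the smallest value retained.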
ROC curves
[figure: True Positive rate vs False Positive rate]

Estimated Y and B
[figures comparing FFlat [?] and our proposal [?] on Solution 1 and Solution 2]

Original and reconstructed aCGH data
[figure]

Reconstructed coefficient matrices
[figure]

Phylogenetic trees
[figure]

References
- S. Salzo and S. Villa. Inexact and accelerated proximal point algorithms. Journal of Convex Analysis, 19(4):1167–1192, 2012.
- S. Villa, S. Salzo, L. Baldassarre and A. Verri. Accelerated and inexact forward-backward algorithms. SIAM Journal on Optimization (SIOPT), 2013.
- S. Salzo, S. Masecchia, A. Verri and A. Barla. Alternating proximal regularized dictionary learning. Neural Computation (NECO), 2014.
- S. Masecchia, S. Coco, A. Barla, A. Verri and G. P. Tonini. Genome instability model of metastatic neuroblastoma tumorigenesis by a dictionary learning algorithm. Submitted.

Thank you