# How to Reduce Dimension With PCA and Random Projections?

```bibtex
@article{Yang2021HowTR,
  title   = {How to Reduce Dimension With PCA and Random Projections?},
  author  = {Fan Yang and Sifan Liu and E. Dobriban and David P. Woodruff},
  journal = {IEEE Transactions on Information Theory},
  year    = {2021},
  volume  = {67},
  pages   = {8154--8189}
}
```

In our “big data” age, the size and complexity of data are steadily increasing, and methods for dimension reduction are ever more popular and useful. Two distinct types of dimension reduction are “data-oblivious” methods such as random projections and sketching, and “data-aware” methods such as principal component analysis (PCA). Both have their strengths, such as speed for random projections and data-adaptivity for PCA. In this work, we study how to combine them to get the best of both. We study…
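As an illustration (not code from the paper), here is a minimal numpy sketch contrasting the two kinds of reduction: PCA adapts to the data, while a Gaussian random projection never looks at it, so PCA's rank-$k$ reconstruction error is always at least as small. All sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 500, 100, 10

# Synthetic data with an approximate rank-k signal plus noise.
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, p)) + 0.1 * rng.normal(size=(n, p))
X -= X.mean(axis=0)

# Data-aware: PCA projects onto the top-k right singular vectors of X.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
X_pca = X @ Vt[:k].T                          # n x k principal scores

# Data-oblivious: multiply by a random Gaussian matrix; X is never inspected.
Omega = rng.normal(size=(p, k)) / np.sqrt(k)  # p x k projection
X_rp = X @ Omega                              # n x k sketch

# By Eckart-Young, the truncated SVD gives the best rank-k approximation,
# so PCA's reconstruction error lower-bounds the random projection's.
err_pca = np.linalg.norm(X - X_pca @ Vt[:k], "fro")
err_rp = np.linalg.norm(X - X_rp @ np.linalg.pinv(Omega), "fro")
```

The trade-off the abstract describes is visible here: the projection `Omega` can be drawn before the data arrives (and applied in a single pass), while PCA requires an SVD of `X` but achieves the optimal error for a given target dimension.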

#### 10 Citations

Precise expressions for random projections: Low-rank approximation and randomized Newton

- Computer Science, Mathematics
- NeurIPS
- 2020

This work exploits recent developments in the spectral analysis of random matrices to develop novel techniques that provide provably accurate expressions for the expected value of random projection matrices obtained via sketching, and enables precise analysis of these methods in terms of spectral properties of the data.

Scaling Graph Clustering with Distributed Sketches

- Computer Science, Mathematics
- 2020 IEEE High Performance Extreme Computing Conference (HPEC)
- 2020

This work presents a method inspired by spectral clustering in which matrix sketches, derived from random dimension-reducing projections, produce embeddings that yield performant clustering results on a fully-dynamic stochastic block model stream, using both the fast Johnson-Lindenstrauss and CountSketch transforms.
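The CountSketch transform mentioned above hashes each input coordinate to a single signed bucket, so applying it costs time proportional to the number of nonzeros. A minimal numpy sketch (dimensions are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 200, 1000, 64

h = rng.integers(0, k, size=d)        # hash each coordinate to one of k buckets
s = rng.choice([-1.0, 1.0], size=d)   # independent random sign per coordinate

def countsketch(X):
    # Equivalent to X @ S, where S is the sparse d x k sign matrix with
    # S[j, h[j]] = s[j]; cost is O(nnz(X)) rather than O(n * d * k).
    Y = np.zeros((X.shape[0], k))
    for j in range(d):
        Y[:, h[j]] += s[j] * X[:, j]
    return Y

X = rng.normal(size=(n, d))
Y = countsketch(X)

# Squared norms are preserved in expectation: E ||Sx||^2 = ||x||^2.
ratio_mean = np.mean((Y ** 2).sum(axis=1) / (X ** 2).sum(axis=1))
```

A production implementation would vectorize the bucket accumulation (or use a sparse matrix product); the loop above is kept for clarity.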

An Introduction to Johnson-Lindenstrauss Transforms

- Computer Science
- ArXiv
- 2021

Johnson–Lindenstrauss Transforms are powerful tools for reducing the dimensionality of data while preserving key characteristics of that data, and they have found use in many fields from machine…
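The distance-preservation property these transforms guarantee can be checked numerically. A hypothetical numpy sketch of a dense Gaussian JL transform, verifying that all pairwise distances are roughly preserved:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 1000, 200, 30   # ambient dim, target dim, number of points

X = rng.normal(size=(n, d))
# Entries N(0, 1/k), so squared norms are preserved in expectation.
S = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ S

def pdists(A):
    # Upper-triangular pairwise Euclidean distances between rows of A.
    sq = (A ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * A @ A.T
    iu = np.triu_indices(len(A), k=1)
    return np.sqrt(np.maximum(D2[iu], 0.0))

ratios = pdists(Y) / pdists(X)   # distortion of all n*(n-1)/2 pairs
```

With these sizes the distortions concentrate near 1, consistent with the JL lemma's $1 \pm \epsilon$ guarantee for $k = O(\epsilon^{-2} \log n)$.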

Sample canonical correlation coefficients of high-dimensional random vectors with finite rank correlations

- Mathematics
- 2021

Consider two random vectors $\widetilde{x}$ and $\widetilde{y}$ of the forms $\widetilde{x} = Az + C_1 x$ and $\widetilde{y} = Bz + C_2 y$, where $x$, $y$, and $z$ are independent random vectors with i.i.d. entries of zero mean and…

Local laws for multiplication of random matrices and spiked invariant model

- Mathematics
- 2020

High dimensional Haar random matrices are common objects in modern statistical learning theory. We consider the random matrix model $A^{1/2} UBU^* A^{1/2},$ where $A$ and $B$ are two $N \times N$…

Sparse sketches with small inversion bias

- Computer Science, Mathematics
- COLT
- 2021

A framework for analyzing inversion bias is developed, based on the proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices, and a new sketching technique is proposed, called LEverage Score Sparsified (LESS) embeddings, which uses ideas from both data-oblivious sparse embeddings and data-aware leverage-based row sampling methods.

The Product of Gaussian Matrices is Close to Gaussian

- Computer Science, Mathematics
- APPROX-RANDOM
- 2021

It is shown that, provided each $d_i$ satisfies $d_i \geq C p \cdot q$, where $C \geq C_0$ for a constant $C_0 > 0$ depending on $r$, the matrix product $G_1 G_2 \cdots G_r$ has variation distance at most $\delta$ to a $p \times q$ matrix $G$ of i.i.d.…

Dimensionality reduction, regularization, and generalization in overparameterized regressions

- Computer Science, Mathematics
- ArXiv
- 2020

It is shown that OLS is arbitrarily susceptible to data-poisoning attacks in the overparameterized regime -- unlike the underparameterized regime -- and that regularization and dimensionality reduction improve the robustness.

Limiting Spectrum of Randomized Hadamard Transform and Optimal Iterative Sketching Methods

- Mathematics, Computer Science
- ArXiv
- 2020

The limiting spectrum and asymptotic freeness of random matrices are leveraged to obtain an exact analysis of iterative sketching methods for solving least squares problems and yield optimal step-sizes and convergence rates.

Optimal Iterative Sketching with the Subsampled Randomized Hadamard Transform

- Computer Science
- 2020

This work studies the performance of iterative Hessian sketch for least-squares problems by leveraging and extending recent results from random matrix theory on the limiting spectrum of matrices randomly projected with the subsampled randomized Hadamard transform, and truncated Haar matrices.
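A minimal numpy sketch of the iterative Hessian sketch these two works analyze, using a Gaussian sketch in place of the subsampled randomized Hadamard transform for brevity (sizes and step count are illustrative, not from the papers):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, m = 2000, 50, 400   # tall least-squares problem, sketch of m rows

A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)   # exact LS solution, for reference

# Iterative Hessian sketch: draw a fresh sketch each step and solve the
# sketched normal equations against the current full residual.
x = np.zeros(d)
for _ in range(20):
    S = rng.normal(size=(m, n)) / np.sqrt(m)     # an SRHT here would cut the cost
    SA = S @ A
    x = x + np.linalg.solve(SA.T @ SA, A.T @ (b - A @ x))

rel_err = np.linalg.norm(x - x_star) / np.linalg.norm(x_star)
```

Each iteration contracts the error by a factor governed by the ratio $d/m$, which is the kind of per-step convergence rate the spectral analyses above make exact.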

#### References

Showing 1-10 of 103 references.

Asymptotics for Sketching in Least Squares Regression

- Computer Science, Mathematics
- NeurIPS
- 2019

The limits of the accuracy loss (for estimation and test error) incurred by popular sketching methods are found, and a separation between different methods is shown: the SRHT performs better than Gaussian projections.

Improved Approximation Algorithms for Large Matrices via Random Projections

- Mathematics, Computer Science
- 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
- 2006

The key idea is that low dimensional embeddings can be used to eliminate data dependence and provide more versatile, linear time, pass-efficient matrix computation.

Randomized sketches of convex programs with sharp guarantees

- Mathematics, Computer Science
- 2014 IEEE International Symposium on Information Theory
- 2014

This work analyzes RP-based approximations of convex programs, in which the original optimization problem is approximated by the solution of a lower-dimensional problem, and proves that the approximation ratio of this procedure can be bounded in terms of the geometry of the constraint set.

Randomized Matrix Decompositions using R

- Computer Science, Mathematics
- Journal of Statistical Software
- 2019

This work presents the R package rsvd, and provides a tutorial introduction to randomized matrix decompositions, showing the computational advantage over other methods implemented in R for approximating matrices with low-rank structure.

Linear dimensionality reduction: survey, insights, and generalizations

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2015

This survey and generic solver suggest that linear dimensionality reduction can move toward becoming a blackbox, objective-agnostic numerical technology.

An Algorithm for the Principal Component Analysis of Large Data Sets

- Computer Science, Mathematics
- SIAM J. Sci. Comput.
- 2011

This work adapts one of these randomized methods for principal component analysis (PCA) for use with data sets that are too large to be stored in random-access memory (RAM), and reports on the performance of the algorithm.

Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions

- Mathematics, Computer Science
- SIAM Rev.
- 2011

This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation, and presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions.
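The modular two-stage scheme this reference describes (a randomized range finder followed by a small deterministic factorization) can be sketched in a few lines of numpy; this is a simplified illustration, not the reference implementation:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k, p = 300, 200, 10, 5   # target rank k, oversampling p

# Exactly rank-k test matrix, so the approximation error should be tiny.
A = rng.normal(size=(m, k)) @ rng.normal(size=(k, n))

# Stage A: capture the range of A from a random sketch of k + p columns;
# one power iteration sharpens the decay of the singular values.
Omega = rng.normal(size=(n, k + p))
Y = A @ (A.T @ (A @ Omega))
Q, _ = np.linalg.qr(Y)          # orthonormal basis for the sketched range

# Stage B: deterministic SVD of the small (k + p) x n projected matrix.
B = Q.T @ A
U_hat, svals, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ U_hat

approx = (U[:, :k] * svals[:k]) @ Vt[:k]
rel_err = np.linalg.norm(A - approx) / np.linalg.norm(A)
```

The expensive part, two passes over `A` per power iteration, touches the large matrix only through matrix-vector products, which is what makes the scheme attractive for the out-of-memory PCA setting described in the previous reference.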

Permutation methods for factor analysis and PCA

- Mathematics
- 2017

Researchers often have datasets measuring features $x_{ij}$ of samples, such as test scores of students. In factor analysis and PCA, these features are thought to be influenced by unobserved factors,…

Random‐projection ensemble classification

- Mathematics
- 2015

We introduce a very general method for high-dimensional classification, based on a careful combination of the results of applying an arbitrary base classifier to random projections of the…

A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2016

A framework for assessing algorithmic and statistical aspects of randomized sketching methods is developed; the statistical prediction efficiency (PE) and the statistical residual efficiency (RE) of the sketched LS estimator are considered; and upper bounds for several types of random projection and random sampling sketching algorithms are provided.