# Research

### Research Interest

Broadly I am interested in developing computationally efficient algorithms. These methods have become extremely relevant in today’s world of million sized datasets and massive parametrized learning models. More recently I have been working on sublinear algorithms for matrices, i.e., algorithms that run in time less than the number of inputs. Matrices are often used to represent data and parameters of learning models, and as such large datasets and complex learning models have lead to a requirement for efficient computational algorithms. The body of my work has been on pushing the boundaries of sublinear time or sublinear query algorithms in context of matrices and their applications. Applications include – model compression, approximate learning functions, fast search and recommendation systems, near-optimal deep nets, and approximate feature embeddings.

Prior to this I have dabbled unsuccessfully in computer vision, where I was working on detecting and recognizing texts in maps. I also worked on image synthesis in the kernel induced space during my masters.

### Papers

(author ordering for the papers below is alphabetical if marked *)

• Approximating Eigenvalues of Symmetric Matrices using Matrix-Vector Query Algorithms*
Cameron Musco, Archan Ray
in preparation.

• Universal Matrix Sparsifiers and Fast Deterministic Algorithms for Linear Algebra*
Rajarshi Bhattacharjee, Gregory Dexter, Cameron Musco, Archan Ray, Sushant Sachdeva, David P Woodruff
Innovations in Theoretical Computer Science (ITCS) 2024.
Abstract | Video from Rajarshi's talk at ITCS . Given a matrix $\mathbf A \in \mathbb{R}^{n \times n}$ which is normalized so that its entries are bounded in magnitude by $1$, it is well-known that if $S \subset [n] \times [n]$ is a uniformly random subset of $s= \tilde{O}(n/\epsilon^2)$ entries of $\mathbf A$, and if ${\mathbf A}_S$ equals $\mathbf A$ on the entries in $S$ and is zero on the entries outside of $S$, then $\|\mathbf A - \frac{n^2}{s} \cdot {\mathbf A}_S\|_2 \le \epsilon n$ with high probability, where $\|\cdot\|_2$ is the spectral norm. We show that for positive semidefinite (PSD) matrices, no randomness is needed at all in this statement. Namely, there exists a fixed subset $S$ of $s = \tilde{O}(n/\epsilon^2)$ entries that acts as a universal sparsifier: $\|\mathbf A - \frac{n^2}{s} \cdot {\mathbf A}_S\|_2 \le \epsilon n$ holds simultaneously for every bounded entry PSD matrix $\mathbf A \in \mathbb{R}^{n \times n}$. One can view this result as a significant extension of a spectral expander. Indeed, if we set $\mathbf A$ to be the all ones matrix, then ${\mathbf A}_S$ matches the near-optimal spectral expansion of Ramanujan graphs, up to logarithmic factors. But here our ${\mathbf A}_S$ sparsifies any bounded entry PSD matrix, not just the all ones matrix. We leverage the existence of such universal sparsifiers to give the first deterministic algorithms for several central linear algebraic problems, including singular value and singular vector approximation and positive semidefiniteness testing, that run in faster than matrix multiplication time. This partially addresses a significant gap between randomized and deterministic algorithms for fast linear algebraic computation.
While our primary construction of universal sparsifiers is non-explicit, we also give an explicit polynomial time construction with $\tilde{O}(n/\text{poly}(\epsilon))$ entries. A key technique we introduce is diagonal-charging, which allows us to bound how concentrated the entries of the eigenvectors of a bounded entry PSD matrix can be on a small subset of coordinates. Our results also extend to give the first non-trivial universal sparsification bounds for non-PSD matrices. In this case, we show there exists a subset $S$ of $s = \tilde O(n/\epsilon^4)$ entries for which for any bounded entry matrix $\mathbf A$, we have $\|\mathbf A - \frac{n^2}{s} \cdot {\mathbf A}_S\|_2 \le \epsilon \cdot \max(n,\|\mathbf A\|_1)$, where $\|\mathbf A\|_1$ is the trace norm of $\mathbf A$. We prove that this is optimal up to an $\tilde O(1/\epsilon^2)$ factor. Finally, if $\mathbf A \in \{-1,0,1\}^{n \times n}$ is PSD, we show that a spectral approximation $\mathbf{\tilde A}$ with $\|\mathbf A - \mathbf{\tilde A}\|_2 \le \epsilon n$ can be obtained by deterministically reading $\tilde O(n/\epsilon)$ entries of $\mathbf A$. This improves on our result for general PSD matrices by a $1/\epsilon$ factor and is information-theoretically optimal up to a logarithmic factor in its sample complexity.
• Sublinear Time Eigenvalue Approximation via Random Sampling*
Rajarshi Bhattacharjee, Gregory Dexter, Petros Drineas, Cameron Musco, Archan Ray
extended abstract in International Colloquium on Automata, Languages, and Programming (ICALP) 2023, full paper in Algorithmica.
Abstract | Code | Slides from Cameron's talk at the Algorithms and Foundations for Data Science Workshop, NUS. We study the problem of approximating the eigenspectrum of a symmetric matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ with bounded entries (i.e., $\|\mathbf{A}\|_{\infty} \leq 1$). We present a simple sublinear time algorithm that approximates all eigenvalues of $\mathbf{A}$ up to additive error $\pm \epsilon n$ using those of a randomly sampled ${O}\left (\frac{\log^3 n}{\epsilon^3}\right ) \times O\left (\frac{\log^3 n}{\epsilon^3}\right )$ principal submatrix. Our result can be viewed as a concentration bound on the \textit{full} eigenspectrum of a random submatrix, significantly extending known bounds on just the top eigenvalue (the spectral norm). When $\mathbf{A}$ is sparse and rows of $\mathbf{A}$ can be efficiently sampled with probabilities proportional to their sparsity, we present an improved error bound of $\pm \epsilon \sqrt{\text{nnz}(\mathbf{A})}$, where $\text{nnz}(\mathbf{A})$ is the number of non-zero entries in $\mathbf{A}$. Even for the strictly easier problem of testing the existence of large negative eigenvalues, introduced by Bakshi, Chepurko, and Jayaram (FOCS '20), our results are the first ones that can take advantage of the sparsity of $\mathbf{A}$. From a technical perspective, our bounds require several new eigenvalue concentration and perturbation bounds for matrices with bounded entries. We complement our theoretical results with numerical simulations, which demonstrate the effectiveness of our algorithms in practice.
• Sublinear Time Approximation of Text Similarity Matrices
Archan Ray, Nicholas Monath, Andrew McCallum, and Cameron Musco
AAAI Conference on Artificial Intelligence (AAAI) 2022.
Abstract | Code We study algorithms for approximating pairwise similarity matrices that arise in natural language processing. Generally, computing a similarity matrix for $n$ data points requires $\Omega(n^2)$ similarity computations. This quadratic scaling is a significant bottleneck, especially when similarities are computed via expensive functions, e.g., via transformer models. Approximation methods reduce this quadratic complexity, often by using a small subset of exactly computed similarities to approximate the remainder of the complete pairwise similarity matrix.
Significant work focuses on the efficient approximation of positive semidefinite (PSD) similarity matrices, which arise e.g., in kernel methods. However, much less is understood about indefinite (non-PSD) similarity matrices, which often arise in NLP. Motivated by the observation that many of these matrices are still somewhat close to PSD, we introduce a generalization of the popular Nystrom method to the indefinite setting. Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix, producing a rank-$s$ approximation with just $O(ns)$ similarity computations. We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices arising in NLP tasks. We demonstrate high accuracy of the approximated similarity matrices in the downstream tasks of document classification, sentence similarity, and cross-document coreference.
• Tight Coupling of Character, Word, and Place Recognition for End-to-End Text Recognition in Maps
Archan Ray, Aruni Roy Chowdhury, Yi Fung, Jerod Weinman, and Erik Learned-Miller
Technical Report, College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, 2019.
Abstract Text recognition in maps is a special case of general text recognition that features some especially difficult challenges, including texts at extreme orientations, wide character spacings, complex text-like distractors, and unusual non-dictionary strings. Off-the-shelf OCR systems, and even sophisticated scene text recognition systems do not work satisfactorily on many map-text recognition problems. While many OCR and scene text systems have produced high quality results by considering detection, recognition, and error-correction as separate components, we believe that map text recognition can benefit immensely from the tight coupling of different components of an overall system. In particular, we present an end-to-end system for recognizing text in maps that uses strong coupling in two different ways. In the first, we train individual character detectors, and use these detections as inputs in a new word detection CNN architecture to improve word detection. We show dramatic increases in word detection accuracy for a strong baseline detection architecture. In the second contribution, we use a geographically-based lexicon to constrain our interpretations of initial detections. If the lexicon suggests that the word detection is either too short, we ""re-prime'' the word detector by inserting expected characters locations back into the word detector using a novel input mechanism. We then rerun the word detector using the additional character suggestions, giving a solid improvement in accuracy. We report end-to-end recognition results on a public map-text recognition benchmark.
• Historical map annotations for text detection and recognition
Archan Ray, Ziwen Chen, Ben Gafford, Nathan Gifford, Jagath Jai Kumar, Abyaya Lamsal, Liam Niehus-Staab, Jerod Weinman, and Erik Learned-Miller
Grinnell College, Technical Report, 2018.
Introduction This document describes a data set designed for testing the performance of text/graphics separation and character recognition algorithms on text in scanned historical map images. Thirty one maps from the nineteenth and early twentieth centuries (1866-1927) were chosen from nine atlases in the David Rumsey Map Collection. Most maps are of individual states, though some are regional and one is of the entire U.S.; many are engraved with occasional handwritten text. The original MrSid files are converted into uncompressed TIFF images. The images and annotations are available from the following URLs.
• U-PC: Unsupervised planogram compliance
Archan Ray, Nishant Kumar, Avishek Shaw, and Dipti Prasad Mukherjee
European Conference on Computer Vision (ECCV), 2018.
Abstract We present an end-to-end solution for recognizing merchandise displayed in the shelves of a supermarket. Given images of individual products, which are taken under ideal illumination for product marketing, the challenge is to find these products automatically in the images of the shelves. Note that the images of shelves are taken using hand-held camera under store level illumination. We provide a two-layer hypotheses generation and verification model. In the first layer, the model predicts a set of candidate merchandise at a specific location of the shelf while in the second layer, the hypothesis is verified by a novel graph theoretic approach. The performance of the proposed approach on two publicly available datasets is better than the competing approaches by at least 10%.

### Theses/Dissertations

• Estimation of image features representing facial emotions for emotion synthesis
Archan Ray
M. Tech Dissertation Series, Indian Statistical Institute, Kolkata, 2015.
Abstract We develop a method to estimate emotion-specific features on human face. Application of such a method include characterizing an emotion class and synthesis of emotions. The emotion specific features can also be used to study the statistical differences between two clusters, one facial expression images with no expressions and two facial expression images with some or maximum emotional content. Once the feature vectors are extracted from the input data, we classify the data and use the normal to the classifier to trace the changes that a facial expression image may undergo in different stages of an emotion. We use Support Vector Machines learning algorithm to construct an optimal classifier. In the result section we show that we are able to reduce the number of features by 66.36% as compared to the total number of pixels. We show that using these features and state-of-the-art methods to synthesize images, we improved SNR of the synthesized image by 13.20% and also improved the cluster measures between a cluster of no-expression images and a cluster of with-expression images.