Many problems in machine learning are solved via mathematical optimization, which in most non-linear and non-convex cases requires iterative methods. This thesis focuses on deriving communication-avoiding variants of the block coordinate descent method, a first-order method with strong convergence rates for many optimization problems. Block coordinate descent is an iterative algorithm which, at each iteration, samples a small subset of rows or columns of the input matrix, solves a subproblem using just the chosen rows or columns, and obtains a partial solution. This partial solution is then iteratively refined until the optimal solution is reached or until convergence criteria are met.

On modern computer architectures, the cost of moving data (communication) from main memory to caches within a single machine is orders of magnitude higher than the cost of performing floating-point operations (computation). On parallel machines, moving data from one processor to another over an interconnection network is the most expensive operation. Therefore, avoiding communication is key to attaining high performance. While hardware improvements have made it feasible to develop machine learning models on a single machine, analyzing large amounts of data still requires parallel computing, either to obtain shorter running times or because the dataset cannot be stored on a single machine. In addition to hardware improvements, algorithm redesign is an important direction for further reducing running times; in the parallel setting, each iteration of block coordinate descent requires communication.

This thesis adapts well-known techniques from existing work on communication-avoiding (CA) Krylov and s-step Krylov methods. CA-Krylov methods unroll vector recurrences and rearrange the sequence of computation in a way that defers communication for s iterations, where s is a tunable parameter; for CA-Krylov methods, the reduction in communication cost comes at the expense of numerical instability for large values of s. We apply a similar recurrence-unrolling technique to block coordinate descent in order to obtain communication-avoiding variants which solve the L2-regularized least-squares, L1-regularized least-squares, support vector machine (SVM), and kernel problems. Our communication-avoiding variants reduce the latency cost by a tunable factor of s at the expense of a factor of s increase in computational and bandwidth costs for the L2 and L1 least-squares and SVM problems, since the CA variants of these problems require additional computation and bandwidth in order to update the residual vector. For CA-kernel methods the computational and bandwidth costs do not increase, because the CA variants of kernel methods can reuse elements of the kernel matrix that have already been computed and therefore do not need to compute and communicate additional elements of the kernel matrix.

Our experimental results illustrate that the new communication-avoiding methods obtain speedups of up to 6x, and we experimentally confirm that our algorithms are numerically stable for large values of s. For CA-kernel methods we show modeled speedups of 26x for MPI on a predicted exascale system, with further modeled speedups for Spark on a predicted exascale system and for Spark on a cloud system. Finally, we present an adaptive batch size technique which reduces the latency cost of training convolutional neural networks (CNNs); with this technique we were able to train ResNet on the ImageNet dataset using large batch sizes, which would allow neural network training to attain a higher fraction of peak GPU performance than training with smaller batch sizes.
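The block coordinate descent scheme described above can be sketched in a few lines. The following toy implementation for the L2-regularized least-squares problem is our own illustration, not the thesis's code: the function name, problem setup, and closed-form block solve are assumptions for the sake of the example.

```python
import numpy as np

def bcd_ridge(A, b, lam=1.0, block_size=2, iters=100, seed=0):
    """Toy block coordinate descent for min_x ||Ax - b||^2 + lam*||x||^2.

    Each iteration samples a column block, solves the small block
    subproblem in closed form, and updates the running residual
    r = b - A x (the update that costs communication in parallel)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    r = b - A @ x  # residual, maintained across iterations
    for _ in range(iters):
        idx = rng.choice(n, size=block_size, replace=False)
        Ab = A[:, idx]  # sampled column block
        # Closed-form solution of the block subproblem:
        #   (Ab^T Ab + lam*I) dx = Ab^T r - lam * x[idx]
        G = Ab.T @ Ab + lam * np.eye(block_size)
        dx = np.linalg.solve(G, Ab.T @ r - lam * x[idx])
        x[idx] += dx
        r -= Ab @ dx  # residual update
    return x
```

With enough iterations on a well-conditioned problem, the iterates approach the exact ridge solution `(A^T A + lam*I)^{-1} A^T b`.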
The large gap between the cost of computation and the cost of communication suggests that algorithm redesign should be driven by the goal of avoiding communication and, if necessary, decreasing communication at the expense of additional computation.
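To illustrate this trade of extra computation and bandwidth for fewer communication rounds, here is a hypothetical single-machine sketch of the recurrence-unrolling idea in a ridge-regression setting: s blocks are sampled up front, their Gram and cross terms are formed in one pass (a single latency cost in a distributed setting), and s block updates are then carried out locally from those precomputed quantities. All names and details here are our own illustration, not the thesis's implementation.

```python
import numpy as np

def ca_bcd_ridge(A, b, lam=1.0, block_size=2, s=5, rounds=20, seed=0):
    """Sketch of a communication-avoiding block coordinate descent step
    for min_x ||Ax - b||^2 + lam*||x||^2: per outer round, gather s
    blocks at once, compute all Gram/cross terms in one pass (one
    latency cost), then run s local updates with the recurrence
    unrolled. Factor-of-s extra flops/bandwidth, 1/s of the latency."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    r = b - A @ x
    for _ in range(rounds):
        blocks = [rng.choice(n, size=block_size, replace=False)
                  for _ in range(s)]
        cols = np.concatenate(blocks)
        As = A[:, cols]      # all s blocks gathered in one pass
        G = As.T @ As        # Gram terms for every pair of blocks
        v = As.T @ r         # cross terms with the current residual
        dxs = []
        for j, idx in enumerate(blocks):
            sl = slice(j * block_size, (j + 1) * block_size)
            Gj = G[sl, sl] + lam * np.eye(block_size)
            # Recover As_j^T r_current from the precomputed v and the
            # earlier local updates (the unrolled recurrence):
            rhs = v[sl].copy()
            for k, dxk in enumerate(dxs):
                slk = slice(k * block_size, (k + 1) * block_size)
                rhs -= G[sl, slk] @ dxk
            dx = np.linalg.solve(Gj, rhs - lam * x[idx])
            x[idx] += dx
            dxs.append(dx)
        r -= As @ np.concatenate(dxs)  # one residual update per s steps
    return x
```

Because the unrolled inner updates reproduce the sequential recurrence exactly, this variant computes the same iterates as standard block coordinate descent with the same block choices; only the placement of the communication changes.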