Abstract: This paper presents a new variant of the well-known Gaussian elimination for solving linear systems of equations (LSEs) over GF(2) and its highly efficient implementation in hardware. The proposed hardware architecture can solve any regular and (uniquely solvable) overdetermined LSE and is not limited to matrices of a certain structure. Its average running time for nxn matrices with uniformly distributed entries equals 2n (clock cycles) as opposed to about 1/4n^3 in software. Moreover, the average running time remains very close to 2n for random matrices with densities much greater or lower than 0.5. Our architecture has a worst-case time complexity of O(n^2) and also a space complexity of O(n^2). With these characteristics the architecture is particularly suited to efficiently solve medium-sized LSEs as they frequently appear in, e.g., cryptanalysis of stream ciphers. Besides solving LSEs over GF(2), the architecture at hand can also accomplish the computation of matrix inversion extremely fast. Using a similar architecture, matrix-by-matrix multiplication can be done in linear time and quadratic space. Thus, these architectures can possibly be used as building blocks of a more complex architecture for solving large LSEs by means of Strassen's algorithm. We realized our architecture for solving medium-sized LSEs on a contemporary low-cost FPGA. The implementation for a 50x50 LSE can be clocked with a frequency of up to $300$ MHz and computes the solution in 0.33 microseconds on average. BibTeX: @InProceedings{I-BMPPR06, author = {A. Bogdanov and M. Mertens and C. Paar and J. Pelzl and A. Rupp}, title = "{A Parallel Hardware Architecture for fast Gaussian Elimination over GF(2)}", booktitle = "{IEEE Symposium on Field-Programmable Custom Computing Machines --- FCCM 2006, Napa, CA, USA}", year = {2006}, month = {April} }