The design of the proposed Convolutional Neural Network (CNN) architecture for face image recognition takes the constraints on the bandwidth of the communications between memory and processor into the account. The coarse grained parallelism which performed in the bottom layer node's calculations is reduced in consequent manner until the calculation of one simple node in the upper layer is achieved sequentially. Two methods of segmentation are used to buffer the image data required for these parallel to sequential calculations from the image RAM to multi-port RAMs. A comparison between these two methods with respect to the whole number of RAM access required to generate the system recognition code is performed. A speedup of 44 is achieved when the hardware system is implemented with the using of the 1st method of segmentation as compared to a Pentium 4, 2.4 GHz sequential computer software implementation. While a speedup of 88 is achieved when the same hardware system is implemented but with the using of the 2nd segmentation method, compared to the same mentioned sequential computer.
Keywords: convolution neural networks, parallel processing, memory architecture.