The recent launch of ARM’s Ne10 framework introduces the possibility of increasing the runtime performance of software running on ARM architectures, specifically the Cortex-A9 hardware commonly found in many modern smartphones. Meanwhile, there is growing interest in computer vision and signal processing; OpenCV (the Open Source Computer Vision Library) has become a mainstream software framework with long-term exposure within the software development community, and it ships with some 500+ commonly used image processing operations.
It is therefore worth examining whether OpenCV could profit from Ne10 when running on ARM architectures. Ne10 can be used alongside gcc (the GNU Compiler Collection) to perform auto-vectorisation, which could reduce the execution time of many commonly used image processing operations. As a majority of Cortex-A9 enabled devices run the Android platform, it is also worth examining how the JNI (Java Native Interface) can be used to expose such auto-vectorised code. Consequently, an in-depth analysis of auto-vectorisation using Ne10 and gcc for the Android JNI will be performed. The greater part of this article will discuss the quantitative results of these auto-vectorisations on OpenCV.
- OpenCV & optimisations with Ne10 for Video Stream Processing (31 July 2012)
- Lighting up OpenCV with Ne10 and NEON (28 January 2013) – Linux Conference Australia (LCA2013)
Efficiency in image processing has always been of high importance, and this is increasingly true when processing is performed on embedded and mobile devices. A majority of mobile devices use Advanced RISC Machines (ARM) processors, but surprisingly, single instruction, multiple data (SIMD) optimisations for this architecture are not yet common. This is especially true for open source frameworks and libraries running on these newer mobile ARM devices.
Intel has recently used its Streaming SIMD Extensions 2 (SSE2) instruction set to improve the efficiency of heavyweight functions within the Open Source Computer Vision (OpenCV) library. Unfortunately, the comparatively poor energy efficiency of Intel processors means Intel is not the dominant architecture for mobile devices. Image processing on these devices is now common due to increasing camera capabilities (most modern smartphones contain an 8-megapixel camera), and a majority of Android and iOS apps that perform image processing leverage the OpenCV library. OpenCV has numerous computationally intensive operations where the use of SIMD is beneficial; Intel has identified and remedied these bottlenecks using SSE2, but most mobile devices (which run ARM) will not benefit from that work. This is of fundamental importance for a mobile architecture in which efficiency and battery life are the major concerns.
Alternatives to Intel’s SIMD instruction sets (SSE and AVX) are ARM’s NEON intrinsic instruction set (released in 2009) and the Ne10 software framework (2012). NEON contains vector instructions that are similar, but not identical, to those of SSE and AVX. The Ne10 library provides a set of commonly used vector operations, with each function consisting of clusters of pre-rolled NEON intrinsics. Ne10 therefore offers a higher level of abstraction, enabling C/C++ floating point arrays to be manipulated without hand-written intrinsics and allowing for faster development time. But at what cost?
End-users are increasingly using their phones as media processing machines, but what can the app developer do to save the battery life of these devices? Using SIMD to improve computing efficiency, which in turn reduces clock cycles, is an attractive option!
During this session we will compare SSE, NEON and Ne10 when applying SIMD optimisations to critical functions of the OpenCV library, and we will discuss speedup factors and ease of use (from a programmer’s perspective). We will also “briefly” examine auto-vectorisation and how different compilers stack up. Finally, we “might” have a demonstration of some simple image processing using our NEON-ised OpenCV framework on iOS and Android devices.