Release LLVM-VE 1.5.0

We have released LLVM-VE 1.5.0. Starting with this release, we have moved to a monorepo, as the official LLVM project has. The compiler itself is identical to LLVM-VE 1.4.0. It can be installed from the ef_extra repository.

% yum install https://sx-aurora.com/repos/veos/ef_extra/x86_64/llvm-ve-link-1.5.0-1.x86_64.rpm \
              https://sx-aurora. [Read More]
llvm 

Deep Reinforcement Learning on SX-Aurora

Our partner has evaluated their deep reinforcement learning algorithm on SX-Aurora’s Vector Engine. The result is very impressive: SX-Aurora outperforms a GPU (P100) system by about a factor of two. Their algorithm is based on “dueling double DQN” and solves 3D bin packing problems such as packing multiple products into boxes in a logistics center or loading packages into a truck. Since they are using TensorFlow, their code runs on VE with TensorFlow for VE without special modification. [Read More]

Release llvm-ve-1.3.0

We have released LLVM-VE 1.3.0. The rpm packages are available.

% yum install https://sx-aurora.com/repos/veos/ef_extra/x86_64/llvm-ve-link-1.3.0-1.x86_64.rpm \
              https://sx-aurora.com/repos/veos/ef_extra/x86_64/llvm-ve-1.3.0-1.3.0-1.x86_64.rpm

This release adds some intrinsic functions.

  • _vel_andm_MMMl, etc
  • _vel_insert/extract_vm512l/u
  • _vel_approx_*
  • _vel_pvfmk*

The full function list is available at https://sx-aurora-dev.github.io/velintrin.html.

llvm 

Performance of TensorFlow on SX-Aurora

We are investigating how SX-Aurora performs on various ML applications. In this post, we would like to share the results of a performance evaluation of three ML workloads using TensorFlow. The graphs show the relative performance of training on CPU, GPU and VE in SX-Aurora. The left graph is a simple CNN for image classification based on the example in Keras, trained on the MNIST dataset. As you know, a GPU’s high peak computational performance works well for convolution layers, so the V100 performs best. [Read More]

Release TensorFlow for SX-Aurora

We are pleased to announce the release of TensorFlow for SX-Aurora. This TF supports the Vector Engine in SX-Aurora as a computing device. We have implemented some kernels for VE, and those kernels are offloaded to VE for acceleration. We have also released keras, which includes small modifications for VE; vetfkernel, which contains the implementation of the kernels for VE; and vednn, the Vector Engine DNN Library. You can pip install the prebuilt packages to start using TF on SX-Aurora. [Read More]

LLVM-VE rpm package

As we mentioned in a previous post, the rpm package for llvm-ve is now available on the VEOS yum Repository on the Web. You can install llvm-ve:

% yum install https://sx-aurora.com/repos/veos/ef_extra/x86_64/llvm-ve-1.1.0-1.1.0-1.x86_64.rpm \
              https://sx-aurora.com/repos/veos/ef_extra/x86_64/llvm-ve-link-1.1.0-1.x86_64.rpm

The llvm-ve package is an all-in-one package that includes llvm, clang and the runtimes. Files are installed into /opt/nec/nosupport/llvm-ve-1.1.0. The llvm-ve-link package creates a symlink from /opt/nec/nosupport/llvm-ve to that directory. You can compile your program like this:

% /opt/nec/nosupport/llvm-ve/bin/clang -target ve-linux hello. [Read More]
llvm 

Ansible for Aurora

NEC provides RPM packages for SX-Aurora. If you find their installation awkward or non-standard, visit the VEOS yum Repository on the Web. Yes, a yum repository for SX-Aurora is available! We have also created ansible scripts to set up SX-Aurora using this yum repository, so you can set up Aurora with a few commands.

Instructions

  • Install CentOS7.5.
  • Clone https://github.com/sx-aurora-dev/aurora-ansible.
  • Write an inventory file for ansible.
  • Run ansible-playbook -i hosts.yaml -u root -k playbooks/aurora.yaml to install VEOS.

[Read More]

Faster data transfer by VE DMA

On VE, you can use the usual read(2) and write(2) to access the file system or to transfer data through a socket, but in our experience their throughput tops out at about 1GB/s. When we want to accelerate data transfer between VE and CPU, we use VE DMA. Since VE DMA has its own API, we have to rewrite the program, but we can then reach 10GB/s. Here is the result of our experiment. [Read More]
vedma 
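
To put the read(2)/write(2) and VE DMA figures above in perspective, here is a back-of-the-envelope sketch in Python. It assumes a constant sustained throughput and ignores setup latency. The function name transfer_seconds and the 8 GB payload are made up for illustration and are not part of any VE API.

```python
def transfer_seconds(num_bytes: float, gb_per_s: float) -> float:
    """Ideal time to move num_bytes at a sustained rate of gb_per_s (1 GB = 1e9 bytes)."""
    if gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return num_bytes / (gb_per_s * 1e9)


payload = 8e9  # hypothetical 8 GB buffer, chosen only for illustration

t_rw = transfer_seconds(payload, 1.0)    # read(2)/write(2): about 1 GB/s
t_dma = transfer_seconds(payload, 10.0)  # VE DMA: about 10 GB/s

print(f"read/write: {t_rw:.1f} s, VE DMA: {t_dma:.1f} s")
```

Under these assumptions the ratio of the two times is simply the ratio of the two throughputs, so the payload size only matters for judging whether rewriting a program against the VE DMA API is worth the effort.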

Image Processing on Aurora

We presented several examples of image processing on Aurora at the SC18 Exhibition. Some image processing kernels are memory-bandwidth intensive because they access memory randomly, for example to read only the pixels they are interested in. Aurora, with the world's highest memory bandwidth (1.2TB/s), is well suited to such kernels. Here is the performance comparison of some kernels from OpenCV. We have also ported image processing applications to Aurora. See the poster for details. [Read More]