Report from NVIDIA GTC SV 2019

thumb image

Last week I attended the NVIDIA GPU Technology Conference (GTC) held in San Jose, CA, on March 17–21, 2019. This is a short report focused mainly on the AI and HPC parts of the conference; the other parts were graphics and robotics (in line with Jensen Huang’s keynote).

I have attended GTC in Japan before. GTC in San Jose, called GTC SV (for Silicon Valley) for short, has more sessions and more participants (reportedly 9,000) than GTC in Japan. With so many sessions, many of them running in parallel, it is hard to build a personal schedule. If you look at the schedule of all the GTC SV sessions attached below, you can see that up to almost 28 sessions, including talks, instructor-led trainings, and tutorials, were held at the same time.

Grid schedule

There were not many new hardware announcements. Worth mentioning is the new Jetson NANO, probably the cheapest GPU-equipped computer you can get: it costs only $99 in the US.

Jetson NANO

In contrast to the hardware, there were quite a lot of software announcements.
Data science was one of the main keywords throughout the conference. With the goal of moving the entire data science workflow (or “pipeline”) onto the GPU, NVIDIA is developing a suite of software libraries called RAPIDS. It also incorporates some software developed earlier, such as PyGDF.


The end-to-end GPU data science pipeline also incorporates libraries developed by other companies, such as CuPy and Numba.
The main idea of this pipeline is to do all data processing on GPUs without moving data between host and GPU memory. For data centers, this GPU pipeline will also require networking hardware with support for RDMA, RoCE, and GPUDirect, which allow data to be transferred between the GPU memories of multiple GPUs on multiple nodes without going through the CPUs.
The pipeline is meant to be easy to use for those familiar with existing data science tools such as pandas and scikit-learn. For handling data in dataframes, RAPIDS includes cuDF, which mimics pandas; as a machine learning algorithms library, it has cuML, which is used in place of scikit-learn.
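
As a rough illustration of how close the APIs are meant to be, here is a small pandas snippet; the idea promoted at the conference is that replacing the pandas import with cuDF (the `import cudf as pd` trick often shown in RAPIDS demos) runs essentially the same code on the GPU. This is a sketch of the concept, not code verified against a specific RAPIDS beta release:

```python
import pandas as pd  # with RAPIDS installed, this could become: import cudf as pd

# Toy dataset standing in for something much larger (e.g. taxi-trip records).
df = pd.DataFrame({
    "passenger_count": [1, 2, 1, 3, 2],
    "fare": [7.5, 12.0, 6.0, 20.5, 9.5],
})

# Ordinary pandas-style filtering and aggregation; cuDF aims to
# accept the same calls and execute them on the GPU.
big_fares = df[df["fare"] > 7.0]
mean_by_count = big_fares.groupby("passenger_count")["fare"].mean()
print(mean_by_count)
```

Whether a given operation is already covered by the cuDF beta has to be checked against its documentation; the point is that the code you write barely changes.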

numpy         →  CuPy, Numba
pandas        →  cuDF
scikit-learn  →  cuML
matplotlib    →  ?

CPU vs GPU data science libraries

RAPIDS will also get a visualization component in the future.
RAPIDS can work together with existing neural-network frameworks built on top of cuDNN, such as TensorFlow.

cuDF can seamlessly scale up to multiple GPUs and scale out to multiple nodes.
RAPIDS libraries are easy to install (I used a Docker image, but there are other options too) and provide a multifold speedup over CPU libraries on large amounts of data. However, if you decide to use RAPIDS, keep in mind that it is still in beta and under active development.

Another piece of software that fits nicely into NVIDIA’s concept of an end-to-end GPU data science pipeline is Horovod, a framework for distributed training with TensorFlow, Keras, PyTorch, or MXNet. It has been developed at Uber for a couple of years now. It was not announced at GTC SV, but there were a few Horovod sessions. Horovod runs multi-GPU, multi-node training on top of MPI and NCCL 2, with support for RoCE and GPUDirect. It seems easy to integrate into existing Python code and is reportedly faster than TensorFlow’s native distributed training.
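
The core operation behind this style of data-parallel training is an allreduce that averages the gradients computed on each worker, so that all workers apply the same update. The following stdlib-only toy shows the effect of that operation on in-memory "workers"; it is purely conceptual — real workers are separate processes on separate GPUs, and Horovod delegates the actual communication to MPI/NCCL, which use a bandwidth-optimal ring-allreduce rather than this naive gather-and-broadcast:

```python
def allreduce_mean(worker_grads):
    """Average gradient vectors across workers (naive allreduce).

    Every worker ends up holding the element-wise mean of all
    workers' gradients, which is the invariant the real ring
    algorithm also guarantees.
    """
    n = len(worker_grads)
    summed = [sum(vals) for vals in zip(*worker_grads)]
    mean = [s / n for s in summed]
    return [list(mean) for _ in range(n)]  # one copy per worker

# Three workers, each with its locally computed gradient:
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
synced = allreduce_mean(grads)
print(synced[0])  # every worker now holds [3.0, 4.0]
```

In actual Horovod code you never write this loop yourself; you initialize the library and wrap your optimizer, and the averaging happens inside the framework's gradient step.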

At GTC I presented our poster, “Towards Estimating DNN Training Time and Cloud Cost”. The goal of this research is to be able to predict, for an arbitrary CNN application, its training time on unseen GPU types across a wide range of mini-batch sizes.
Though still at an early stage, this research drew a lot of attention; according to one of its employees, one of the largest companies in the deep learning field is conducting similar research.
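
To give a feel for the kind of prediction involved (this is a deliberately simplistic toy, not the model from our poster, and all the numbers are made up): suppose per-iteration time behaves roughly like a fixed overhead plus a per-sample cost, t(b) = overhead + per_sample · b for mini-batch size b. Then two constants fitted from a few measured points let you extrapolate to unmeasured batch sizes:

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical measurements: (mini-batch size, seconds per iteration)
batch_sizes = [16, 32, 64]
times = [0.05, 0.08, 0.14]

overhead, per_sample = fit_linear(batch_sizes, times)
predicted_128 = overhead + per_sample * 128  # extrapolate; ≈ 0.26 s here
```

The hard part the research addresses is that in reality the relation is not a clean line, and the constants depend on the network architecture and on the GPU type, including types you have never measured.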

There were many other interesting posters presented at GTC SV. NVIDIA has kindly made them accessible online, so please take a look.

Poster gallery

The next GTC in San Jose will be held on March 22–26, 2020.