Ubuntu 16.04/18.04 + Nvidia GTX 970M + CUDA 10.00 + cuDNN + Tensorflow

Update and Upgrade Ubuntu System

sudo apt update
sudo apt upgrade

Verify CUDA-Capable GPU

lspci | grep -i nvidia

It should return something like GeForce 970M. Then goto cuda-gpu website to check compute capability, e.g. GeForce 970M -> 5.2.

Install Dependencies

sudo apt-get install build-essential 
sudo apt-get install cmake git unzip zip
sudo apt-get install python-dev python3-dev python-pip python3-pip

#Install Linux Kernel Header

uname -r

This command returns the current linux kernel version, e.g. 4.15.0-36-generic. Then to install linux header, do following:

sudo apt install linux-headers-$(uname -r)

Install NVIDIA CUDA 10.0

Remove previous cuda installation:

sudo apt purge nvidia*
sudo apt autoremove
sudo apt autoclean
sudo rm -rf /usr/local/cuda*

Install Cuda:

# For Ubuntu 16.04
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list

# For Ubuntu 18.04
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list


sudo apt-get update
sudo apt-get -o Dpkg::Options::="--force-overwrite" install cuda-10-0 cuda-drivers

Reboot System

So as to load the NVIDIA drivers.

Modify .bashrc and check installation

Update path in .bashrc:

echo 'export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc


source ~/.bashrc
sudo ldconfig

It will show some results as below:

| NVIDIA-SMI 410.73       Driver Version: 410.73       CUDA Version: 10.0     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX 970M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   28C    P0    22W /  N/A |    201MiB /  3022MiB |      1%      Default |

| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|    0      1447      G   /usr/lib/xorg/Xorg                            89MiB |
|    0      1639      G   /usr/bin/gnome-shell                         107MiB |

Install cuDNN 7.3.1

NVIDIA cuDNN is a GPU-accelerated library of primitives for deep neural networks. Goto nvidia-cudnn website and after login and accepting agreement, download cuDNN v7.3.1 Library for Linux [cuda 10.0].

Then goto download folder and in terminal perform following:

tar -xf cudnn-10.0-linux-x64-v7.3.1.20.tgz
sudo cp -R cuda/include/* /usr/local/cuda-10.0/include
sudo cp -R cuda/lib64/* /usr/local/cuda-10.0/lib64

Install NCCL 2.3.7

NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance optimized for NVIDIA GPUs. Go to nvidia-nccl website and attend survey to download Nvidia NCCL v2.3.7, for CUDA 10.0 -> NCCL 2.3.7 O/S agnostic and CUDA 10.0

Goto downloaded folder and in terminal perform following:

tar -xf nccl_2.3.7-1+cuda10.0_x86_64.txz
cd nccl_2.3.7-1+cuda10.0_x86_64
sudo cp -R * /usr/local/cuda-10.0/targets/x86_64-linux/
sudo ldconfig

Install Dependencies

pip install -U --user pip six numpy wheel mock
pip3 install -U --user pip six numpy wheel mock
pip install -U --user keras_applications==1.0.5 --no-deps
pip3 install -U --user keras_applications==1.0.5 --no-deps
pip install -U --user keras_preprocessing==1.0.3 --no-deps
pip3 install -U --user keras_preprocessing==1.0.3 --no-deps

Install Tensorflow from Wheel

Download wheel file

If you have exact the same configuration as I do, you can directly download the wheel from here, and then install tensorflow with pip (skip below and see next section).

Build your own wheel

Download bazel

cd ~/Download
wget https://github.com/bazelbuild/bazel/releases/download/0.17.2/bazel-0.17.2-installer-linux-x86_64.sh
chmod +x bazel-0.17.2-installer-linux-x86_64.sh
./bazel-0.17.2-installer-linux-x86_64.sh --user
echo 'export PATH="$PATH:$HOME/bin"' >> ~/.bashrc

Then reload environment variables

souce ~/.bashrc
sudo ldconfig

Download Tensorflow Source and Configure

cd ~/Downloads
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout r1.12

Then we need to configure a set of stuffs:

Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3

Press enter two times, then:

Do you wish to build TensorFlow with Apache Ignite support? [Y/n]: Y

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: Y

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N

Do you wish to build TensorFlow with ROCm support? [y/N]: N

Do you wish to build TensorFlow with CUDA support? [y/N]: Y

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 10.0

Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-10.0

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.3.1

Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]: /usr/local/cuda-10.0

Do you wish to build TensorFlow with TensorRT support? [y/N]: N

Please specify the NCCL version you want to use. If NCCL 2.2 is not installed, then you can use version 1.3 that can be fetched automatically but it may have worse performance with multiple GPUs. [Default is 2.2]: 2.3.7

Now we need compute capability which we have noted at step 1 eg. 5.2

Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 5.2] 5.2
Do you want to use clang as CUDA compiler? [y/N]: N

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /usr/bin/gcc

Do you wish to build TensorFlow with MPI support? [y/N]: N

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: -march=native

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:N

Configuration finished

Build Tensorflow w/ Bazel

This step takes a long time (3-4 hours on average).

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

add --config=mkl if you want Intel MKL support for newer intel cpu for faster training on cpu

add --config=monolithic if you want static monolithic build (try this if build failed)

add --local_resources 2048,.5,1.0 if your PC has low ram causing Segmentation fault or other related errors

if you got error like Segmentation Fault then try again it usually worked.

The bazel build command builds a script named build_pip_package. Running this script as follows will build a .whl file within the tensorflow_pkg directory:

bazel-bin/tensorflow/tools/pip_package/build_pip_package tensorflow_pkg

Then the whl file can be found in cd tensorflow_pkg.

With Virtualenv

Working in a virtual environment is usually a great idea to separate dependencies for different projects.

sudo apt install virtualenv

Then in the desired directory, create a virtual environment:

virtualenv tf-gpu -p /usr/bin/python3
source tf-gpu/bin/activate
pip3 install tensor*.whl

Verify Tensorflow installation

In the virtualenv (tf-gpu):


import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()

Or you can check current device:

from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']