Ubuntu 16.04 + CUDA 7.5 + Caffe Deep Learning Environment Setup - 华纳云


Date: 2024-08-30 11:31:08 Editor: 华纳云

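　　This article walks step by step through setting up a CUDA 7.5 + Caffe deep learning environment on Ubuntu 16.04.
　　1. Install Ubuntu 16.04
　　The installation itself is not covered here; search online if you need help. Once the system is installed, apply the pending updates and install a few basic tools.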
　　sudo apt update
　　sudo apt-get upgrade
　　sudo apt-get install vim
　　sudo apt-get install cmake
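　　2. Install the graphics driver
　　Open All Settings -> Software & Updates (Additional Drivers tab), switch to the NVIDIA 361.42 driver, reboot, and run nvidia-smi to check that the driver works.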
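The driver check can also be scripted. The following is a minimal sketch for a POSIX shell; `have_tool` and `report_driver` are hypothetical helper names introduced here for illustration, not part of any NVIDIA tooling.

```shell
# have_tool NAME: succeed if NAME resolves to a command
# (an executable on PATH, a shell builtin, or a function).
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# report_driver: print a one-line status for the NVIDIA driver.
# It only checks that nvidia-smi exists and exits with status 0.
report_driver() {
    if ! have_tool nvidia-smi; then
        echo "nvidia-smi not found: the driver is probably not installed"
        return 1
    fi
    if nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi ran successfully: the driver is loaded"
    else
        echo "nvidia-smi failed: try rebooting or reinstalling the driver"
        return 1
    fi
}
```

Calling `report_driver` after the reboot gives a quick yes/no answer before moving on to the CUDA installation.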
　　3. Install CUDA
　　(1) Install the required dependencies
　　ca-certificates-java
　　default-jre
　　default-jre-headless
　　fonts-dejavu-extra
　　freeglut3
　　freeglut3-dev
　　java-common
　　libatk-wrapper-java
　　libatk-wrapper-java-jni
　　libdrm-dev
　　libgl1-mesa-dev
　　libglu1-mesa-dev
　　libgnomevfs2-0
　　libgnomevfs2-common
　　libice-dev
　　libpthread-stubs0-dev
　　libsctp1
　　libsm-dev
　　libx11-dev
　　libx11-doc
　　libx11-xcb-dev
　　libxau-dev
　　libxcb-dri2-0-dev
　　libxcb-dri3-dev
　　libxcb-glx0-dev
　　libxcb-present-dev
　　libxcb-randr0-dev
　　libxcb-render0-dev
　　libxcb-shape0-dev
　　libxcb-sync-dev
　　libxcb-xfixes0-dev
　　libxcb1-dev
　　libxdamage-dev
　　libxdmcp-dev
　　libxext-dev
　　libxfixes-dev
　　libxi-dev
　　libxmu-dev
　　libxmu-headers
　　libxshmfence-dev
　　libxt-dev
　　libxxf86vm-dev
　　lksctp-tools
　　mesa-common-dev
　　openjdk-7-jre
　　openjdk-7-jre-headless
　　tzdata-java
　　x11proto-core-dev
　　x11proto-damage-dev
　　x11proto-dri2-dev
　　x11proto-fixes-dev
　　x11proto-gl-dev
　　x11proto-input-dev
　　x11proto-kb-dev
　　x11proto-xext-dev
　　x11proto-xf86vidmode-dev
　　xorg-sgml-doctools
　　xtrans-dev
　　libgles2-mesa-dev
　　nvidia-modprobe
　　build-essential
　　(2) Install the CUDA toolkit
　　① Run the cuda_7.5.18_linux.run installer
　　sudo ./cuda_7.5.18_linux.run --override
　　The installer prompts and the answers to give are as follows:
　　Do you accept the previously read EULA? (accept/decline/quit): accept
　　You are attempting to install on an unsupported configuration. Do you wish to continue? ((y)es/(n)o) [ default is no ]: y
　　Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 352.39? ((y)es/(n)o/(q)uit): n
　　Install the CUDA 7.5 Toolkit? ((y)es/(n)o/(q)uit): y
　　Enter Toolkit Location [ default is /usr/local/cuda-7.5 ]:
　　Do you want to install a symbolic link at /usr/local/cuda? ((y)es/(n)o/(q)uit): y
　　Install the CUDA 7.5 Samples? ((y)es/(n)o/(q)uit): y
　　Enter CUDA Samples Location [ default is /home/kinghorn ]: /usr/local/cuda-7.5
　　Installing the CUDA Toolkit in /usr/local/cuda-7.5 ...
　　Finished copying samples.
　　===========
　　= Summary =
　　===========
　　Driver: Not Selected
　　Toolkit: Installed in /usr/local/cuda-7.5
　　Samples: Installed in /usr/local/cuda-7.5
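　　② Set environment variables
　　vi /home/xxx/.bashrc
　　Add the following line:
　　export PATH=/usr/local/cuda/bin:$PATH
　　Run the following command to apply the change:
　　source /home/xxx/.bashrc
　　Register the CUDA shared libraries with the dynamic linker:
　　sudo vi /etc/ld.so.conf.d/cuda.conf
　　Add:
　　/usr/local/cuda/lib64
　　Run ldconfig so that the newly added libraries take effect: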
　　sudo ldconfig
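Editing these files by hand works, but rerunning the setup can leave duplicate lines behind. Below is a minimal sketch of an idempotent alternative; `append_once` is a hypothetical helper name, and the example call assumes the same PATH line shown above.

```shell
# append_once FILE LINE: append LINE to FILE unless an identical
# line is already present.
append_once() {
    file=$1
    line=$2
    # -x: match the whole line, -F: fixed string, -q: no output.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (single quotes keep $PATH unexpanded in the file):
#   append_once "$HOME/.bashrc" 'export PATH=/usr/local/cuda/bin:$PATH'
```

Because the helper checks for an exact line match first, running it twice leaves a single copy of the line in the file.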
　　③ Force the use of gcc 5
　　Edit /usr/local/cuda/include/host_config.h and comment out line 115:
　　#error -- unsupported GNU version! gcc versions later than 4.9 are not supported!
　　Change it to:
　　//#error -- unsupported GNU version! gcc versions later than 4.9 are not supported!
　　(3) Build and test the CUDA samples
　　Change to the /usr/local/cuda/NVIDIA_CUDA-7.5_Samples/1_Utilities/deviceQuery directory and run:
　　sudo make
　　./deviceQuery
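　　4. Install the cuDNN library
　　(1) Extract the archive
　　tar xzvf cudnn-xxx-ga.tgz
　　This produces a cuda folder containing two subfolders, lib64 and include.
　　(2) Copy the files into the CUDA installation directory
　　sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
　　sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
　　Note: after copying, delete the copied links and recreate them. Otherwise the copy produces several identical files under different names instead of one library plus its links. The link relationships can be seen in the extracted cuDNN folder. Alternatively, copy each file individually and use cp -d for the link files.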
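The note about links can be handled in a single step by telling cp not to follow them. This is a minimal sketch that assumes GNU cp, as shipped with Ubuntu; `copy_keep_links` is a hypothetical helper name.

```shell
# copy_keep_links SRC_DIR DEST_DIR: copy every libcudnn* entry from
# SRC_DIR to DEST_DIR, copying symbolic links as links instead of
# duplicating the file they point to.
copy_keep_links() {
    src=$1
    dest=$2
    # -d (GNU cp): same as --no-dereference --preserve=links.
    cp -d "$src"/libcudnn* "$dest"/
}

# Example (run with root privileges):
#   copy_keep_links cuda/lib64 /usr/local/cuda/lib64
```

With this approach the destination keeps one real library file and the links that point to it, so nothing has to be deleted and relinked afterwards.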
　　5. Install OpenCV 3.1.0
　　(1) Install the basic required libraries
　　sudo apt-get install build-essential cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev
　　(2) Configure OpenCV and generate the Makefile
　　cd opencv-3.1.0
　　mkdir build
　　cd build
　　cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local ..
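　　During configuration, the process may stall and fail at the following step:
　　-- ICV: Downloading ippicv_linux_20151201.tgz...
　　The direct download of this file often times out. In that case, download the file manually, copy it into the opencv-3.1.0/3rdparty/ippicv/downloads/linux-8b449a536a2157bcad08a2b9f266828b directory, and rerun the configuration command.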
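Before copying a manually downloaded archive, it can be worth checking that it is intact. The sketch below assumes that the hexadecimal suffix of the downloads directory name is the archive's MD5 checksum; this is inferred from the name and is not stated in the text. `md5_matches` is a hypothetical helper name.

```shell
# md5_matches FILE EXPECTED: succeed if the MD5 checksum of FILE
# equals EXPECTED (a 32-character hex string). Uses md5sum from
# GNU coreutils.
md5_matches() {
    actual=$(md5sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}

# Example, run from the directory that contains opencv-3.1.0:
#   dir=opencv-3.1.0/3rdparty/ippicv/downloads/linux-8b449a536a2157bcad08a2b9f266828b
#   # ${dir##*-} strips everything up to the last hyphen, leaving the hash.
#   md5_matches ippicv_linux_20151201.tgz "${dir##*-}" &&
#       mkdir -p "$dir" && cp ippicv_linux_20151201.tgz "$dir"/
```

If the check fails, the download is likely truncated or a different release of the archive, and copying it would probably not help the configuration step.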
　　(3) Build OpenCV
　　make -j8
　　Another error may appear at this point:
　　/usr/include/string.h: In function ‘void* __mempcpy_inline(void*, const void*, size_t)’:
　　/usr/include/string.h:652:42: error: ‘memcpy’ was not declared in this scope
　　return (char *) memcpy (__dest, __src, __n) + __n;
　　This happens because the g++ version shipped with Ubuntu is too new. To work around it, add the following line at the top of the CMakeLists.txt file in the opencv-3.1.0 directory:
　　set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D_FORCE_INLINES")
　　Then run the build again.
　　(4) Install
　　sudo make install
　　(5) Check the version number
　　pkg-config --modversion opencv
　　6. Install and configure Caffe
　　(1) Install the required dependencies
　　sudo apt-get install build-essential
　　sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
　　sudo apt-get install --no-install-recommends libboost-all-dev
　　sudo apt-get install libatlas-base-dev
　　sudo apt-get install python-dev
　　sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev
　　If all of these libraries install cleanly, you will run into far fewer problems later.
　　(2) Download caffe-master and unpack the source package
　　Unpack:
　　unzip caffe-master.zip
　　(3) Edit the configuration file Makefile.config
　　cd caffe-master
　　cp Makefile.config.example Makefile.config
　　vi Makefile.config
　　Remove the leading # from the line # USE_CUDNN := 1 to enable cuDNN. If you are not using a GPU, remove the # from # CPU_ONLY := 1 instead. Here I use cuDNN for acceleration.
　　(4) Build Caffe
　　Method 1: build with cmake
　　mkdir build
　　cd build
　　cmake ..
　　make all -j8
　　This method usually works without problems.
　　Method 2: build directly with make
　　make -j8
　　Error 1:
　　src/caffe/net.cpp:8:18: fatal error: hdf5.h: No such file or directory
　　Fix: create symbolic links for the HDF5 libraries:
　　cd /usr/lib/x86_64-linux-gnu
　　sudo ln -s libhdf5_serial.so.10.1.0 libhdf5_serial.so
　　sudo ln -s libhdf5_serial_hl.so.10.0.2 libhdf5_serial_hl.so
　　Then edit Makefile.config:
　　INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
　　LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial
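　　Error 2:
　　error -- unsupported GNU version! gcc versions later than 5.3 are not supported!
　　Caffe currently fails to build with gcc versions later than 5.3. In theory this can be solved by downgrading gcc and g++, but the downgrade causes other compatibility problems and so does not really fix the issue. The downgrade procedure is given here for reference; the actual fix follows later.
　　① Install older versions of gcc and g++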
　　sudo apt-get install gcc-4.7 gcc-4.7-multilib
　　sudo apt-get install g++-4.7 g++-4.7-multilib
　　② Set the priorities
　　sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.7 40
　　sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50
　　sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.7 40
　　sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50
　　③ Select the version
　　sudo update-alternatives --config gcc
　　There are 2 choices for the alternative gcc (providing /usr/bin/gcc).
　　  Selection    Path               Priority   Status
　　------------------------------------------------------------
　　  0            /usr/bin/gcc-5      50        auto mode
　　* 1            /usr/bin/gcc-4.7    40        manual mode
　　  2            /usr/bin/gcc-5      50        manual mode
　　sudo update-alternatives --config g++
　　There are 2 choices for the alternative g++ (providing /usr/bin/g++).
　　  Selection    Path               Priority   Status
　　------------------------------------------------------------
　　  0            /usr/bin/g++-5      50        auto mode
　　* 1            /usr/bin/g++-4.7    40        manual mode
　　  2            /usr/bin/g++-5      50        manual mode
　　Error 3:
　　/usr/include/string.h: In function ‘void* __mempcpy_inline(void*, const void*, size_t)’:
　　/usr/include/string.h:652:42: error: ‘memcpy’ was not declared in this scope
　　return (char *) memcpy (__dest, __src, __n) + __n;
　　In Caffe's Makefile, change:
　　NVCCFLAGS += -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
　　to:
　　NVCCFLAGS += -D_FORCE_INLINES -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
　　Error 4:
　　/usr/bin/ld: cannot find -lippicv
　　Copy the ippicv library from the OpenCV source tree into /usr/local/lib:
　　sudo cp opencv-3.1.0/3rdparty/ippicv/unpack/ippicv_lnx/lib/intel64/libippicv.a /usr/local/lib
　　Then build again.
　　This completes the gcc and g++ downgrade.
　　The real fix for Error 2 is as follows:
　　sudo vi /usr/local/cuda/include/host_config.h
　　#if __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 3)
　　#error -- unsupported GNU version! gcc versions later than 5.3 are not supported!
　　Change this to:
　　#if __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 4)
　　#error -- unsupported GNU version! gcc versions later than 5.4 are not supported!
　　My gcc version is 5.4.0; adjust the numbers to match your own version.
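To decide which numbers to put into the condition, it helps to evaluate the same comparison against the installed compiler. The following is a minimal sketch that mirrors the preprocessor test in shell; `gcc_too_new` is a hypothetical helper name, and the example assumes `gcc -dumpversion` prints a MAJOR.MINOR.PATCH string, as gcc 5 does.

```shell
# gcc_too_new MAJOR MINOR MAX_MAJOR MAX_MINOR: succeed if MAJOR.MINOR
# is later than MAX_MAJOR.MAX_MINOR, mirroring the preprocessor test
#   __GNUC__ > MAX_MAJOR ||
#   (__GNUC__ == MAX_MAJOR && __GNUC_MINOR__ > MAX_MINOR)
gcc_too_new() {
    [ "$1" -gt "$3" ] || { [ "$1" -eq "$3" ] && [ "$2" -gt "$4" ]; }
}

# Example: split a version such as "5.4.0" and test it against 5.3.
#   ver=$(gcc -dumpversion)
#   major=${ver%%.*}; rest=${ver#*.}; minor=${rest%%.*}
#   gcc_too_new "$major" "$minor" 5 3 &&
#       echo "host_config.h will reject this gcc"
```

With gcc 5.4.0 the check against 5.3 succeeds, which is why the build stops, while the check against 5.4 fails, which is why raising the minor limit to 4 lets the build proceed.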
　　(5) Build Caffe's Python interface
　　make pycaffe
　　Error:
　　python/caffe/_caffe.cpp:10:31: fatal error: numpy/arrayobject.h: No such file or directory
　　The cause is a wrong numpy include path. In Makefile.config, change:
　　PYTHON_INCLUDE := /usr/include/python2.7 \
　　        /usr/lib/python2.7/dist-packages/numpy/core/include
　　to:
　　PYTHON_INCLUDE := /usr/include/python2.7 \
　　        /usr/local/lib/python2.7/dist-packages/numpy/core/include
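Which of the two paths is right depends on how numpy was installed, so it is safer to check than to guess. This is a minimal sketch that looks for the missing header in a list of candidate directories; `find_numpy_include` is a hypothetical helper name.

```shell
# find_numpy_include DIR...: print the first DIR that contains
# numpy/arrayobject.h, or fail if none of them does.
find_numpy_include() {
    for dir in "$@"; do
        if [ -f "$dir/numpy/arrayobject.h" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example with the two locations mentioned above:
#   find_numpy_include \
#       /usr/lib/python2.7/dist-packages/numpy/core/include \
#       /usr/local/lib/python2.7/dist-packages/numpy/core/include
```

numpy can also report the directory itself through `python -c "import numpy; print(numpy.get_include())"`, which avoids hard-coding candidate paths.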
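　　(6) Test Caffe
　　make runtest
　　This takes a while.
　　7. Run the handwritten-digit example
　　Caffe ships with a handwritten-digit recognition example. Scripts are provided for every step, so a few simple commands are enough to get your first deep learning program running.
　　(1) Download the data (images together with their labels)
　　sh data/mnist/get_mnist.sh
　　(2) Convert the labeled data to the LMDB format that Caffe uses
　　sh examples/mnist/create_mnist.sh
　　(3) Edit the solver file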
　　vi examples/mnist/lenet_solver.prototxt
　　# The train/test net protocol buffer definition
　　net: "examples/mnist/lenet_train_test.prototxt"
　　# test_iter specifies how many forward passes the test should carry out.
　　# In the case of MNIST, we have test batch size 100 and 100 test iterations,
　　# covering the full 10,000 testing images.
　　test_iter: 100
　　# Carry out testing every 500 training iterations.
　　test_interval: 500
　　# The base learning rate, momentum and the weight decay of the network.
　　base_lr: 0.01
　　momentum: 0.9
　　weight_decay: 0.0005
　　# The learning rate policy
　　lr_policy: "inv"
　　gamma: 0.0001
　　power: 0.75
　　# Display every 100 iterations
　　display: 100
　　# The maximum number of iterations
　　max_iter: 10000
　　# snapshot intermediate results
　　snapshot: 5000
　　snapshot_prefix: "examples/mnist/lenet"
　　# solver mode: CPU or GPU
　　solver_mode: GPU
　　The last line selects whether training runs on the CPU or the GPU. If you are not using a GPU, change solver_mode: GPU to solver_mode: CPU. Here I use the GPU.
　　(4) Run the training script
　　sh examples/mnist/train_lenet.sh
　　Training finishes after roughly 10 minutes:
　　I0716 14:46:01.360709 27985 solver.cpp:404] Test net output #0: accuracy = 0.9908
　　I0716 14:46:01.360750 27985 solver.cpp:404] Test net output #1: loss = 0.0303895 (* 1 = 0.0303895 loss)
　　I0716 14:46:01.360755 27985 solver.cpp:322] Optimization Done.
　　I0716 14:46:01.360757 27985 caffe.cpp:222] Optimization Done.
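When training is run repeatedly, the final accuracy can be pulled out of a saved log instead of being read by eye. This is a minimal sketch; `final_accuracy` is a hypothetical helper name, `train.log` is a hypothetical file name, and the example assumes the log is written to stderr so that it has to be redirected to be captured.

```shell
# final_accuracy LOGFILE: print the value from the last
# "accuracy = X" entry found in LOGFILE.
final_accuracy() {
    sed -n 's/.*accuracy = \([0-9.]*\).*/\1/p' "$1" | tail -n 1
}

# Example:
#   sh examples/mnist/train_lenet.sh 2>&1 | tee train.log
#   final_accuracy train.log
```

For the log lines shown above, the helper prints 0.9908.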
