Setting Up a Deep Learning Environment with Ubuntu 16.04 + CUDA 7.5 + Caffe
Date: 2024-08-30 11:31:08  Editor: 华纳云

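  This article walks through setting up a CUDA 7.5 + Caffe deep learning environment on Ubuntu 16.04, step by step.
  1. Install Ubuntu 16.04
  The installation itself is not covered here; search online if you need a guide. Once the system is installed, apply the available updates and install a few basic tools.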
  sudo apt update
  sudo apt-get upgrade
  sudo apt-get install vim
  sudo apt-get install cmake
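  2. Install the graphics driver
  Open All Settings -> Software & Updates, switch to the NVIDIA 361.42 driver, and reboot. Then run nvidia-smi to check that the driver loaded correctly.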
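  The driver check can also be scripted. The following is a minimal sketch, assuming a POSIX shell; require_tool is a hypothetical helper name used only for illustration and is not part of the NVIDIA tooling.

```shell
#!/bin/sh
# require_tool NAME: succeed if NAME is on PATH, otherwise print a hint and fail.
# (Hypothetical helper, for illustration only.)
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found; is the driver installed and has the machine been rebooted?" >&2
    return 1
}

# Only query the GPU when the tool exists, so the script is safe to run anywhere.
if require_tool nvidia-smi; then
    nvidia-smi || echo "nvidia-smi is installed but could not talk to the driver" >&2
fi
```

  If nvidia-smi prints a table listing the GPU and the driver version, the driver is working.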
  3. Install CUDA
  (1) Install the required dependencies
  ca-certificates-java
  default-jre
  default-jre-headless
  fonts-dejavu-extra
  freeglut3
  freeglut3-dev
  java-common
  libatk-wrapper-java
  libatk-wrapper-java-jni
  libdrm-dev
  libgl1-mesa-dev
  libglu1-mesa-dev
  libgnomevfs2-0
  libgnomevfs2-common
  libice-dev
  libpthread-stubs0-dev
  libsctp1
  libsm-dev
  libx11-dev
  libx11-doc
  libx11-xcb-dev
  libxau-dev
  libxcb-dri2-0-dev
  libxcb-dri3-dev
  libxcb-glx0-dev
  libxcb-present-dev
  libxcb-randr0-dev
  libxcb-render0-dev
  libxcb-shape0-dev
  libxcb-sync-dev
  libxcb-xfixes0-dev
  libxcb1-dev
  libxdamage-dev
  libxdmcp-dev
  libxext-dev
  libxfixes-dev
  libxi-dev
  libxmu-dev
  libxmu-headers
  libxshmfence-dev
  libxt-dev
  libxxf86vm-dev
  lksctp-tools
  mesa-common-dev
  openjdk-7-jre
  openjdk-7-jre-headless
  tzdata-java
  x11proto-core-dev
  x11proto-damage-dev
  x11proto-dri2-dev
  x11proto-fixes-dev
  x11proto-gl-dev
  x11proto-input-dev
  x11proto-kb-dev
  x11proto-xext-dev
  x11proto-xf86vidmode-dev
  xorg-sgml-doctools
  xtrans-dev
  libgles2-mesa-dev
  nvidia-modprobe
  build-essential
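  The packages above can be installed with a single command. The following is a sketch, assuming the names have been saved one per line in a file called cuda-deps.txt (a hypothetical file name); print_install_cmd is likewise a hypothetical helper.

```shell
#!/bin/sh
# print_install_cmd FILE: print one apt-get command covering every package
# named in FILE (one per line; blank lines and # comments are skipped).
print_install_cmd() {
    # xargs with no command joins the remaining names with single spaces.
    pkgs=$(grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | xargs)
    echo "sudo apt-get install -y $pkgs"
}

# Example (review the printed command before running it):
#   print_install_cmd cuda-deps.txt
```

  Printing the command instead of running it makes it easy to spot a package name that is not available in your configured repositories; apt-get refuses to install anything if one of the names cannot be found.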
  (2) Install the CUDA toolkit
  ① Run cuda_7.5.18_linux.run
  sudo ./cuda_7.5.18_linux.run --override
  The installer prompts and the answers to give are as follows:
  Do you accept the previously read EULA? (accept/decline/quit): accept
  You are attempting to install on an unsupported configuration. Do you wish to continue? ((y)es/(n)o) [ default is no ]: y
  Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 352.39? ((y)es/(n)o/(q)uit): n
  Install the CUDA 7.5 Toolkit? ((y)es/(n)o/(q)uit): y
  Enter Toolkit Location [ default is /usr/local/cuda-7.5 ]:
  Do you want to install a symbolic link at /usr/local/cuda? ((y)es/(n)o/(q)uit): y
  Install the CUDA 7.5 Samples? ((y)es/(n)o/(q)uit): y
  Enter CUDA Samples Location [ default is /home/kinghorn ]: /usr/local/cuda-7.5
  Installing the CUDA Toolkit in /usr/local/cuda-7.5 ...
  Finished copying samples.
  ===========
  = Summary =
  ===========
  Driver: Not Selected
  Toolkit: Installed in /usr/local/cuda-7.5
  Samples: Installed in /usr/local/cuda-7.5
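  ② Set the environment variables
  vi /home/xxx/.bashrc
  Add the following line:
  export PATH=/usr/local/cuda/bin:$PATH
  Run the following command so the change takes effect:
  source /home/xxx/.bashrc
  Register the CUDA shared libraries with the dynamic linker:
  sudo vi /etc/ld.so.conf.d/cuda.conf
  Add:
  /usr/local/cuda/lib64
  Run ldconfig so the newly added libraries are picked up:
  sudo ldconfig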
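  The same edits can be made without opening an editor. The following is a minimal sketch, assuming a POSIX shell; append_once is a hypothetical helper that adds a line only if the file does not already contain it, so running it twice does not duplicate the entry.

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE unless an identical line exists.
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -Fxq -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example for the PATH entry (single quotes keep $PATH unexpanded in the file):
#   append_once "$HOME/.bashrc" 'export PATH=/usr/local/cuda/bin:$PATH'
```

  The file under /etc/ld.so.conf.d is owned by root, so for that one the helper would have to run in a root shell, followed by sudo ldconfig as above.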
  ③ Force the use of gcc 5
  Edit /usr/local/cuda/include/host_config.h and comment out line 115:
  #error -- unsupported GNU version! gcc versions later than 4.9 are not supported!
  Change it to:
  //#error -- unsupported GNU version! gcc versions later than 4.9 are not supported!
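  This edit can be sketched with sed. The version below matches the line by its text instead of by line number, on the assumption that the line number may differ between CUDA releases; disable_gcc_check is a hypothetical helper name.

```shell
#!/bin/sh
# disable_gcc_check FILE: comment out the "unsupported GNU version" #error
# line in FILE, keeping the original as FILE.bak.
disable_gcc_check() {
    cp "$1" "$1.bak"
    # & in the replacement stands for the matched text, so this prefixes "//".
    sed 's|^#error -- unsupported GNU version|//&|' "$1.bak" > "$1"
}

# Example (needs root, because the header lives under /usr/local/cuda):
#   disable_gcc_check /usr/local/cuda/include/host_config.h
```

  The backup makes it easy to restore the original header if the toolkit misbehaves with the newer compiler.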
  (3) Build and test the CUDA samples
  Change to /usr/local/cuda/NVIDIA_CUDA-7.5_Samples/1_Utilities/deviceQuery and run:
  sudo make
  ./deviceQuery
  4. Install the cuDNN library
  (1) Extract the archive
  tar xzvf cudnn-xxx-ga.tgz
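  This produces a cuda folder containing two subfolders, lib64 and include.
  (2) Copy the files into the CUDA installation directory
  sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
  sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
  Note: a plain cp follows symbolic links, so the copy ends up as several identical files under different names. After copying, delete those copies and recreate the links; the link layout can be seen in the extracted cuDNN folder. Alternatively, copy the files one by one and use cp -d for the link files.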
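  The link-preserving copy mentioned in the note can be sketched as follows. The example uses cp -P, the POSIX option that copies symbolic links as links; GNU cp -d does the same and additionally preserves hard links. copy_libs is a hypothetical helper name.

```shell
#!/bin/sh
# copy_libs SRC_DIR DST_DIR: copy libcudnn* from SRC_DIR to DST_DIR.
# -P copies symbolic links as links instead of following them, so the
# versioned library is stored once and the links keep pointing at it.
copy_libs() {
    mkdir -p "$2"
    cp -P "$1"/libcudnn* "$2"/
}

# Example (needs root for the real destination):
#   copy_libs cuda/lib64 /usr/local/cuda/lib64
```

  Copied this way, the links do not need to be deleted and recreated afterwards.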
  5. Install OpenCV 3.1.0
  (1) Install the basic required libraries
  sudo apt-get install build-essential cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev
  (2) Configure OpenCV and generate the Makefile
  cd opencv-3.1.0
  mkdir build
  cd build
  cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local ..
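  During configuration, the process may stall at the following step:
  -- ICV: Downloading ippicv_linux_20151201.tgz...
  The direct download of this file can fail with a timeout. In that case, download it manually, copy it into opencv-3.1.0/3rdparty/ippicv/downloads/linux-8b449a536a2157bcad08a2b9f266828b, and rerun the configuration command.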
  (3) Build OpenCV
  make -j8
  Another error may appear at this point:
  /usr/include/string.h: In function ‘void* __mempcpy_inline(void*, const void*, size_t)’:
  /usr/include/string.h:652:42: error: ‘memcpy’ was not declared in this scope
  return (char *) memcpy (__dest, __src, __n) + __n;
  This is caused by the g++ version on Ubuntu being too new. To fix it, add the following line at the top of CMakeLists.txt in the opencv-3.1.0 directory:
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D_FORCE_INLINES")
  Then run the build again.
  (4) Install
  sudo make install
  (5) Check the version number (pkg-config can only find OpenCV after it has been installed)
  pkg-config --modversion opencv
  6. Install and configure Caffe
  (1) Install the required dependencies
  sudo apt-get install build-essential
  sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
  sudo apt-get install --no-install-recommends libboost-all-dev
  sudo apt-get install libatlas-base-dev
  sudo apt-get install python-dev
  sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev
  If all of these libraries install cleanly, you will run into far fewer problems later.
  (2) Download caffe-master and extract the source package
  Extract:
  unzip caffe-master.zip
  (3) Edit the configuration file Makefile.config
  cd caffe-master
  cp Makefile.config.example Makefile.config
  vi Makefile.config
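  Remove the leading # from the line # USE_CUDNN := 1 to enable cuDNN. If you are not using a GPU, remove the leading # from # CPU_ONLY := 1 instead. Here I use cuDNN for acceleration.
  (4) Build Caffe
  Method 1: build with cmake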
  mkdir build
  cd build
  cmake ..
  make all -j8
  This method usually works without problems.
  Method 2: build directly with the Makefile (gcc)
  make -j8
  Error 1:
  src/caffe/net.cpp:8:18: fatal error: hdf5.h: No such file or directory
  cd /usr/lib/x86_64-linux-gnu
  sudo ln -s libhdf5_serial.so.10.1.0 libhdf5_serial.so
  sudo ln -s libhdf5_serial_hl.so.10.0.2 libhdf5_serial_hl.so
  Then edit Makefile.config:
  INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
  LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial
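  The two Makefile.config changes can be sketched with sed. This assumes the file still contains the stock INCLUDE_DIRS and LIBRARY_DIRS lines from Makefile.config.example; add_hdf5_dirs is a hypothetical helper name.

```shell
#!/bin/sh
# add_hdf5_dirs FILE: append the serial HDF5 paths to the INCLUDE_DIRS and
# LIBRARY_DIRS lines of FILE, once only. The original is kept as FILE.bak.
add_hdf5_dirs() {
    if grep -q 'hdf5/serial' "$1"; then
        return 0    # already patched
    fi
    cp "$1" "$1.bak"
    # "s|$|...|" appends text at the end of each matching line.
    sed -e '/^INCLUDE_DIRS :=/ s|$| /usr/include/hdf5/serial|' \
        -e '/^LIBRARY_DIRS :=/ s|$| /usr/lib/x86_64-linux-gnu/hdf5/serial|' \
        "$1.bak" > "$1"
}

# Example, run from the caffe-master directory:
#   add_hdf5_dirs Makefile.config
```

  Before relying on the paths, it is worth confirming that /usr/include/hdf5/serial/hdf5.h actually exists on your system.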
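  Error 2:
  error -- unsupported GNU version! gcc versions later than 5.3 are not supported!
  The build currently rejects gcc versions newer than 5.3. In theory this can be solved by downgrading gcc and g++, but the downgrade introduces other compatibility problems, so it does not really solve anything. The downgrade procedure is given below for reference; the actual fix is described after it.
  ① Install older versions of gcc and g++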
  sudo apt-get install gcc-4.7 gcc-4.7-multilib
  sudo apt-get install g++-4.7 g++-4.7-multilib
  ② Set the priorities
  sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.7 40
  sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50
  sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.7 40
  sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50
  ③ Select the version
  sudo update-alternatives --config gcc
  There are 2 choices for the alternative gcc (providing /usr/bin/gcc).
    Selection    Path               Priority   Status
  ------------------------------------------------------------
    0            /usr/bin/gcc-5     50        auto mode
  * 1            /usr/bin/gcc-4.7   40        manual mode
    2            /usr/bin/gcc-5     50        manual mode
  sudo update-alternatives --config g++
  There are 2 choices for the alternative g++ (providing /usr/bin/g++).
    Selection    Path               Priority   Status
  ------------------------------------------------------------
    0            /usr/bin/g++-5     50        auto mode
  * 1            /usr/bin/g++-4.7   40        manual mode
    2            /usr/bin/g++-5     50        manual mode
  Error 3:
  /usr/include/string.h: In function ‘void* __mempcpy_inline(void*, const void*, size_t)’:
  /usr/include/string.h:652:42: error: ‘memcpy’ was not declared in this scope
  return (char *) memcpy (__dest, __src, __n) + __n;
  In Caffe's Makefile, change:
  NVCCFLAGS += -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
  to:
  NVCCFLAGS += -D_FORCE_INLINES -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
  Error 4:
  /usr/bin/ld: cannot find -lippicv
  sudo cp opencv-3.1.0/3rdparty/ippicv/unpack/ippicv_lnx/lib/intel64/libippicv.a /usr/local/lib
  Then build again.
  At this point the gcc/g++ downgrade procedure is complete.
  The actual fix for Error 2 is as follows:
  sudo vi /usr/local/cuda/include/host_config.h
  #if __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 3)
  #error -- unsupported GNU version! gcc versions later than 5.3 are not supported!
  Change it to:
  #if __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 4)
  #error -- unsupported GNU version! gcc versions later than 5.4 are not supported!
  My gcc version is 5.4.0; adjust the numbers to match your own version.
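  To see why raising the limit from 5.3 to 5.4 is enough for gcc 5.4.0, the preprocessor test can be mirrored in shell. This is an illustrative sketch only; gcc_allowed is a hypothetical helper and the version numbers are passed in by hand.

```shell
#!/bin/sh
# gcc_allowed MAJOR MINOR MAX_MAJOR MAX_MINOR
# Mirrors the header's test
#   __GNUC__ > MAX_MAJOR || (__GNUC__ == MAX_MAJOR && __GNUC_MINOR__ > MAX_MINOR)
# and succeeds when that test would NOT trigger the #error.
gcc_allowed() {
    if [ "$1" -gt "$3" ]; then
        return 1
    fi
    if [ "$1" -eq "$3" ] && [ "$2" -gt "$4" ]; then
        return 1
    fi
    return 0
}

# gcc 5.4 against the stock limit 5.3: rejected.
gcc_allowed 5 4 5 3 || echo "gcc 5.4 is rejected by a 5.3 limit"
# gcc 5.4 against the edited limit 5.4: accepted.
gcc_allowed 5 4 5 4 && echo "gcc 5.4 is accepted by a 5.4 limit"
```

  Only the major and minor numbers take part in the comparison, so the patch level (the final 0 in 5.4.0) does not matter.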
  (5) Build Caffe's Python interface
  make pycaffe
  Error:
  python/caffe/_caffe.cpp:10:31: fatal error: numpy/arrayobject.h: No such file or directory
  The cause is a wrong numpy path in Makefile.config. Change:
  PYTHON_INCLUDE := /usr/include/python2.7 \
          /usr/lib/python2.7/dist-packages/numpy/core/include
  to:
  PYTHON_INCLUDE := /usr/include/python2.7 \
          /usr/local/lib/python2.7/dist-packages/numpy/core/include
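  Which of the two numpy paths is correct depends on how numpy was installed, so it helps to check before editing. The following is a minimal sketch; find_header_dir is a hypothetical helper that prints the first candidate directory containing the header.

```shell
#!/bin/sh
# find_header_dir HEADER DIR...: print the first DIR that contains HEADER,
# or fail if none of them does.
find_header_dir() {
    header=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example with the two candidate paths from this step:
#   find_header_dir numpy/arrayobject.h \
#       /usr/lib/python2.7/dist-packages/numpy/core/include \
#       /usr/local/lib/python2.7/dist-packages/numpy/core/include
```

  If numpy imports correctly, python -c 'import numpy; print(numpy.get_include())' reports the include directory directly.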
  (6) Test Caffe
  make runtest
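  This takes quite a while.
  7. Run the handwritten digit example
  Caffe ships with a handwritten digit recognition example (MNIST). Every step already has a script, so a few simple commands are enough to get your first deep learning program running.
  (1) Download the data (labels included)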
  sh data/mnist/get_mnist.sh
  (2) Convert the labeled data into the LMDB format used by Caffe
  sh examples/mnist/create_mnist.sh
  (3) Edit the solver file
  vi examples/mnist/lenet_solver.prototxt
  # The train/test net protocol buffer definition
  net: "examples/mnist/lenet_train_test.prototxt"
  # test_iter specifies how many forward passes the test should carry out.
  # In the case of MNIST, we have test batch size 100 and 100 test iterations,
  # covering the full 10,000 testing images.
  test_iter: 100
  # Carry out testing every 500 training iterations.
  test_interval: 500
  # The base learning rate, momentum and the weight decay of the network.
  base_lr: 0.01
  momentum: 0.9
  weight_decay: 0.0005
  # The learning rate policy
  lr_policy: "inv"
  gamma: 0.0001
  power: 0.75
  # Display every 100 iterations
  display: 100
  # The maximum number of iterations
  max_iter: 10000
  # snapshot intermediate results
  snapshot: 5000
  snapshot_prefix: "examples/mnist/lenet"
  # solver mode: CPU or GPU
  solver_mode: GPU
  The last line selects whether training runs on the CPU or the GPU. If you are not using a GPU, change solver_mode: GPU to solver_mode: CPU. Here I use the GPU.
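  Switching the mode can be sketched with sed. set_solver_mode is a hypothetical helper; it only touches the line that starts with solver_mode: and leaves the comment above it alone.

```shell
#!/bin/sh
# set_solver_mode FILE MODE: set the solver_mode line of FILE to CPU or GPU.
# The original file is kept as FILE.bak.
set_solver_mode() {
    case $2 in
        CPU|GPU) ;;
        *) echo "mode must be CPU or GPU" >&2; return 1 ;;
    esac
    cp "$1" "$1.bak"
    sed "s/^solver_mode:.*/solver_mode: $2/" "$1.bak" > "$1"
}

# Example, run from the caffe-master directory:
#   set_solver_mode examples/mnist/lenet_solver.prototxt CPU
```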
  (4) Run the training script
  sh examples/mnist/train_lenet.sh
  Training completes in roughly 10 minutes:
  I0716 14:46:01.360709 27985 solver.cpp:404]     Test net output #0: accuracy = 0.9908
  I0716 14:46:01.360750 27985 solver.cpp:404]     Test net output #1: loss = 0.0303895 (* 1 = 0.0303895 loss)
  I0716 14:46:01.360755 27985 solver.cpp:322] Optimization Done.
  I0716 14:46:01.360757 27985 caffe.cpp:222] Optimization Done.
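  The final accuracy can be pulled out of the log automatically. The following is a sketch that assumes the training output was saved to a file, here called train.log (a hypothetical name); final_accuracy is likewise a hypothetical helper.

```shell
#!/bin/sh
# final_accuracy LOGFILE: print the last "accuracy = ..." value in LOGFILE.
final_accuracy() {
    sed -n 's/.*accuracy = \([0-9.]*\).*/\1/p' "$1" | tail -n 1
}

# Example, after capturing the output of the training run:
#   sh examples/mnist/train_lenet.sh 2>&1 | tee train.log
#   final_accuracy train.log
```

  For the log shown above, this prints 0.9908.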
