Preface

Lately I've been watching a lot of pianists share their pieces on YouTube, and sometimes I'd really like to tweak a piece a little. But that means transcribing it by ear into FL Studio, and even with a MIDI keyboard to help, manual transcription is exhausting work. So I started wondering whether AI could do it for me, and that's when Google's MT3 model appeared in front of me.

Environment

In keeping with the principle of being lazy all the way, I ruled out a local virtual machine without hesitation: just getting an AMD deep-learning GPU driver to work inside a VM would be more than enough suffering. So I threw myself into Alibaba Cloud's arms instead.

Buying a GPU Instance

Here is my configuration (a CLI equivalent is sketched below the screenshot):

  • Region: Japan
  • Architecture: Heterogeneous computing
  • Instance family: GPU compute type gn6i
  • Hardware: 16 vCPUs (Intel Xeon Platinum 8163, Skylake) / 62 GiB RAM / 1 × NVIDIA T4
  • Billing: preemptible (spot) instance at a 10% discount rate, which works out to about ¥1.7 per hour, with the per-instance price cap auto-set to ¥2.5; the displayed historical release rate is 3%
  • OS: Ubuntu 22.04 LTS Server

vps.png
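
If you'd rather script the purchase than click through the console, the same box can be requested with the aliyun CLI. A rough sketch, where the instance type and image ID pattern are my assumptions matching the spec above:

$ aliyun ecs RunInstances \
    --RegionId ap-northeast-1 \
    --InstanceType ecs.gn6i-c16g1.4xlarge \
    --SpotStrategy SpotWithPriceLimit \
    --SpotPriceLimit 2.5 \
    --ImageId 'ubuntu_22_04_x64_20G_alibase_*.vhd'   # image ID pattern is an assumption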

Basic Environment Setup

Note that I deliberately chose the Japan region, which sidesteps most of the strange problems caused by dependencies that are blocked inside mainland China. From here I simply use Alibaba Cloud's own install script to configure the GPU driver and the CUDA environment.

$ apt update
$ apt upgrade
$ apt install cmake git wget clang build-essential ninja-build zlib1g-dev libtbb-dev ffmpeg
$ ubuntu-drivers devices   # list this machine's GPU
== /sys/devices/pci0000:00/0000:00:07.0 ==
modalias : pci:v000010DEd00001EB8sv000010DEsd000012A2bc03sc02i00
vendor   : NVIDIA Corporation
model    : TU104GL [Tesla T4]
driver   : nvidia-driver-450-server - distro non-free
driver   : nvidia-driver-470 - distro non-free recommended
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-driver-470-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
$ bash install.sh
$ reboot

install.sh

#!/bin/sh

IS_INSTALL_RDMA="FALSE"
IS_INSTALL_AIACC_TRAIN="FALSE"
IS_INSTALL_AIACC_INFERENCE="FALSE"
DRIVER_VERSION="470.82.01"
CUDA_VERSION="11.4.1"
CUDNN_VERSION="8.2.4"
IS_INSTALL_RAPIDS="FALSE"

INSTALL_DIR="/root/auto_install"
auto_install_script="auto_install.sh"
script_download_url="https://mirrors.aliyun.com/opsx/ecs/linux/binary/script/"

mkdir $INSTALL_DIR && cd $INSTALL_DIR
wget -t 10 --timeout=10 ${script_download_url}${auto_install_script} && sh ${INSTALL_DIR}/${auto_install_script} $DRIVER_VERSION $CUDA_VERSION $CUDNN_VERSION $IS_INSTALL_AIACC_TRAIN $IS_INSTALL_AIACC_INFERENCE $IS_INSTALL_RDMA $IS_INSTALL_RAPIDS
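
After the reboot it's worth confirming that everything actually landed: nvidia-smi should list the Tesla T4, and nvcc should report CUDA 11.4 (you may need to add /usr/local/cuda/bin to PATH, assuming the script used the default install prefix):

$ nvidia-smi
$ nvcc --version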

Other Notes

Installing a pinned LLVM version: LLVM's source build is extremely picky about the Clang version, and after several failed attempts to compile it from source I fell back to installing an older release's package from an APT repository.

PS: MT3 depends heavily on Numba, and a mismatched LLVM version makes llvmlite fail to install: RuntimeError: Building llvmlite requires LLVM 10.0.x or 9.0.x, got '12.0.1'. Be sure to set LLVM_CONFIG to the right executable path

$ vim /etc/apt/sources.list # append the lines below
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-security main restricted universe multiverse
$ apt update
$ apt install llvm-8
$ ln -s $(which llvm-config-8) /usr/bin/llvm-config
$ llvm-config --version
8.0.0
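
As the error message itself suggests, an alternative to the symlink is pointing the llvmlite build at the right binary through the LLVM_CONFIG environment variable. A minimal sketch:

$ LLVM_CONFIG=$(which llvm-config-8) pip install llvmlite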

Next, set up the runtime environment:

$ pip install jupyterlab
$ jupyter-lab --no-browser --ip "*" --allow-root  --NotebookApp.port_retries=0 # root only because this is a throwaway test box
[I 2022-07-30 22:59:55.666 ServerApp] jupyterlab | extension was successfully linked.
[I 2022-07-30 22:59:55.676 ServerApp] nbclassic | extension was successfully linked.
[I 2022-07-30 22:59:55.847 ServerApp] notebook_shim | extension was successfully linked.
[W 2022-07-30 22:59:55.861 ServerApp] WARNING: The Jupyter server is listening on all IP addresses and not using encryption. This is not recommended.
[I 2022-07-30 22:59:55.863 ServerApp] notebook_shim | extension was successfully loaded.
[I 2022-07-30 22:59:55.864 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.10/dist-packages/jupyterlab
[I 2022-07-30 22:59:55.864 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
[I 2022-07-30 22:59:55.867 ServerApp] jupyterlab | extension was successfully loaded.
[I 2022-07-30 22:59:55.870 ServerApp] nbclassic | extension was successfully loaded.
[I 2022-07-30 22:59:55.871 ServerApp] Serving notebooks from local directory: /root
[I 2022-07-30 22:59:55.871 ServerApp] Jupyter Server 1.18.1 is running at:
[I 2022-07-30 22:59:55.871 ServerApp] http://localhost:8888/lab?token=a63ff75619e54268109584a93f4048cd8ed72cfcdb49d603
[I 2022-07-30 22:59:55.871 ServerApp]  or http://127.0.0.1:8888/lab?token=a63ff75619e54268109584a93f4048cd8ed72cfcdb49d603
[I 2022-07-30 22:59:55.871 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2022-07-30 22:59:55.874 ServerApp] 
    
    To access the server, open this file in a browser:
        file:///root/.local/share/jupyter/runtime/jpserver-7962-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/lab?token=a63ff75619e54268109584a93f4048cd8ed72cfcdb49d603
     or http://127.0.0.1:8888/lab?token=a63ff75619e54268109584a93f4048cd8ed72cfcdb49d603

Open the page with the token shown above, and the makeshift Jupyter setup is done.

vps.png
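
Note that JupyterLab started this way dies together with the SSH session, so for longer transcription jobs it's worth detaching it. A minimal sketch with nohup (tmux or a systemd unit would work just as well):

$ nohup jupyter-lab --no-browser --ip "*" --allow-root --NotebookApp.port_retries=0 > jupyter.log 2>&1 &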

Down to Business

Setting Up the Deep Learning Environment

Now it's time to write a little code. I've put the dependency installation and related steps together; just split them into cells and run them in the notebook. If you use CentOS or another distribution, adjust the package names for YUM yourself. Note that these cells are adapted from the official MT3 Colab notebook, so the /content/... paths and the google.colab import further down assume a Colab-style layout and may need small tweaks on our JupyterLab box.

!apt-get update -qq && apt-get install -qq fluidsynth build-essential libasound2-dev libjack-dev

!pip install nest-asyncio pyfluidsynth pytest-runner pybind11 
# pin CLU for python 3.7 compatibility
!pip install clu==0.0.7
# pin Orbax to use Checkpointer
!pip install orbax==0.0.2

# install t5x
!git clone --branch=main https://github.com/google-research/t5x
# pin T5X for python 3.7 compatibility
!cd t5x; git reset --hard 2e05ad41778c25521738418de805757bf2e41e9e; cd ..
!mv t5x t5x_tmp; mv t5x_tmp/* .; rm -r t5x_tmp
!sed -i 's:jax\[tpu\]:jax:' setup.py
!python3 -m pip install -e .

# install mt3
!git clone --branch=main https://github.com/magenta/mt3
!mv mt3 mt3_tmp; mv mt3_tmp/* .; rm -r mt3_tmp
!python3 -m pip install -e .

!wget http://download.magenta.tensorflow.org/soundfonts/SGM-v2.01-Sal-Guit-Bass-V1.3.sf2
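
One thing the cells above don't fetch is the pretrained weights. The official MT3 notebook copies them from a public GCS bucket; a sketch assuming gsutil is available on the box (it is not preinstalled on a bare ECS image):

!gsutil -q -m cp -r gs://mt3/checkpoints .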

Importing Dependencies

#@title Imports and Definitions

import functools
import os

import numpy as np
import tensorflow.compat.v2 as tf

import gin
import jax
import librosa
import note_seq
import seqio
import t5
import t5x

from mt3 import metrics_utils
from mt3 import models
from mt3 import network
from mt3 import note_sequences
from mt3 import preprocessors
from mt3 import spectrograms
from mt3 import vocabularies

from google.colab import files

import nest_asyncio
nest_asyncio.apply()

SAMPLE_RATE = 16000
SF2_PATH = 'SGM-v2.01-Sal-Guit-Bass-V1.3.sf2'

def upload_audio(sample_rate):
  data = list(files.upload().values())
  if len(data) > 1:
    print('Multiple files uploaded; using only one.')
  return note_seq.audio_io.wav_data_to_samples_librosa(
    data[0], sample_rate=sample_rate)
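
Since google.colab only exists inside Colab, files.upload() will fail on our JupyterLab box. A minimal local-file alternative (load_audio is my own name, assuming the audio file has already been copied to the server):

def load_audio(path, sample_rate=SAMPLE_RATE):
  # Read raw bytes and decode/resample them with the same note_seq
  # helper that upload_audio() uses.
  with open(path, 'rb') as f:
    return note_seq.audio_io.wav_data_to_samples_librosa(
        f.read(), sample_rate=sample_rate)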



class InferenceModel(object):
  """Wrapper of T5X model for music transcription."""

  def __init__(self, checkpoint_path, model_type='mt3'):

    # Model Constants.
    if model_type == 'ismir2021':
      num_velocity_bins = 127
      self.encoding_spec = note_sequences.NoteEncodingSpec
      self.inputs_length = 512
    elif model_type == 'mt3':
      num_velocity_bins = 1
      self.encoding_spec = note_sequences.NoteEncodingWithTiesSpec
      self.inputs_length = 256
    else:
      raise ValueError('unknown model_type: %s' % model_type)

    gin_files = ['/content/mt3/gin/model.gin',
                 f'/content/mt3/gin/{model_type}.gin']

    self.batch_size = 8
    self.outputs_length = 1024
    self.sequence_length = {'inputs': self.inputs_length, 
                            'targets': self.outputs_length}

    self.partitioner = t5x.partitioning.PjitPartitioner(
        num_partitions=1)

    # Build Codecs and Vocabularies.
    self.spectrogram_config = spectrograms.SpectrogramConfig()
    self.codec = vocabularies.build_codec(
        vocab_config=vocabularies.VocabularyConfig(
            num_velocity_bins=num_velocity_bins))
    self.vocabulary = vocabularies.vocabulary_from_codec(self.codec)
    self.output_features = {
        'inputs': seqio.ContinuousFeature(dtype=tf.float32, rank=2),
        'targets': seqio.Feature(vocabulary=self.vocabulary),
    }

    # Create a T5X model.
    self._parse_gin(gin_files)
    self.model = self._load_model()

    # Restore from checkpoint.
    self.restore_from_checkpoint(checkpoint_path)

  @property
  def input_shapes(self):
    return {
          'encoder_input_tokens': (self.batch_size, self.inputs_length),
          'decoder_input_tokens': (self.batch_size, self.outputs_length)
    }

  def _parse_gin(self, gin_files):
    """Parse gin files used to train the model."""
    gin_bindings = [
        'from __gin__ import dynamic_registration',
        'from mt3 import vocabularies',
        'VOCAB_CONFIG=@vocabularies.VocabularyConfig()',
        'vocabularies.VocabularyConfig.num_velocity_bins=%NUM_VELOCITY_BINS'
    ]
    with gin.unlock_config():
      gin.parse_config_files_and_bindings(
          gin_files, gin_bindings, finalize_config=False)

  def _load_model(self):
    """Load up a T5X `Model` after parsing training gin config."""
    model_config = gin.get_configurable(network.T5Config)()
    module = network.Transformer(config=model_config)
    return models.ContinuousInputsEncoderDecoderModel(
        module=module,
        input_vocabulary=self.output_features['inputs'].vocabulary,
        output_vocabulary=self.output_features['targets'].vocabulary,
        optimizer_def=t5x.adafactor.Adafactor(decay_rate=0.8, step_offset=0),
        input_depth=spectrograms.input_depth(self.spectrogram_config))


  def restore_from_checkpoint(self, checkpoint_path):
    """Restore training state from checkpoint, resets self._predict_fn()."""
    train_state_initializer = t5x.utils.TrainStateInitializer(
      optimizer_def=self.model.optimizer_def,
      init_fn=self.model.get_initial_variables,
      input_shapes=self.input_shapes,
      partitioner=self.partitioner)

    restore_checkpoint_cfg = t5x.utils.RestoreCheckpointConfig(
        path=checkpoint_path, mode='specific', dtype='float32')

    train_state_axes = train_state_initializer.train_state_axes
    self._predict_fn = self._get_predict_fn(train_state_axes)
    self._train_state = train_state_initializer.from_checkpoint_or_scratch(
        [restore_checkpoint_cfg], init_rng=jax.random.PRNGKey(0))

  @functools.lru_cache()
  def _get_predict_fn(self, train_state_axes):
    """Generate a partitioned prediction function for decoding."""
    def partial_predict_fn(params, batch, decode_rng):
      return self.model.predict_batch_with_aux(
          params, batch, decoder_params={'decode_rng': None})
    return self.partitioner.partition(
        partial_predict_fn,
        in_axis_resources=(
            train_state_axes.params,
            t5x.partitioning.PartitionSpec('data',), None),
        out_axis_resources=t5x.partitioning.PartitionSpec('data',)
    )

  def predict_tokens(self, batch, seed=0):
    """Predict tokens from preprocessed dataset batch."""
    prediction, _ = self._predict_fn(
        self._train_state.params, batch, jax.random.PRNGKey(seed))
    return self.vocabulary.decode_tf(prediction).numpy()

  def __call__(self, audio):
    """Infer note sequence from audio samples.
    
    Args:
      audio: 1-d numpy array of audio samples (16kHz) for a single example.

    Returns:
      A note_sequence of the transcribed audio.
    """
    ds = self.audio_to_dataset(audio)
    ds = self.preprocess(ds)

    model_ds = self.model.FEATURE_CONVERTER_CLS(pack=False)(
        ds, task_feature_lengths=self.sequence_length)
    model_ds = model_ds.batch(self.batch_size)

    inferences = (tokens for batch in model_ds.as_numpy_iterator()
                  for tokens in self.predict_tokens(batch))

    predictions = []
    for example, tokens in zip(ds.as_numpy_iterator(), inferences):
      predictions.append(self.postprocess(tokens, example))

    result = metrics_utils.event_predictions_to_ns(
        predictions, codec=self.codec, encoding_spec=self.encoding_spec)
    return result['est_ns']

  def audio_to_dataset(self, audio):
    """Create a TF Dataset of spectrograms from input audio."""
    frames, frame_times = self._audio_to_frames(audio)
    return tf.data.Dataset.from_tensors({
        'inputs': frames,
        'input_times': frame_times,
    })

  def _audio_to_frames(self, audio):
    """Compute spectrogram frames from audio."""
    frame_size = self.spectrogram_config.hop_width
    padding = [0, frame_size - len(audio) % frame_size]
    audio = np.pad(audio, padding, mode='constant')
    frames = spectrograms.split_audio(audio, self.spectrogram_config)
    num_frames = len(audio) // frame_size
    times = np.arange(num_frames) / self.spectrogram_config.frames_per_second
    return frames, times

  def preprocess(self, ds):
    pp_chain = [
        functools.partial(
            t5.data.preprocessors.split_tokens_to_inputs_length,
            sequence_length=self.sequence_length,
            output_features=self.output_features,
            feature_key='inputs',
            additional_feature_keys=['input_times']),
        # Cache occurs here during training.
        preprocessors.add_dummy_targets,
        functools.partial(
            preprocessors.compute_spectrograms,
            spectrogram_config=self.spectrogram_config)
    ]
    for pp in pp_chain:
      ds = pp(ds)
    return ds

  def postprocess(self, tokens, example):
    tokens = self._trim_eos(tokens)
    start_time = example['input_times'][0]
    # Round down to nearest symbolic token step.
    start_time -= start_time % (1 / self.codec.steps_per_second)
    return {
        'est_tokens': tokens,
        'start_time': start_time,
        # Internal MT3 code expects raw inputs, not used here.
        'raw_inputs': []
    }

  @staticmethod
  def _trim_eos(tokens):
    tokens = np.array(tokens, np.int32)
    if vocabularies.DECODED_EOS_ID in tokens:
      tokens = tokens[:np.argmax(tokens == vocabularies.DECODED_EOS_ID)]
    return tokens
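
With the wrapper defined, transcription follows the official notebook's flow: restore a checkpoint, feed in 16 kHz samples, and write the result out as MIDI. A sketch, where the checkpoint path assumes the gsutil layout from earlier and load_audio is the helper sketched above:

# Restore the multi-instrument MT3 checkpoint (path per the gsutil copy above).
inference_model = InferenceModel('checkpoints/mt3/', model_type='mt3')

# 1-D float array of audio samples at 16 kHz.
audio = load_audio('input.wav', sample_rate=SAMPLE_RATE)

# Run inference, then save the estimated NoteSequence as a standard MIDI
# file, ready to drag into FL Studio.
est_ns = inference_model(audio)
note_seq.sequence_proto_to_midi_file(est_ns, 'transcribed.mid')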