Controlling Facial Expressions by Connecting VTube Studio and Python, Using Audio Generated with Style-Bert-VITS2

Controlling a character's facial expressions in real time is a topic of great interest to many virtual YouTubers and streamers. This article shows how to connect VTube Studio and Python to control expressions, using audio generated with Style-Bert-VITS2.

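For details on connecting VTube Studio and Python, see:

Triggering Motions by Connecting VTube Studio and Python

In the world of virtual YouTubers (VTubers), a variety of techniques are used to animate characters in streams and videos. Among them, VTube Studio is widely used as an application specialized for real-time avatar control...

For details on Style-Bert-VITS2, see:

Easily Build a Style-Bert-VITS2 API with Docker and Try Speech Synthesis [One-Shot Setup]

Let's use Docker to easily build the API for the AI speech synthesis project "Style-Bert-VITS2" and actually run it. This article explains the process with code blocks so that even beginners can follow along. Python and Docker...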

Demo video
Preparation
Running the demo code
Summary
Repository
1. Related

Demo video

I wrote an article!!
---
Controlling Facial Expressions by Connecting VTube Studio and Python, Using Audio Generated with Style-Bert-VITS2

🔗https://t.co/SvCg4e2Bfc pic.twitter.com/2dJtMGfZF6

— Maki@Sunwood AI Labs. (@hAru_mAki_ch) February 26, 2024

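Preparation

Preparing the model

First, choose the model to use and confirm that expression data is available for it. This walkthrough uses the "Akari" model as an example.

Setting up key bindings

Next, set up the model's key bindings in VTube Studio. If no default settings exist, reset them and check the key bindings related to expressions.

Setting up the virtual camera

Install the virtual camera and add "VTubeStudioCam" so that streaming software such as OBS Studio can capture the video.

It can be installed from the help page below.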

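Running the demo code

The demo code below uses Python to connect to the VTube Studio API and triggers a randomly chosen hotkey after each audio file finishes playing, so the character's expression changes along with the audio.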


import asyncio
import json
import os
import re
import random
import websockets
from pygame import mixer
import sys

sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
sys.path.append(os.path.join(os.path.dirname(__file__), '../api'))

from api.authentication import request_authentication_token, authenticate_plugin

async def get_hotkeys(websocket):
    request = {
        "apiName": "VTubeStudioPublicAPI",
        "apiVersion": "1.0",
        "requestID": "UniqueRequestIDForHotkeys",
        "messageType": "HotkeysInCurrentModelRequest",
        "data": {}
    }
    await websocket.send(json.dumps(request))
    response = await websocket.recv()
    response_json = json.loads(response)
    if "data" in response_json and "availableHotkeys" in response_json["data"]:
        return response_json["data"]["availableHotkeys"]
    return []

async def trigger_random_hotkey(websocket, hotkeys):
    if hotkeys:
        hotkey = random.choice(hotkeys)
        hotkey_id = hotkey.get("hotkeyID")
        if hotkey_id:
            request = {
                "apiName": "VTubeStudioPublicAPI",
                "apiVersion": "1.0",
                "requestID": "UniqueRequestIDForTriggering",
                "messageType": "HotkeyTriggerRequest",
                "data": {
                    "hotkeyID": hotkey_id
                }
            }
            await websocket.send(json.dumps(request))
            response = await websocket.recv()
            print(f"Triggered Hotkey Response: {response}")

# Play the audio files in order and trigger a random hotkey after each one finishes
async def play_audio_and_trigger_hotkeys(websocket, folder_path='audio/Word2Motion'):
    files = [file for file in os.listdir(folder_path) if file.endswith('.wav')]
    # Sort by the first number in each file name; names without a number get key 0
    sorted_files = sorted(files, key=lambda file: int(re.search(r'\d+', file).group()) if re.search(r'\d+', file) else 0)

    mixer.init()  # Initialize pygame's mixer module
    for file in sorted_files:
        file_path = os.path.join(folder_path, file)
        print(f"Now playing: {file}")
        mixer.music.load(file_path)
        mixer.music.play()
        while mixer.music.get_busy():  # Wait for playback to finish
            await asyncio.sleep(0.1)  # Poll often so the hotkey fires soon after the audio ends
        await trigger_random_hotkey(websocket, await get_hotkeys(websocket))  # Trigger a random hotkey after playback

async def main():
    uri = "ws://localhost:8001"
    async with websockets.connect(uri) as websocket:
        plugin_name = "My Cool Plugin"
        plugin_developer = "My Name"
        authentication_token = await request_authentication_token(websocket, plugin_name, plugin_developer)
        if authentication_token:
            print(f"Token: {authentication_token}")
            is_authenticated = await authenticate_plugin(websocket, plugin_name, plugin_developer, authentication_token)
            print(f"Authenticated: {is_authenticated}")
            if is_authenticated:
                await play_audio_and_trigger_hotkeys(websocket)  # Play the audio and trigger hotkeys

asyncio.run(main())
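
The script above imports request_authentication_token and authenticate_plugin from api.authentication, a module this article does not show. The following is a minimal sketch of what such helpers might look like, assuming the AuthenticationTokenRequest and AuthenticationRequest message types of the VTube Studio public API. The builder function names and the request IDs are hypothetical, and the author's actual module may differ.

```python
import json

API_NAME = "VTubeStudioPublicAPI"
API_VERSION = "1.0"


def build_token_request(plugin_name, plugin_developer):
    """Build the payload that asks VTube Studio to issue a plugin token."""
    return {
        "apiName": API_NAME,
        "apiVersion": API_VERSION,
        "requestID": "TokenRequestID",  # hypothetical ID, any string works
        "messageType": "AuthenticationTokenRequest",
        "data": {"pluginName": plugin_name, "pluginDeveloper": plugin_developer},
    }


def build_auth_request(plugin_name, plugin_developer, token):
    """Build the payload that authenticates this session with a token."""
    return {
        "apiName": API_NAME,
        "apiVersion": API_VERSION,
        "requestID": "AuthenticationRequestID",  # hypothetical ID
        "messageType": "AuthenticationRequest",
        "data": {
            "pluginName": plugin_name,
            "pluginDeveloper": plugin_developer,
            "authenticationToken": token,
        },
    }


async def request_authentication_token(websocket, plugin_name, plugin_developer):
    """Return the issued token, or None if the reply carries no token."""
    await websocket.send(json.dumps(build_token_request(plugin_name, plugin_developer)))
    response = json.loads(await websocket.recv())
    return response.get("data", {}).get("authenticationToken")


async def authenticate_plugin(websocket, plugin_name, plugin_developer, token):
    """Return True only if the reply reports the session as authenticated."""
    await websocket.send(json.dumps(build_auth_request(plugin_name, plugin_developer, token)))
    response = json.loads(await websocket.recv())
    return bool(response.get("data", {}).get("authenticated", False))
```

The first time a token is requested, VTube Studio asks the user to approve the plugin in its own window, so the token request can take a while to return. Saving the token to a file and reusing it on later runs would avoid the repeated prompt. That step is left out of this sketch.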

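The demo script uses asynchronous communication to connect to the VTube Studio API and provides a basic framework for automating specific operations. As written, it triggers a hotkey chosen at random from every hotkey in the model. Logic that actually selects which expression to show has to be added on top of this framework.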
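
One possible way to narrow the random choice to expressions is to filter the hotkey list before picking from it. The sketch below assumes that each entry returned by HotkeysInCurrentModelRequest carries a "type" field and that expression hotkeys report the value "ToggleExpression", as described in the VTube Studio API documentation. The function names and the sample data are hypothetical and are not part of the original demo.

```python
import random


def filter_expression_hotkeys(hotkeys):
    """Keep only the hotkeys whose type marks them as expression toggles."""
    return [h for h in hotkeys if h.get("type") == "ToggleExpression"]


def pick_expression_hotkey(hotkeys):
    """Pick one expression hotkey at random, or None if the model has none."""
    candidates = filter_expression_hotkeys(hotkeys)
    return random.choice(candidates) if candidates else None


# Hypothetical sample shaped like the "availableHotkeys" list in the response
sample = [
    {"name": "Smile", "type": "ToggleExpression", "hotkeyID": "id-1"},
    {"name": "Wave", "type": "TriggerAnimation", "hotkeyID": "id-2"},
]
print(pick_expression_hotkey(sample))
```

In the demo script, the list returned by get_hotkeys could be passed through filter_expression_hotkeys before it reaches trigger_random_hotkey, which would leave the rest of the flow unchanged.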