Extending stable_video_diffusion_img2vid to Generate Longer Videos

Introduction
Extended code
Notebook
Gallery
References
- Related

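Introduction

stable_video_diffusion_img2vid generates videos of up to 25 frames per run. By feeding the final frame of each generated clip back in as the input for the next run, it is possible to produce videos longer than 25 frames. This post shares a notebook that does exactly that.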
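
As a minimal sketch of the chunking idea described above, the snippet below splits a requested frame count into runs of at most 25 frames. The function name `plan_segments` is hypothetical and used only for illustration; the 25-frame limit comes from this post.

```python
def plan_segments(n_frames: int, max_frames_per_run: int = 25) -> list[int]:
    """Split a requested frame count into per-run chunk sizes.

    Hypothetical helper for illustration: each chunk would be generated
    in one run, seeded with the last frame of the previous chunk.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    if max_frames_per_run <= 0:
        raise ValueError("max_frames_per_run must be positive")
    return [
        min(max_frames_per_run, n_frames - start)
        for start in range(0, n_frames, max_frames_per_run)
    ]


print(plan_segments(60))  # [25, 25, 10]
```

A request for 60 frames therefore needs three runs, and only the last run is shorter than the 25-frame limit.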


Extended code


# Import the required libraries
import gradio as gr
import random
import subprocess
import cv2  # OpenCV, used for video and image processing

"""
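# Video generation function
# Generates a video with the requested number of frames from the input image.
# Generation runs in several passes, each producing up to 25 frames.
# The user-specified seed controls the randomness.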
"""
def infer(input_path: str, resize_image: bool, n_frames: int, n_steps: int, seed: str, decoding_t: int) -> str:
    # Print the input parameters
    print(f"Input arguments: \ninput_path={input_path}, \nresize_image={resize_image}, \nn_frames={n_frames}, \nn_steps={n_steps}, \nseed={seed}, \ndecoding_t={decoding_t}")

    # Set the seed (random or user-specified)
    if seed == "random":
        seed = random.randint(0, 2**32 - 1)  # Random seed kept within the unsigned 32-bit range
    seed = int(seed)

    # Maximum number of frames that can be generated in a single run
    max_frames_per_run = 25
    output_paths = []  # Paths of the generated video clips

    # Loop until the requested number of frames has been generated
    for start_frame in range(0, n_frames, max_frames_per_run):
        frames_to_generate = min(max_frames_per_run, n_frames - start_frame)  # Frames to generate in this run

        if start_frame > 0:
            # Use the last frame of the previous clip as the next input
            last_video_path = output_paths[-1]
            input_path = extract_last_frame(last_video_path)

        # Generate the video frames
        # (`sample` and `device` are assumed to be defined earlier in the notebook)
        generated_paths = sample(
            input_path=input_path,
            resize_image=resize_image,
            num_frames=frames_to_generate,
            num_steps=n_steps,
            fps_id=6,
            motion_bucket_id=127,
            cond_aug=0.02,
            seed=seed,
            decoding_t=decoding_t,
            device=device,
        )
        output_paths.extend(generated_paths)  # Add the generated clip paths to the list

    # Concatenate the clips with FFmpeg
    combined_video_path = combine_videos_with_ffmpeg(output_paths)
    return combined_video_path

"""
# Function that extracts the last frame of a video
# The last frame of the previously generated clip is used as the next input.
"""
def extract_last_frame(video_path):
    # Open the video file with OpenCV
    cap = cv2.VideoCapture(video_path)

    if not cap.isOpened():
        raise Exception("Video file could not be opened: " + video_path)

    try:
        # Get the number of frames in the video
        frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

        # Seek to the last frame
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_count - 1)

        # Read the frame
        ret, frame = cap.read()
    finally:
        # Release the capture even if seeking or reading fails
        cap.release()

    if not ret:
        raise Exception("Failed to read the last frame from: " + video_path)

    # Save the last frame as an image
    last_frame_path = f"last_frame_{video_path.split('/')[-1].split('.')[0]}.png"
    cv2.imwrite(last_frame_path, frame)

    return last_frame_path

"""
# Function that concatenates multiple videos with FFmpeg
# Merges the individually generated clips into a single video.
"""
def combine_videos_with_ffmpeg(video_paths):
    # Write the list of clips to concatenate to a file
    with open('file_list.txt', 'w') as file:
        for path in video_paths:
            file.write(f"file '{path}'\n")

    # Name of the combined output video
    combined_video_path = "combined_video.mp4"

    # Build the FFmpeg command (concat demuxer, stream copy without re-encoding)
    command = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", "file_list.txt", "-c", "copy", combined_video_path]

    # Run the FFmpeg command
    result = subprocess.run(command, capture_output=True, text=True)

    if result.returncode != 0:
        # Include FFmpeg's stderr so the cause of the failure is visible
        raise Exception("FFmpeg failed: " + result.stderr)

    return combined_video_path

# -----------------------------------------
# Build the Gradio app
# Provides an interface that lets the user generate videos.
# (`num_frames` and `num_steps` are assumed to be defined earlier in the notebook)
with gr.Blocks() as demo:
    # Set up the UI components
    with gr.Column():
        image = gr.Image(label="input image", type="filepath")
        resize_image = gr.Checkbox(label="resize to optimal size", value=True)
        btn = gr.Button("Run")
        with gr.Accordion(label="Advanced options", open=False):
            n_frames = gr.Number(precision=0, label="number of frames", value=num_frames)
            n_steps = gr.Number(precision=0, label="number of steps", value=num_steps)
            seed = gr.Text(value="random", label="seed (integer or 'random')")
            decoding_t = gr.Number(precision=0, label="number of frames decoded at a time", value=1)
    with gr.Column():
        video_out = gr.Video(label="generated video")

    # Sample input
    examples = [
        ["https://user-images.githubusercontent.com/33302880/284758167-367a25d8-8d7b-42d3-8391-6d82813c7b0f.png"]
    ]

    # Inputs and outputs
    inputs = [image, resize_image, n_frames, n_steps, seed, decoding_t]
    outputs = [video_out]

    # Action to run when the button is clicked
    btn.click(infer, inputs=inputs, outputs=outputs)

    # Add the sample input to the app
    gr.Examples(examples=examples, inputs=inputs, outputs=outputs, fn=infer)

    # Launch the app
    demo.queue().launch(debug=True, share=True, show_error=True)
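
The concat step in the listing writes each clip path into the list file unchanged. The following is a minimal sketch, not part of the original notebook, of a slightly more defensive way to write that list: it uses absolute paths and escapes single quotes in the way FFmpeg's quoting rules expect. The helper name `write_concat_list` is an assumption made for illustration.

```python
import os
import tempfile


def write_concat_list(video_paths, list_path):
    """Write an FFmpeg concat-demuxer list file (hypothetical helper).

    Paths are made absolute so they do not depend on where the list file
    lives, and single quotes are escaped as '\'' inside the quoted path.
    """
    lines = []
    for path in video_paths:
        abs_path = os.path.abspath(path)
        # Close the quote, emit an escaped quote, then reopen the quote
        escaped = abs_path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'\n")
    with open(list_path, "w", encoding="utf-8") as f:
        f.writelines(lines)
    return list_path


if __name__ == "__main__":
    # Usage example: write the list into a temporary directory and show it
    with tempfile.TemporaryDirectory() as tmp_dir:
        list_file = write_concat_list(
            ["clip_000.mp4", "clip_001.mp4"],
            os.path.join(tmp_dir, "file_list.txt"),
        )
        with open(list_file, encoding="utf-8") as f:
            print(f.read())
```

Relative entries in a concat list are resolved relative to the list file's location, so absolute paths avoid surprises when the list is written somewhere other than the working directory. Absolute paths are rejected by the concat demuxer's safe mode, which is why the command in the listing passes `-safe 0`.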

Notebook

Google Colab

Gallery

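I extended the "Stable Video Diffusion" notebook!!

It can now handle videos longer than 3 seconds!!
Being able to do this is exactly why Python-based tools are so helpful.

That said, the second half of the video degrades somewhat... https://t.co/0Y5BeYf8tL pic.twitter.com/nMd3ehbNhP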

— Maki@Sunwood AI Labs. (@hAru_mAki_ch) November 23, 2023

Using the 10-second long version of stable_video_diffusion_img2vid,
I tried making a video in the style of "unlimited blade works".

Something strange showed up. Is that the Holy Grail!?!? pic.twitter.com/f5FTDNo24m

— Maki@Sunwood AI Labs. (@hAru_mAki_ch) November 25, 2023