📒 High-Quality Character Image Generation with Wan2.1-T2I on Free Google Colab: A Guide


2025.08.23

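🚀 Introduction
- ✨ Technologies Used
- 🔧 Running ComfyUI Directly from Python
📦 Environment Setup and Model Downloads
⚙️ Initializing the Model Loaders
🖼️ Defining the Image Generation Function
- High-Quality Character Generation Function
🎨 Examples: Hollywood-Grade Yokai Characters
🎯 Prompt Writing Tips and Optimization
🏆 Summary
📒 Notebook
- 🎯 Key Points
- Related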

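🚀 Introduction

This notebook explains how to generate beautiful, Hollywood-film-quality yokai characters using the Wan2.1-T2V-14B model and ComfyUI. Still images are produced by running the text-to-video model with a length of one frame.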

Wan2.1_T2I_jupyter ran without problems on free Google Colab 3️⃣
I translated it into Japanese and turned it into an article!!
* I'll share it later https://t.co/RM4EvRLKln pic.twitter.com/t9OeocHQ5g

— Maki@Sunwood AI Labs. (@hAru_mAki_ch) August 23, 2025

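✨ Technologies Used

Wan2.1: An open-source, state-of-the-art video generation model developed by Alibaba. It handles multiple tasks, including Text-to-Video, Image-to-Video, and video editing, and can render both Chinese and English text in its output.
GGUF format: A lightweight model format that uses quantization to run efficiently with little VRAM. It is particularly well suited to Transformer/DiT models, and it lowers memory use by reducing the number of bits stored per weight.
ComfyUI: An intuitive, node-based interface for AI image and video generation. Complex workflows can be built and tested as a graph of nodes (a flowchart) without writing code.
FusionX LoRA: A custom LoRA that merges research-grade models such as CausVid, AccVideo, MoviiGen, and the MPS Reward LoRA. It targets high-quality video generation in 8 steps, with an emphasis on texture, clarity, and fine detail.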
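
To make the "fewer bits per weight" point concrete, here is a back-of-the-envelope size estimate. It is only a sketch: the 3.4375 bits-per-weight figure is the nominal value for llama.cpp's Q3_K block format and is an assumption here, since real Q3_K_S files mix several tensor types.

```python
# Rough size estimate for a quantized model: parameters x bits-per-weight / 8.
# ASSUMPTION: 3.4375 bits/weight is the nominal figure for the Q3_K block
# format; actual Q3_K_S files mix tensor types, so treat this as a ballpark.

def estimate_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Return the approximate weight size in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

PARAMS_14B = 14e9  # Wan2.1-T2V-14B parameter count, rounded

fp16_gb = estimate_size_gb(PARAMS_14B, 16)
q3_gb = estimate_size_gb(PARAMS_14B, 3.4375)

print(f"FP16 : {fp16_gb:.1f} GB")
print(f"Q3_K : {q3_gb:.1f} GB")
```

The Q3 estimate comes out at roughly 6 GB, in line with the size of the quantized 14B file used in this guide, while an unquantized FP16 copy would be several times larger than a free Colab GPU can hold.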

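🔧 Running ComfyUI Directly from Python

This notebook runs ComfyUI directly from Python:

Standard ComfyUI (GUI version):

Build workflows visually by dragging and dropping nodes in the browser
Adjust parameters in real time and manage runs through the queue system

Direct Python execution, as used here:

Call ComfyUI's node classes (NODE_CLASS_MAPPINGS) directly from Python
Control ComfyUI's internals programmatically, without the GUI
Invoke node methods directly, e.g. UnetLoaderGGUF().load_unet()

Advantages of this approach:

Fully automated runs on Google Colab
Easy batch processing for multiple characters
Programmable control (loops, conditionals, error handling)
Stable execution in headless environments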
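
The calling pattern described above can be illustrated without ComfyUI installed. The sketch below uses hypothetical stand-in classes (FakeLoader, FakeSampler) and a demo registry; only the shape of the calls (look up a class by name, instantiate it, call its method, take the first element of the returned tuple) reflects how the real nodes are used later in this guide.

```python
# Toy stand-ins that mimic how ComfyUI nodes are called from Python.
# ASSUMPTION: FakeLoader / FakeSampler are hypothetical; real nodes come from
# ComfyUI's NODE_CLASS_MAPPINGS and need the model files on disk.

class FakeLoader:
    def load_unet(self, unet_name):
        # Node methods return a tuple of outputs, hence the [0] at call sites.
        return ({"name": unet_name},)

class FakeSampler:
    def sample(self, model, seed, steps):
        return (f"latent from {model['name']} (seed={seed}, steps={steps})",)

# A registry that maps node names to classes, like NODE_CLASS_MAPPINGS.
NODE_CLASS_MAPPINGS_DEMO = {"FakeLoader": FakeLoader, "FakeSampler": FakeSampler}

# 1. Instantiate each node once.
loader = NODE_CLASS_MAPPINGS_DEMO["FakeLoader"]()
sampler = NODE_CLASS_MAPPINGS_DEMO["FakeSampler"]()

# 2. Call node methods directly and unpack the first output.
model = loader.load_unet("demo.gguf")[0]

# 3. Ordinary Python control flow replaces the GUI queue.
results = [sampler.sample(model, seed, steps=8)[0] for seed in (1, 2, 3)]
for line in results:
    print(line)
```

Because every step is a plain function call, loops, conditionals, and try/except blocks can wrap any part of the pipeline.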

📦 Environment Setup and Model Downloads

1. Basic environment setup

%cd /content
!git clone https://github.com/comfyanonymous/ComfyUI /content/ComfyUI
!git clone https://github.com/city96/ComfyUI-GGUF /content/ComfyUI/custom_nodes/ComfyUI-GGUF
!pip install torchsde gguf

2. Install aria2 for fast downloads

!apt install aria2 -qqy

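3. Download the required model files

# Wan2.1 UNet model (GGUF-quantized)
# A large 14B-parameter DiT (Diffusion Transformer) model
# Q3_K_S is a 3-bit quantization that trades some quality for a much smaller memory footprint
# Note: the Q3_K_S file is saved as wan2.1-t2v-14b-Q3_K_M.gguf, the name the loading code expects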
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/city96/Wan2.1-T2V-14B-gguf/resolve/main/wan2.1-t2v-14b-Q3_K_S.gguf -d /content/ComfyUI/models/unet -o wan2.1-t2v-14b-Q3_K_M.gguf

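# UMT5-XXL text encoder (GGUF-quantized)
# The encoder part of Google's multilingual T5 model (107 languages, 5.68B parameters)
# A lightweight build used for Wan2.1 prompt processing
# Note: this Q3_K_S file is likewise saved under a Q3_K_M name, which the loading code expects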
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/city96/umt5-xxl-encoder-gguf/resolve/main/umt5-xxl-encoder-Q3_K_S.gguf -d /content/ComfyUI/models/clip -o umt5-xxl-encoder-Q3_K_M.gguf

# Wan2.1 VAE (Variational AutoEncoder)
# A 3D causal VAE built for efficient video encoding/decoding
# Handles 1080P video while preserving temporal information (8x8 spatial, 4x temporal compression)
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors -d /content/ComfyUI/models/vae -o wan_2.1_vae.safetensors

# FusionX LoRA model (317MB)
# A merged LoRA combining CausVid + AccVideo + MoviiGen + MPS Reward LoRA
# High-quality generation in 8 steps, with much improved texture and motion quality
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/vrgamedevgirl84/Wan14BT2VFusioniX/resolve/main/FusionX_LoRa/Wan2.1_T2V_14B_FusionX_LoRA.safetensors -d /content/ComfyUI/models/loras/FusionX -o Wan2.1_T2V_14B_FusionX_LoRA.safetensors
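
For readers who prefer to drive the downloads from Python rather than shell magics, the following sketch assembles the same aria2c argument list and annotates each flag. The helper name build_aria2_command is hypothetical; the flags are the ones used in the commands above.

```python
import shlex

def build_aria2_command(url: str, out_dir: str, out_name: str,
                        connections: int = 16) -> list:
    """Build the aria2c argument list for one model download."""
    return [
        "aria2c",
        "--console-log-level=error",  # only print errors
        "-c",                         # resume a partially downloaded file
        "-x", str(connections),       # max connections per server
        "-s", str(connections),       # split the download across N connections
        "-k", "1M",                   # minimum size of each split piece
        url,
        "-d", out_dir,                # target directory
        "-o", out_name,               # file name inside the target directory
    ]

cmd = build_aria2_command(
    "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors",
    "/content/ComfyUI/models/vae",
    "wan_2.1_vae.safetensors",
)
print(shlex.join(cmd))
# To actually download, pass the list to subprocess.run(cmd, check=True).
```

Keeping the URL, directory, and file name as separate arguments makes it harder to mistype the output name that later cells depend on.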

4. Install additional dependencies

!pip install av

⚙️ Initializing the Model Loaders

1. Prepare the ComfyUI environment

%cd /content/ComfyUI

import torch
import random, time
from PIL import Image
import numpy as np

from nodes import NODE_CLASS_MAPPINGS
from comfy_extras import nodes_hunyuan, nodes_model_advanced

2. Manually load the ComfyUI-GGUF custom nodes

# Manually load ComfyUI-GGUF
import sys
import os
sys.path.append("/content/ComfyUI/custom_nodes/ComfyUI-GGUF")

try:
    # Load the ComfyUI-GGUF package from its __init__.py by file path
    import importlib.util
    spec = importlib.util.spec_from_file_location("comfyui_gguf", "/content/ComfyUI/custom_nodes/ComfyUI-GGUF/__init__.py")
    comfyui_gguf = importlib.util.module_from_spec(spec)
    sys.modules["comfyui_gguf"] = comfyui_gguf
    spec.loader.exec_module(comfyui_gguf)

    # Merge the custom nodes into NODE_CLASS_MAPPINGS
    if hasattr(comfyui_gguf, 'NODE_CLASS_MAPPINGS'):
        NODE_CLASS_MAPPINGS.update(comfyui_gguf.NODE_CLASS_MAPPINGS)
        print("ComfyUI-GGUF nodes loaded successfully!")

except Exception as e:
    print(f"Error loading ComfyUI-GGUF: {e}")
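
The loading technique above is plain importlib and can be tried in isolation. The sketch below builds a throwaway package in a temporary directory (the demo_nodes name and DemoNode class are hypothetical stand-ins for ComfyUI-GGUF) and loads it by file path in the same way.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

# ASSUMPTION: a throwaway package stands in for ComfyUI-GGUF so the sketch
# runs anywhere; "demo_nodes" and DemoNode are hypothetical names.
package_dir = Path(tempfile.mkdtemp()) / "demo_nodes"
package_dir.mkdir()
(package_dir / "__init__.py").write_text(
    "class DemoNode:\n"
    "    pass\n"
    "NODE_CLASS_MAPPINGS = {'DemoNode': DemoNode}\n"
)

registry = {}  # plays the role of ComfyUI's NODE_CLASS_MAPPINGS

# Build a module spec from the file path, create the module, then execute it.
spec = importlib.util.spec_from_file_location(
    "demo_nodes", str(package_dir / "__init__.py")
)
module = importlib.util.module_from_spec(spec)
sys.modules["demo_nodes"] = module  # register before executing the module
spec.loader.exec_module(module)

# Merge whatever nodes the module exposes into the shared registry.
registry.update(getattr(module, "NODE_CLASS_MAPPINGS", {}))
print(sorted(registry))
```

Registering the module in sys.modules before executing it matters for real custom-node packages, because their __init__.py usually imports sibling files with relative imports.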

3. Check and initialize the required nodes

# List the available GGUF nodes
gguf_nodes = [k for k in NODE_CLASS_MAPPINGS.keys() if "GGUF" in k]
print(f"Available GGUF nodes: {gguf_nodes}")

# Check that the required nodes are available
required_nodes = ["UnetLoaderGGUF", "CLIPLoaderGGUF", "LoraLoaderModelOnly", "VAELoader"]
missing_nodes = [node for node in required_nodes if node not in NODE_CLASS_MAPPINGS]

if missing_nodes:
    print(f"Missing nodes: {missing_nodes}")
    print("Available nodes:", list(NODE_CLASS_MAPPINGS.keys())[:20])  # show the first 20
else:
    # All required nodes are available: instantiate each one
    UnetLoaderGGUF = NODE_CLASS_MAPPINGS["UnetLoaderGGUF"]()
    CLIPLoaderGGUF = NODE_CLASS_MAPPINGS["CLIPLoaderGGUF"]()
    LoraLoaderModelOnly = NODE_CLASS_MAPPINGS["LoraLoaderModelOnly"]()
    VAELoader = NODE_CLASS_MAPPINGS["VAELoader"]()

    CLIPTextEncode = NODE_CLASS_MAPPINGS["CLIPTextEncode"]()
    EmptyHunyuanLatentVideo = nodes_hunyuan.NODE_CLASS_MAPPINGS["EmptyHunyuanLatentVideo"]()

    KSampler = NODE_CLASS_MAPPINGS["KSampler"]()
    ModelSamplingSD3 = nodes_model_advanced.NODE_CLASS_MAPPINGS["ModelSamplingSD3"]()
    VAEDecode = NODE_CLASS_MAPPINGS["VAEDecode"]()

    print("All required nodes loaded successfully!")

4. Verify the model files and load everything

# Check that the model files are in the right place
model_paths = {
    "unet": "/content/ComfyUI/models/unet/wan2.1-t2v-14b-Q3_K_M.gguf",
    "clip": "/content/ComfyUI/models/clip/umt5-xxl-encoder-Q3_K_M.gguf",
    "vae": "/content/ComfyUI/models/vae/wan_2.1_vae.safetensors",
    "lora": "/content/ComfyUI/models/loras/FusionX/Wan2.1_T2V_14B_FusionX_LoRA.safetensors"
}

print("\nModel file check:")
for model_type, path in model_paths.items():
    exists = os.path.exists(path)
    print(f"{model_type}: {'✓' if exists else '✗'} {path}")

# Try to load the models
try:
    with torch.inference_mode():
        print("\nLoading models...")

        # UNet model (GGUF) - the main generation engine
        unet = UnetLoaderGGUF.load_unet("wan2.1-t2v-14b-Q3_K_M.gguf")[0]
        print("✓ UNet loaded")

        # Text encoder (GGUF, loaded through the CLIP loader) - prompt understanding
        clip = CLIPLoaderGGUF.load_clip("umt5-xxl-encoder-Q3_K_M.gguf", "wan")[0]
        print("✓ CLIP loaded")

        # Apply the FusionX LoRA; `lora` is the UNet with the LoRA patched in
        lora = LoraLoaderModelOnly.load_lora_model_only(unet, "FusionX/Wan2.1_T2V_14B_FusionX_LoRA.safetensors", 1.0)[0]
        print("✓ LoRA loaded")

        # VAE model - image encoding/decoding
        vae = VAELoader.load_vae("wan_2.1_vae.safetensors")[0]
        print("✓ VAE loaded")

        print("\n🎉 All models loaded successfully!")

except Exception as e:
    print(f"\n❌ Error loading models: {e}")

🖼️ Defining the Image Generation Function

High-Quality Character Generation Function

def generate_image(positive_prompt, negative_prompt,
                  seed=0, steps=10, cfg=1.0,
                  sampler_name="euler", scheduler="beta",
                  width=1280, height=720, length=1,
                  output_path="/content/test.png"):
    """
    Generate a high-quality character image.

    Parameters:
    -----------
    positive_prompt : str
        Positive prompt (a detailed description of what you want)
    negative_prompt : str
        Negative prompt (elements to avoid)
    seed : int
        Random seed (0 picks a random seed)
    steps : int
        Number of sampling steps (affects quality)
    cfg : float
        CFG scale (prompt adherence). At the default of 1.0,
        classifier-free guidance is effectively off, so the negative
        prompt has little or no influence.
    sampler_name : str
        Sampler name
    scheduler : str
        Scheduler name
    width, height : int
        Output image size
    length : int
        Number of video frames (1 for a still image)
    output_path : str
        Path to save the image to

    Returns:
    --------
    PIL.Image
        The generated image object
    """

    with torch.inference_mode():
        # 1. Encode the prompts with the text encoder
        positive_encoded = CLIPTextEncode.encode(clip, positive_prompt)[0]
        negative_encoded = CLIPTextEncode.encode(clip, negative_prompt)[0]

        # 2. Apply the model sampling settings and create an empty latent
        model_configured = ModelSamplingSD3.patch(lora, 1.0)[0]
        latent_space = EmptyHunyuanLatentVideo.generate(width, height, length)[0]

        # 3. Pick a seed (time-based random seed when seed == 0)
        if seed == 0:
            random.seed(int(time.time()))
            seed = random.randint(0, 18446744073709551615)

        print(f"🎲 Seed used: {seed}")

        # 4. Generate with KSampler
        samples = KSampler.sample(
            model_configured, seed, steps, cfg,
            sampler_name, scheduler,
            positive_encoded, negative_encoded, latent_space
        )[0]

        # 5. Decode with the VAE and convert the first frame to a PIL image
        decoded = VAEDecode.decode(vae, samples)[0].detach()
        image_array = np.array(decoded * 255, dtype=np.uint8)[0]
        final_image = Image.fromarray(image_array)

        # 6. Save the image
        if output_path:
            final_image.save(output_path)
            print(f"💾 Image saved: {output_path}")

        return final_image
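
It can help to know how large the latent is for a given output size. The sketch below reproduces, in plain Python, the shape formula that ComfyUI's EmptyHunyuanLatentVideo node is understood to use (16 latent channels, 8x spatial and 4x temporal compression). Treat the formula as an assumption and check the ComfyUI source for the version you have installed; latent_shape is a hypothetical helper name.

```python
# ASSUMPTION: mirrors the latent shape produced by ComfyUI's
# EmptyHunyuanLatentVideo node (16 channels, 8x spatial, 4x temporal).

def latent_shape(width: int, height: int, length: int, batch_size: int = 1):
    """Return (batch, channels, frames, height, width) of the empty latent."""
    latent_frames = (length - 1) // 4 + 1
    return (batch_size, 16, latent_frames, height // 8, width // 8)

still = latent_shape(1280, 720, 1)     # the single-frame case used in this guide
clip_33 = latent_shape(1280, 720, 33)  # a 33-frame video, for comparison

print(still)
print(clip_33)
```

With length=1 the temporal axis collapses to a single latent frame, which is why the text-to-video model can be used as a still-image generator here.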

🎨 Examples: Hollywood-Grade Yokai Characters

1. Yuki-onna (Snow Woman) - A woman clothed in the beauty of ice

# Detailed prompt design for the yuki-onna
positive_yuki_onna = """
A breathtakingly beautiful female yuki-onna (snow woman) with Hollywood-level striking features, 
sharp cheekbones, deep-set piercing blue eyes, porcelain white skin with subtle icy undertones, 
long flowing platinum white hair, elegant kimono made of frost and ice crystals, 
standing in a snow-covered bamboo forest, ethereal ice particles floating around her, 
cinematic lighting with moonlight filtering through snow, highly detailed, fantasy art style, 
sophisticated beauty, otherworldly elegance, model-like facial structure
"""

negative_yuki_onna = """
ugly, deformed, malformed, grotesque, scary, frightening, hideous, monstrous, evil expression, 
crude, rough, low quality, blurry, pixelated, amateur, bad anatomy, extra limbs, missing limbs, 
distorted face, asymmetrical features, bad lighting, oversaturated colors, cartoon style, childish, 
masculine features, flat face, small eyes, warm colors, summer setting
"""

# Generate the yuki-onna image
yuki_onna_image = generate_image(
    positive_yuki_onna, 
    negative_yuki_onna, 
    output_path="/content/yuki_onna.png"
)

# Display the result
display(yuki_onna_image)

2. Tengu - A dignified guardian of the mountains

positive_tengu = """
A devastatingly handsome male tengu with Hollywood actor-level masculine features, 
chiseled jawline, deep-set intense eyes, strong nose bridge, long flowing dark hair, 
elegant red and black traditional robes, magnificent feathered wings, 
holding an ornate feather fan, perched on a mountain peak with ancient pine trees, 
dramatic clouds and mist, cinematic lighting with golden hour glow, highly detailed, 
fantasy art style, masculine beauty, regal presence, sharp facial features
"""

negative_tengu = """
ugly, deformed, malformed, grotesque, scary, frightening, hideous, monstrous, evil expression, 
overly muscular, crude, rough, low quality, blurry, pixelated, amateur, bad anatomy, 
extra limbs, missing limbs, distorted face, asymmetrical features, bad lighting, 
oversaturated colors, cartoon style, childish, feminine features, round face, small features, 
modern clothing
"""

tengu_image = generate_image(
    positive_tengu, 
    negative_tengu, 
    output_path="/content/tengu.png"
)

display(tengu_image)

3. Nine-Tailed Fox (Kitsune) - A mystical fox spirit

positive_kitsune = """
An enchanting female nine-tailed fox spirit with Hollywood starlet beauty, 
sharp elegant features, high cheekbones, mesmerizing golden eyes with fox-like pupils, 
long silky hair with auburn highlights, wearing flowing silk robes in gold and crimson, 
nine magnificent fluffy tails swirling around her, standing in an ancient shrine with torii gates, 
fox fire (kitsunebi) dancing around her, cinematic lighting with warm golden tones, 
highly detailed, fantasy art style, seductive beauty, mystical charm, refined facial structure
"""

negative_kitsune = """
ugly, deformed, malformed, grotesque, scary, frightening, hideous, monstrous, evil expression, 
crude, rough, low quality, blurry, pixelated, amateur, bad anatomy, extra limbs, missing limbs, 
distorted face, asymmetrical features, bad lighting, oversaturated colors, cartoon style, childish, 
masculine features, dog-like features, less than nine tails, modern setting, cold colors
"""

kitsune_image = generate_image(
    positive_kitsune, 
    negative_kitsune, 
    output_path="/content/kitsune.png"
)

display(kitsune_image)

4. Ryujin (Dragon God) - A majestic dragon incarnate

positive_dragon_god = """
A magnificently handsome male dragon god in human form with Hollywood leading man features, 
strong angular face, penetrating dragon eyes with golden irises, noble expression, 
long black hair with blue highlights, wearing elaborate golden and blue robes with dragon motifs, 
dragon scales subtly visible on his arms, standing in a celestial palace with clouds and lightning, 
majestic horns protruding elegantly from his forehead, cinematic lighting with divine rays, 
highly detailed, fantasy art style, godlike beauty, imperial presence, sharp aristocratic features
"""

negative_dragon_god = """
ugly, deformed, malformed, grotesque, scary, frightening, hideous, monstrous, evil expression, 
overly muscular, crude, rough, low quality, blurry, pixelated, amateur, bad anatomy, 
extra limbs, missing limbs, distorted face, asymmetrical features, bad lighting, 
oversaturated colors, cartoon style, childish, feminine features, western dragon features, 
full dragon form, dark evil atmosphere
"""

dragon_god_image = generate_image(
    positive_dragon_god, 
    negative_dragon_god, 
    output_path="/content/dragon_god.png"
)

display(dragon_god_image)
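
The four cells above repeat the same call with different prompts, so they can be folded into a loop. The following is a minimal sketch of such a batch runner; run_batch is a hypothetical helper, and fake_generate stands in for generate_image so the example runs without the models loaded.

```python
from typing import Callable, Dict, Tuple

def run_batch(characters: Dict[str, Tuple[str, str]],
              generate_fn: Callable[..., object],
              output_dir: str = "/content") -> Dict[str, object]:
    """Generate one image per character and keep going if one fails."""
    results = {}
    for name, (positive, negative) in characters.items():
        try:
            results[name] = generate_fn(
                positive, negative, output_path=f"{output_dir}/{name}.png"
            )
        except Exception as exc:  # report the failure and continue
            print(f"{name}: failed ({exc})")
    return results

# Stand-in for generate_image (hypothetical); it only echoes the output path.
def fake_generate(positive, negative, output_path=None):
    if not positive.strip():
        raise ValueError("empty prompt")
    return output_path

demo = run_batch(
    {"yuki_onna": ("snow woman", "blurry"), "broken": ("  ", "blurry")},
    fake_generate,
)
print(demo)
```

In the notebook, passing generate_image as generate_fn together with a dictionary of the prompt pairs defined above would produce all four characters in one cell.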

🎯 Prompt Writing Tips and Optimization

✨ Components of a high-quality prompt

Character base: beautiful female/handsome male + species name
Facial features: sharp cheekbones, deep-set eyes, strong jawline
Quality cues: Hollywood-level, cinematic, highly detailed
Style: fantasy art style, model-like features
Environment: atmospheric background, dramatic lighting
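
The five components listed above can be assembled programmatically, which keeps prompts consistent across characters. The sketch below is one possible way to do it; build_prompts and COMMON_NEGATIVE are hypothetical names, and the common negative terms are taken from the terms shared by the example prompts earlier in this guide.

```python
# ASSUMPTION: COMMON_NEGATIVE collects terms shared by the example negative
# prompts above; adjust it to taste.
COMMON_NEGATIVE = (
    "ugly, deformed, malformed, grotesque, low quality, blurry, pixelated, "
    "bad anatomy, distorted face, asymmetrical features, bad lighting, "
    "cartoon style, amateur, crude, rough"
)

def build_prompts(character, face, quality, style, environment,
                  extra_negative=()):
    """Join the five prompt components and extend the common negative terms."""
    positive = ", ".join([character, *face, *quality, *style, *environment])
    negative = ", ".join([COMMON_NEGATIVE, *extra_negative])
    return positive, negative

positive, negative = build_prompts(
    character="A beautiful female yuki-onna (snow woman)",
    face=["sharp cheekbones", "deep-set eyes"],
    quality=["Hollywood-level", "cinematic", "highly detailed"],
    style=["fantasy art style"],
    environment=["snow-covered bamboo forest", "dramatic lighting"],
    extra_negative=["warm colors", "summer setting"],
)
print(positive)
print(negative)
```

Character-specific exclusions (for example "warm colors" for a snow spirit) go in extra_negative, while the shared quality terms stay in one place.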

⚠️ Why negative prompts matter

# Recommended base negative prompt
base_negative = """
ugly, deformed, malformed, grotesque, low quality, blurry, pixelated, 
bad anatomy, distorted face, asymmetrical features, bad lighting, 
cartoon style, amateur, crude, rough
"""

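📊 Model file details

Model	Size	Role	Notes
wan2.1-t2v-14b-Q3_K_M.gguf	~6GB	Main DiT generation model	14B parameters, 3-bit quantization (Q3_K_S weights saved under this name), balances quality and efficiency
umt5-xxl-encoder-Q3_K_M.gguf	~3GB	Text encoder	107 languages, 5.68B parameters, key to prompt understanding
wan_2.1_vae.safetensors	254MB	Image encoder/decoder	3D causal VAE, 8×8 spatial and 4× temporal compression, preserves temporal information
Wan2.1_T2V_14B_FusionX_LoRA.safetensors	317MB	Extension model (LoRA)	Merges CausVid + AccVideo + MoviiGen, fast 8-step generation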

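🏆 Summary

This notebook presented a complete workflow for generating high-quality, Hollywood-grade yokai characters with Wan2.1-T2V-14B and ComfyUI.

📒 Notebook

Google Colab

🎯 Key Points

The GGUF format makes high-quality generation possible with little VRAM
Detailed prompt design is the key to better quality
Negative prompts matter for removing unwanted elements
Tune the parameters to fit your hardware

🌟 Happy Creating! Use this tool to bring your own beautiful yokai characters to life!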