SAMURAI: A Zero-Shot Object Tracking Model Built on SAM (with Google Colab 📒 Notebook)

This notebook walks through SAMURAI, a recent object tracking model built on SAM (Segment Anything Model).

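Overview
Environment Setup
1. Required Python Packages
2. Installing SAM 2
Model Setup
1. Downloading the Checkpoints
Data Preparation
Running Inference
1. Basic Inference
2. Running on a Custom Video
Technical Details
📒 Notebook
References
1. Related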

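Overview

SAMURAI is a new object tracking model developed by a research team at the University of Washington.
Its key features:

Zero-shot tracking built on SAM 2
Efficient object tracking through Motion-Aware Memory
High-accuracy tracking with no additional training or fine-tuning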

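Running SAMURAI on Google Colab ⑥
As expected, identical-looking individuals combined with occlusion cause identity switches. This case is hard even for the human eye.
*The video is from the LaSOT dataset. https://t.co/SbG7owwoCV pic.twitter.com/WBIMmUv5aF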

— Maki@Sunwood AI Labs. (@hAru_mAki_ch) November 24, 2024

Running SAMURAI on Google Colab ⑤
Tracking inference works!
*The video is from the LaSOT dataset. https://t.co/KpQPw4Z3bW pic.twitter.com/iYvdjcXZD4

— Maki@Sunwood AI Labs. (@hAru_mAki_ch) November 24, 2024

Environment Setup

Required Python Packages

# Install the base dependencies (quote the specifiers so the shell does not treat > as a redirect)
# !pip install "torch>=2.3.1"
# !pip install "torchvision>=0.18.1"

# SAMURAI dependencies
!pip install matplotlib==3.7 tikzplotlib jpeg4py opencv-python lmdb pandas scipy loguru

!git clone https://github.com/yangchris11/samurai.git
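
Because the torch and torchvision installs above are commented out, it can help to confirm that the versions already present in the runtime meet those minimums. The following is a minimal sketch; the helper names `parse_version` and `meets_minimum` are hypothetical, and the comparison looks only at the first three numeric components of each version string.

```python
import re


def parse_version(text):
    """Turn a version string such as '2.3.1+cu121' into a tuple like (2, 3, 1)."""
    core = text.split("+")[0]  # drop a local build tag such as '+cu121'
    return tuple(int(part) for part in re.findall(r"\d+", core)[:3])


def meets_minimum(installed, required):
    """Return True when the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


# Minimums taken from the commented-out pip lines above.
MINIMUMS = {"torch": "2.3.1", "torchvision": "0.18.1"}

if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    for name, minimum in MINIMUMS.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            print(f"{name}: not installed (need >= {minimum})")
            continue
        status = "OK" if meets_minimum(found, minimum) else "too old"
        print(f"{name} {found}: {status} (need >= {minimum})")
```

If either package is reported as too old or missing, uncommenting the corresponding pip line is the likely fix.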

Installing SAM 2

# Install SAM 2
%cd /content/samurai/sam2
!pip install -e .
!pip install -e ".[notebooks]"

Model Setup

Downloading the Checkpoints

# Download the SAM 2 checkpoints, then return to the sam2 directory
%cd /content/samurai/sam2/checkpoints
!./download_ckpts.sh
%cd ..
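
A quick way to confirm the download finished is to list what landed in the checkpoints directory. This is a minimal sketch: `list_checkpoints` is a hypothetical helper, and it assumes the checkpoint files use the `.pt` extension.

```python
from pathlib import Path


def list_checkpoints(checkpoint_dir):
    """Return (file name, size in MB) for every .pt file in the directory, sorted by name."""
    folder = Path(checkpoint_dir)
    if not folder.is_dir():
        return []
    return [
        (path.name, round(path.stat().st_size / 1e6, 1))
        for path in sorted(folder.glob("*.pt"))  # assumption: checkpoints end in .pt
    ]


if __name__ == "__main__":
    # Path matches the %cd target used in the cell above.
    found = list_checkpoints("/content/samurai/sam2/checkpoints")
    if not found:
        print("No .pt files found; the download may have failed.")
    for name, size_mb in found:
        print(f"{name}: {size_mb} MB")
```

An empty list or an unexpectedly small file size usually points to an interrupted download.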

Data Preparation

Prepare the dataset for SAMURAI with the following structure:

data/LaSOT
├── airplane/
│   ├── airplane-1/
│   │   ├── full_occlusion.txt
│   │   ├── groundtruth.txt
│   │   ├── img/
│   │   ├── nlp.txt
│   │   └── out_of_view.txt
...
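
The layout above can be checked programmatically before running inference. The sketch below is illustrative only: `missing_entries` is a hypothetical helper, and it treats every entry shown in the tree as required, which may be stricter than what the inference script actually reads.

```python
from pathlib import Path

# Entries shown in the tree above; img/ is a directory, the rest are files.
EXPECTED_FILES = ("full_occlusion.txt", "groundtruth.txt", "nlp.txt", "out_of_view.txt")


def missing_entries(sequence_dir):
    """Return the names from the expected layout that are absent in one sequence folder."""
    folder = Path(sequence_dir)
    missing = [name for name in EXPECTED_FILES if not (folder / name).is_file()]
    if not (folder / "img").is_dir():
        missing.append("img/")
    return missing


# Example call (the path follows the tree above):
# print(missing_entries("/content/samurai/data/LaSOT/airplane/airplane-1"))
```

An empty list means the sequence folder matches the tree.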

Running Inference

Basic Inference

%cd /content/samurai

!pwd

import os

# Set the directory path
directory = '/content/samurai/data/LaSOT'

# Create the directory if it does not exist
os.makedirs(directory, exist_ok=True)

# File contents (a longer list, commented out)
# content = '''basketball-1
# basketball-6
# basketball-7
# basketball-11'''

content = '''basketball-1
basketball-11'''

# File path
file_path = os.path.join(directory, 'testing_set.txt')

# Create the file and write the contents
with open(file_path, 'w') as f:
    f.write(content)

print(f'File created: {file_path}')

import os
from huggingface_hub import hf_hub_download
import zipfile
import shutil

def download_and_extract_lasot(base_dir="/content/samurai/data"):
    try:
        # Build the paths for the LaSOT directory and the basketball category directory
        lasot_dir = os.path.join(base_dir, "LaSOT")
        basketball_dir = os.path.join(lasot_dir, "basketball")
        os.makedirs(basketball_dir, exist_ok=True)

        # Create the directory that stores the ZIP file
        zip_dir = os.path.join(base_dir, "zips")
        os.makedirs(zip_dir, exist_ok=True)

        print("Downloading dataset...")
        zip_path = hf_hub_download(
            repo_id="l-lt/LaSOT",
            filename="basketball.zip",
            repo_type="dataset",
            local_dir=zip_dir
        )
        print(f"Downloaded to: {zip_path}")

        # Extract directly into the basketball directory
        print("Extracting ZIP file to basketball directory...")
        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall(basketball_dir)

        print("\nCreated directory structure:")
        print("LaSOT/")
        print("└── basketball/")
        # Show only the first few basketball folders
        for item in sorted(os.listdir(basketball_dir))[:5]:
            print(f"    ├── {item}/")
        print("    └── ...")

        return lasot_dir

    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return None

if __name__ == "__main__":
    extract_path = download_and_extract_lasot()
    if extract_path:
        print("\nDownload and extraction completed successfully!")
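
Before launching inference, it may be worth checking that every sequence named in testing_set.txt actually exists on disk after extraction. This is a minimal sketch under two assumptions: `missing_sequences` is a hypothetical helper, and each sequence lives at `<category>/<sequence>` with the category taken from the text before the last hyphen, as in the tree shown under Data Preparation.

```python
from pathlib import Path


def missing_sequences(lasot_dir, list_file="testing_set.txt"):
    """Return sequence names listed in the file that have no folder under lasot_dir."""
    root = Path(lasot_dir)
    lines = (root / list_file).read_text().splitlines()
    names = [line.strip() for line in lines if line.strip()]
    missing = []
    for name in names:
        category = name.rsplit("-", 1)[0]  # 'basketball-11' -> 'basketball'
        if not (root / category / name).is_dir():
            missing.append(name)
    return missing


# Example call (path matches the cells above):
# print(missing_sequences("/content/samurai/data/LaSOT"))
```

If every listed sequence is reported missing, one possible cause is that the ZIP archive carries its own top-level folder, leaving the sequences one level deeper than expected.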

# Run the main inference script
!python scripts/main_inference.py

Running on a Custom Video

# To use a video file as input
# !python scripts/demo.py --video_path your_video.mp4 --txt_path first_frame_bbox.txt

# To use a directory of frame images as input
# !python scripts/demo.py --video_path your_frame_directory --txt_path first_frame_bbox.txt

Note: the file passed to --txt_path (first_frame_bbox.txt above) holds the first frame's bounding box in xywh format on a single line.
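
The bounding-box file can be written from Python. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, and the comma separator is an assumption, so check it against the demo script's expectations before relying on it. A corner-format converter is included because many annotation tools export (x1, y1, x2, y2) rather than xywh.

```python
def xyxy_to_xywh(x1, y1, x2, y2):
    """Convert corner coordinates to (x, y, width, height)."""
    return (x1, y1, x2 - x1, y2 - y1)


def format_bbox_line(x, y, w, h):
    """Format one box as a single line. The comma separator is an assumption."""
    return f"{x},{y},{w},{h}"


def write_bbox_file(path, x, y, w, h):
    """Write the first-frame box to a text file as a single line."""
    with open(path, "w") as f:
        f.write(format_bbox_line(x, y, w, h) + "\n")


# Example: a box with corners (10, 20) and (110, 70).
# write_bbox_file("first_frame_bbox.txt", *xyxy_to_xywh(10, 20, 110, 70))
```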

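Technical Details

Key technical features of SAMURAI:

Motion-Aware Memory
- A memory mechanism that takes object motion into account
- Enables efficient tracking
Zero-shot Learning
- Tracks without additional training or fine-tuning
- Builds on SAM 2's strong segmentation capability
Adaptive Tracking
- Handles tracking in dynamic environments
- Achieves real-time tracking performance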
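
To give a feel for why motion cues help, here is a toy sketch of motion-aware candidate selection. It is not SAMURAI's implementation: it swaps in a plain constant-velocity guess for whatever motion model SAMURAI actually uses, and the function names and the `motion_weight` value are hypothetical. The idea illustrated is only that blending a motion-consistency score with an appearance score can reject a look-alike object far from the expected position.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def predict_next_box(prev_box, curr_box):
    """Constant-velocity guess: shift the current box by the last frame-to-frame motion."""
    px, py, _, _ = prev_box
    cx, cy, cw, ch = curr_box
    return (cx + (cx - px), cy + (cy - py), cw, ch)


def select_candidate(candidates, predicted_box, motion_weight=0.2):
    """Pick the index of the (box, mask_score) pair with the best blended score."""
    def combined(item):
        box, mask_score = item
        motion_score = iou_xywh(box, predicted_box)
        return motion_weight * motion_score + (1 - motion_weight) * mask_score

    return max(range(len(candidates)), key=lambda i: combined(candidates[i]))


# The target moved 5 px to the right between the last two frames.
predicted = predict_next_box((0, 0, 10, 10), (5, 0, 10, 10))
candidates = [
    ((10, 0, 10, 10), 0.80),     # where the target is expected to be
    ((100, 100, 10, 10), 0.85),  # a look-alike elsewhere with a slightly higher mask score
]
print(select_candidate(candidates, predicted))
```

With the motion term switched off (`motion_weight=0`), the same call would pick the look-alike, which mirrors the identity-switch failure described in the Overview.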

📒 Notebook

Google Colab

References

@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and others},
  journal={arXiv preprint arXiv:2408.00714},
  year={2024}
}

@misc{yang2024samurai,
  title={SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory},
  author={Yang, Cheng-Yen and others},
  year={2024},
  eprint={2411.11922},
}

For additional information and updates, see the official repository.