Python for Beginners: Visualizing Label Studio Video Tracking Annotations for Action Recognition

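Overview of Python and Label Studio
Video tracking annotation
Action recognition basics
The visualization program
Walkthrough of the visualization program
Practical uses and applications
Summary
FAQ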

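Overview of Python and Label Studio

Python basics

Python is a programming language that is relatively easy for beginners to learn, which is one reason it is used in so many fields. In this article, we use Python to write a program that visualizes video tracking annotations.

Label Studio basics

Label Studio is an open-source data annotation tool. It can add annotations to many kinds of data, including images, video, and text. In this article, we use Label Studio to create video tracking annotations.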

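Why data annotation matters

Data annotation plays an important role in machine learning and AI. Models learn from annotated data, and that is what makes more advanced judgments and analysis possible.

Video tracking annotation

How video tracking works

Video tracking follows an object or person through a video and records its position in each frame. This makes it possible to analyze how the object or person moves.

Annotation tools

An annotation tool is software for attaching information to videos and images, turning raw data into a form suitable for machine learning.

Video annotation with Label Studio

Here is how to create tracking annotations for a video in Label Studio. First, install Label Studio by following the instructions on its website. Next, create a project and upload the video file. Then set up the annotation task and mark the objects to track. Finally, save the annotation data and export it as JSON.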

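Action recognition basics

Why action recognition matters

Action recognition is the task of identifying what a person or animal is doing from video or images. It is used in many fields, including sports analytics, advertising, and manufacturing.

Applications of action recognition

In sports analytics, for example, action recognition is used to analyze players' movements and improve performance. In advertising, it is used to analyze consumer behavior and reactions in order to plan more effective campaigns.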

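Progress in action recognition

Action recognition has advanced rapidly in recent years. Progress in deep learning and machine learning has made it more accurate, and real-time action recognition on edge devices is becoming practical, so it is expected to be adopted in a growing range of fields.

The visualization program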


import cv2
import json
import os
from loguru import logger

class LSVideoAnnotator:
    def __init__(self, json_file, video_file, img_w=1920, img_h=1080):
        self.json_file = json_file
        self.video_file = video_file
        self.img_w = img_w
        self.img_h = img_h

        # Load the JSON file exported from Label Studio
        with open(self.json_file, "r", encoding="utf-8") as file:
            self.data = json.load(file)

        # Open the video file
        self.cap = cv2.VideoCapture(self.video_file)

        # Get the annotation results (first task, first annotation)
        self.annotations = self.data[0]["annotations"][0]["result"]

        # Organize the annotations by frame
        self.frame_annotations = self._organize_frame_annotations()

    def _organize_frame_annotations(self):
        # Initialize a dict that maps each frame number to its annotations
        frame_annotations = {}
        for annotation in self.annotations:
            label = annotation["value"]["labels"][0]
            for sequence in annotation["value"]["sequence"]:
                frame = sequence["frame"]
                if frame not in frame_annotations:
                    frame_annotations[frame] = []
                sequence["label"] = label
                frame_annotations[frame].append(sequence)
        return frame_annotations

    @staticmethod
    def interpolate_annotation(annotation_prev, annotation_next, current_frame):
        # For a frame without a keyframe, linearly interpolate between the surrounding keyframes.
        # The values are percentages of the frame size, so they are kept as floats here
        # and only rounded to pixels when drawing.
        ratio = (current_frame - annotation_prev["frame"]) / (annotation_next["frame"] - annotation_prev["frame"])
        x = annotation_prev["x"] + ratio * (annotation_next["x"] - annotation_prev["x"])
        y = annotation_prev["y"] + ratio * (annotation_next["y"] - annotation_prev["y"])
        w = annotation_prev["width"] + ratio * (annotation_next["width"] - annotation_prev["width"])
        h = annotation_prev["height"] + ratio * (annotation_next["height"] - annotation_prev["height"])

        # Assumption: "enabled" on the previous keyframe tells whether the box stays visible until the next keyframe.
        return {"x": x, "y": y, "width": w, "height": h, "label": annotation_prev["label"], "enabled": annotation_prev.get("enabled", True)}

    def save_annotated_frames(self, output_dir="annotated_frames3"):
        # Create the output directory if it does not exist
        os.makedirs(output_dir, exist_ok=True)

        logger.info("Saving annotated frames...")
        while self.cap.isOpened():
            ret, frame = self.cap.read()
            if not ret:
                break

            # Read after read(), this is the index of the next frame,
            # i.e. the 1-based number of the frame that was just read.
            current_frame = int(self.cap.get(cv2.CAP_PROP_POS_FRAMES))
            self._draw_annotation_on_frame(frame, current_frame)

            output_path = os.path.join(output_dir, f"frame_{current_frame:04d}.png")
            cv2.imwrite(output_path, frame)

        logger.info("Finished saving annotated frames.")
        self.cap.release()

    def _draw_annotation_on_frame(self, frame, current_frame):
        # Draw each annotation that applies to this frame
        for annotation in self.annotations:
            current_annotation = None
            prev_annotation = None
            next_annotation = None

            for sequence in annotation["value"]["sequence"]:
                if sequence["frame"] == current_frame:
                    current_annotation = sequence
                    break
                elif sequence["frame"] < current_frame:
                    prev_annotation = sequence
                elif sequence["frame"] > current_frame:
                    next_annotation = sequence
                    break

            if current_annotation is None and prev_annotation is not None and next_annotation is not None:
                current_annotation = self.interpolate_annotation(prev_annotation, next_annotation, current_frame)

            if current_annotation is not None and current_annotation["enabled"]:
                x, y, w, h = current_annotation["x"], current_annotation["y"], current_annotation["width"], current_annotation["height"]

                x = int(self.img_w * x / 100)
                y = int(self.img_h * y / 100)
                w = int(self.img_w * w / 100)
                h = int(self.img_h * h / 100)

                color = (0, 255, 0)
                label = current_annotation["label"]

                # Draw the bounding box and its label
                cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
                cv2.putText(frame, label, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

if __name__ == "__main__":
    json_file = "/content/drive/MyDrive/Ika-Action/datasets/project-1-at-2023-05-04-16-11-d6f787bd.json"
    video_file = "/content/drive/MyDrive/Ika-Action/datasets/12ac291c-52.mp4"

    # Create an instance and save the annotated frames
    video_annotator = LSVideoAnnotator(json_file=json_file, video_file=video_file)
    video_annotator.save_annotated_frames()

Walkthrough of the visualization program

Required libraries


import cv2
import json
import os
from loguru import logger

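These lines import the libraries the program needs.

cv2: OpenCV, a library for image and video processing.
json: the standard library module for reading and writing JSON.
os: the standard library module for working with files and directories.
logger: the logger object from loguru, a library that makes log output simple.

Defining the LSVideoAnnotator class

Next comes the definition of the LSVideoAnnotator class.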


class LSVideoAnnotator:
    def __init__(self, json_file, video_file, img_w=1920, img_h=1080):
        self.json_file = json_file
        self.video_file = video_file
        self.img_w = img_w
        self.img_h = img_h

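The __init__ method of this class takes the following arguments.

json_file: path to the JSON file that contains the annotations
video_file: path to the video file that was annotated
img_w: width of the video in pixels (default 1920)
img_h: height of the video in pixels (default 1080)

It then loads the JSON file and opens the video file.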


        with open(self.json_file, "r", encoding="utf-8") as file:
            self.data = json.load(file)

        self.cap = cv2.VideoCapture(self.video_file)

The with statement opens self.json_file, and json.load() parses its contents. cv2.VideoCapture() opens the video file.

After that, the annotation results are extracted and organized by frame.


        self.annotations = self.data[0]["annotations"][0]["result"]
        self.frame_annotations = self._organize_frame_annotations()
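
To make the lookup path data[0]["annotations"][0]["result"] easier to picture, here is a minimal sketch of the structure the script expects. The sample is hand-written and hypothetical: it contains only the keys that the code in this article reads, the label name "player" is made up, and a real Label Studio export carries additional fields.

```python
import json

# Hypothetical sample that mirrors only the keys the script reads.
sample_export = [
    {
        "annotations": [
            {
                "result": [
                    {
                        "value": {
                            "labels": ["player"],  # made-up label name
                            "sequence": [
                                {"frame": 1, "x": 10.0, "y": 20.0,
                                 "width": 5.0, "height": 8.0, "enabled": True},
                                {"frame": 11, "x": 30.0, "y": 20.0,
                                 "width": 5.0, "height": 8.0, "enabled": False},
                            ],
                        }
                    }
                ]
            }
        ]
    }
]

# Round-trip through JSON text to imitate loading the file with json.load().
data = json.loads(json.dumps(sample_export))

# Same lookup path as in __init__: first task, first annotation, its results.
results = data[0]["annotations"][0]["result"]
for region in results:
    label = region["value"]["labels"][0]
    frames = [keyframe["frame"] for keyframe in region["value"]["sequence"]]
    print(label, frames)
```

Each entry in result is one tracked object, and its sequence lists only the keyframes that were set by hand. Note that the script reads only the first task and the first annotation of the export.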

The _organize_frame_annotations method

The _organize_frame_annotations method organizes the annotations by frame.


    def _organize_frame_annotations(self):
        frame_annotations = {}
        for annotation in self.annotations:
            label = annotation["value"]["labels"][0]
            for sequence in annotation["value"]["sequence"]:
                frame = sequence["frame"]
                if frame not in frame_annotations:
                    frame_annotations[frame] = []
                sequence["label"] = label
                frame_annotations[frame].append(sequence)
        return frame_annotations

For each element of self.annotations, the method reads the label and then walks through its sequence, storing every keyframe in the frame_annotations dict under its frame number. It also writes the label into each keyframe, which the later methods rely on. The return value is a dict of annotations grouped by frame.
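
The same grouping can be tried on a tiny input. The following is a sketch, not the article's method: the function name group_keyframes_by_frame and the sample data are hypothetical, it uses collections.defaultdict instead of the explicit membership check, and it copies each keyframe rather than writing the label into the loaded JSON.

```python
from collections import defaultdict


def group_keyframes_by_frame(results):
    """Return {frame_number: [keyframe dicts with a 'label' key added]}."""
    grouped = defaultdict(list)
    for region in results:
        label = region["value"]["labels"][0]
        for keyframe in region["value"]["sequence"]:
            # Copy the keyframe instead of mutating the input in place.
            grouped[keyframe["frame"]].append({**keyframe, "label": label})
    return dict(grouped)


# Hypothetical input with two tracked objects.
results = [
    {"value": {"labels": ["player"],
               "sequence": [{"frame": 1, "x": 10.0}, {"frame": 5, "x": 20.0}]}},
    {"value": {"labels": ["enemy"],
               "sequence": [{"frame": 5, "x": 50.0}]}},
]

by_frame = group_keyframes_by_frame(results)
print(sorted(by_frame))
print([keyframe["label"] for keyframe in by_frame[5]])
```

Frame 5 ends up with one entry per object. Because this variant leaves the input untouched, it could not replace the original method as-is: the class's interpolation step expects the label to be present on the original keyframes.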

The interpolate_annotation method

Next, a static method named interpolate_annotation is defined.


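    @staticmethod
    def interpolate_annotation(annotation_prev, annotation_next, current_frame):
        ratio = (current_frame - annotation_prev["frame"]) / (annotation_next["frame"] - annotation_prev["frame"])
        # Keep the interpolated percentages as floats; they are rounded to pixels when drawing.
        x = annotation_prev["x"] + ratio * (annotation_next["x"] - annotation_prev["x"])
        y = annotation_prev["y"] + ratio * (annotation_next["y"] - annotation_prev["y"])
        w = annotation_prev["width"] + ratio * (annotation_next["width"] - annotation_prev["width"])
        h = annotation_prev["height"] + ratio * (annotation_next["height"] - annotation_prev["height"])

        # Assumption: "enabled" on the previous keyframe tells whether the box stays visible until the next keyframe.
        return {"x": x, "y": y, "width": w, "height": h, "label": annotation_prev["label"], "enabled": annotation_prev.get("enabled", True)}

This method fills in frames that were not annotated by hand, using the keyframes before and after them. It takes the previous keyframe, the next keyframe, and the current frame number, performs linear interpolation on x, y, width, and height, and returns a new annotation. The coordinates are percentages of the frame size, so they are left as floats; truncating them to whole percentages would shift the box by many pixels.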
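
A short worked example may help here. This is a minimal sketch with made-up keyframes, and lerp_box is a hypothetical helper rather than part of the class. For each of x, y, width, and height it computes prev + ratio * (next - prev), where ratio is how far the current frame lies between the two keyframes.

```python
def lerp_box(prev_kf, next_kf, frame):
    """Linearly interpolate x, y, width and height between two keyframes."""
    span = next_kf["frame"] - prev_kf["frame"]
    ratio = (frame - prev_kf["frame"]) / span
    return {
        key: prev_kf[key] + ratio * (next_kf[key] - prev_kf[key])
        for key in ("x", "y", "width", "height")
    }


# Made-up keyframes at frames 10 and 20 (values in percent of the frame size).
prev_kf = {"frame": 10, "x": 10.0, "y": 40.0, "width": 20.0, "height": 10.0}
next_kf = {"frame": 20, "x": 30.0, "y": 40.0, "width": 10.0, "height": 10.0}

box = lerp_box(prev_kf, next_kf, 15)
print(box)  # {'x': 20.0, 'y': 40.0, 'width': 15.0, 'height': 10.0}
```

Frame 15 is exactly halfway, so ratio is 0.5: x moves from 10 to 20, width shrinks from 20 to 15, and the unchanged values stay where they are.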

The save_annotated_frames method

The save_annotated_frames method saves the frames with the annotations drawn on them.


    def save_annotated_frames(self, output_dir="annotated_frames3"):
        os.makedirs(output_dir, exist_ok=True)

        logger.info("Saving annotated frames...")
        while self.cap.isOpened():
            ret, frame = self.cap.read()
            if not ret:
                break

            current_frame = int(self.cap.get(cv2.CAP_PROP_POS_FRAMES))
            self._draw_annotation_on_frame(frame, current_frame)

            output_path = os.path.join(output_dir, f"frame_{current_frame:04d}.png")
            cv2.imwrite(output_path, frame)

        logger.info("Finished saving annotated frames.")
        self.cap.release()

This method creates the output directory if it does not exist, then reads the video one frame at a time, draws the annotations on each frame, and saves the result as a PNG image. Once the whole video has been read, it releases the capture object.

The _draw_annotation_on_frame method

The _draw_annotation_on_frame method draws the annotations that apply to a given frame.


    def _draw_annotation_on_frame(self, frame, current_frame):
        for annotation in self.annotations:
            current_annotation = None
            prev_annotation = None
            next_annotation = None

            for sequence in annotation["value"]["sequence"]:
                if sequence["frame"] == current_frame:
                    current_annotation = sequence
                    break
                elif sequence["frame"] < current_frame:
                    prev_annotation = sequence
                elif sequence["frame"] > current_frame:
                    next_annotation = sequence
                    break

            if current_annotation is None and prev_annotation is not None and next_annotation is not None:
                current_annotation = self.interpolate_annotation(prev_annotation, next_annotation, current_frame)

            if current_annotation is not None and current_annotation["enabled"]:
                x, y, w, h = current_annotation["x"], current_annotation["y"], current_annotation["width"], current_annotation["height"]

                x = int(self.img_w * x / 100)
                y = int(self.img_h * y / 100)
                w = int(self.img_w * w / 100)
                h = int(self.img_h * h / 100)

                color = (0, 255, 0)
                label = current_annotation["label"]

                cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
                cv2.putText(frame, label, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

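For each element of self.annotations, the method first searches the keyframes for one that matches the current frame. If there is no exact match but there is a keyframe both before and after the current frame, it calls interpolate_annotation to fill in the annotation.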
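
The keyframe search can also be written with a binary search. The sketch below is an alternative under one assumption, namely that each sequence is sorted by frame number, which the loop in the class relies on as well. find_neighbors is a hypothetical helper and is not part of the original program.

```python
from bisect import bisect_left


def find_neighbors(sequence, frame):
    """Return (exact, prev, next) keyframes around `frame`.

    Assumes `sequence` is sorted by its "frame" value.
    """
    frames = [keyframe["frame"] for keyframe in sequence]
    i = bisect_left(frames, frame)
    if i < len(frames) and frames[i] == frame:
        # The frame itself is a keyframe, so no interpolation is needed.
        return sequence[i], None, None
    prev_kf = sequence[i - 1] if i > 0 else None
    next_kf = sequence[i] if i < len(frames) else None
    return None, prev_kf, next_kf


sequence = [{"frame": 1}, {"frame": 10}, {"frame": 20}]
print(find_neighbors(sequence, 10))  # exact keyframe
print(find_neighbors(sequence, 15))  # between two keyframes
print(find_neighbors(sequence, 25))  # after the last keyframe
```

Only the middle case has both a previous and a next keyframe, so it is the only one that would be interpolated. For short sequences the plain loop is just as good; the binary search mainly pays off when an object has many keyframes.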

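Once an annotation is available and enabled, the method draws a rectangle and a label on the frame. The x, y, width, and height values are percentages of the frame size, so they are first converted to pixels using img_w and img_h. cv2.rectangle() draws the rectangle and cv2.putText() draws the text.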
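
The percent-to-pixel conversion is easy to check by hand. The following minimal sketch isolates it from the drawing code; percent_box_to_pixels is a hypothetical helper and the box values are made up.

```python
def percent_box_to_pixels(box, img_w, img_h):
    """Convert a box given in percent of the frame size to pixel corners."""
    x1 = int(img_w * box["x"] / 100)
    y1 = int(img_h * box["y"] / 100)
    x2 = x1 + int(img_w * box["width"] / 100)
    y2 = y1 + int(img_h * box["height"] / 100)
    return (x1, y1), (x2, y2)


# Made-up box: 25% from the left, 50% from the top, 10% wide, 20% tall.
top_left, bottom_right = percent_box_to_pixels(
    {"x": 25.0, "y": 50.0, "width": 10.0, "height": 20.0},
    img_w=1920,
    img_h=1080,
)
print(top_left, bottom_right)  # (480, 540) (672, 756)
```

The two corner tuples are in the form that cv2.rectangle() takes. If the video is not 1920x1080, img_w and img_h must be set to its actual size, otherwise the boxes land in the wrong place.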

The main block

Finally, there is the if __name__ == "__main__": block.


if __name__ == "__main__":
    json_file = "/content/drive/MyDrive/Ika-Action/datasets/project-1-at-2023-05-04-16-11-d6f787bd.json"
    video_file = "/content/drive/MyDrive/Ika-Action/datasets/12ac291c-52.mp4"

    video_annotator = LSVideoAnnotator(json_file=json_file, video_file=video_file)
    video_annotator.save_annotated_frames()

This block sets the paths to the JSON file and the video file, creates an instance of LSVideoAnnotator, and calls save_annotated_frames() to save the annotated frames.

Running the code saves every frame of the specified video as an image with the annotations drawn on it.

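Practical uses and applications

Action recognition in practice

Manufacturing

In manufacturing, action recognition is used to analyze workers' movements and find more efficient ways of working. This can improve productivity and working conditions.

Robotics

In robotics, action recognition is used to enable communication and collaborative work between robots and people. It also contributes to the safety of self-driving cars.

Healthcare

In healthcare, action recognition is used to analyze patients' behavior in order to assess their condition and evaluate the effect of rehabilitation, which supports more appropriate treatment.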

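Summary

Using Python and Label Studio

This article walked through a beginner-friendly Python program that visualizes video tracking annotations made with Label Studio for action recognition. Combining Python and Label Studio makes it easier to prepare and check the annotated data that action recognition depends on.

The potential of action recognition

Action recognition is already used in many fields and has broad potential. As the technology advances, more sophisticated action recognition should become possible.

Why a visualization program matters

A program that visualizes video tracking annotations makes the data easier to understand and the results easier to inspect. Being able to see the annotations helps catch labeling mistakes, which in turn supports more accurate action recognition and more effective analysis.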

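FAQ

What is Label Studio?

Label Studio is an open-source tool for annotating many kinds of data, including video, images, and text. It is used to attach the information that machine learning and AI systems need to the data.

What is video tracking annotation used for?

Video tracking annotation records the position of objects or people as they move through a video. It is used in many fields, including sports analytics, advertising, and manufacturing.

How is action recognition evolving?

Progress in deep learning and machine learning has made action recognition more accurate. Real-time action recognition on edge devices is also becoming practical, so it is expected to be adopted in a growing range of fields.

What are some applications of action recognition?

Action recognition is used in sports analytics, advertising, manufacturing, robotics, healthcare, and many other fields, where it supports more effective analysis and more appropriate services.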