SourceSage: A Tool That Presents a Project's Structure and Contents in an AI-Friendly Format

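Hello, beginners! In this article I'll explain SourceSage, a Python script that converts a project's source code and file layout into a single Markdown file. With SourceSage, a large language model (AI) can take in the structure and contents of an entire project from one file.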

Features of SourceSage
How to Use SourceSage
SourceSage Code Walkthrough
Repository
Related

Features of SourceSage

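Combines the project's directory structure and file contents into one Markdown file
Outputs them in a structured format so an AI can quickly grasp an overview of the project
Lets you exclude unneeded files and directories through configuration
Presents the whole project clearly and readably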

How to Use SourceSage

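Copy SourceSage.py into the root directory of the project you want to analyze.
If needed, edit the settings in SourceSage.py:
- folders: the list of directories to analyze (the example at the end of the script uses ['./'], the current directory)
- exclude_patterns: patterns for files/folders to exclude (any path containing one of these strings is skipped)
- output_file: the name of the Markdown file to write (defaults to output.md; the example uses SourceSage.md)
In a terminal or command prompt, change to the project's root directory and run python SourceSage.py.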

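This produces a Markdown file (SourceSage.md with the example settings) that presents the project's structure and contents in a form an AI can readily understand.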

SourceSage Code Walkthrough

Now let's look at SourceSage's code in detail.

import os

class SourceSage:
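    def __init__(self, folders, exclude_patterns=None, output_file='output.md'):
        self.folders = folders  # list of directories to document
        self.exclude_patterns = exclude_patterns or []  # substrings; matching paths are skipped
        self.output_file = output_file  # path of the generated Markdown file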

The SourceSage class constructor stores the folders to analyze, the exclusion patterns, and the output file name.
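A brief aside on the constructor: the exclude_patterns=None default combined with exclude_patterns or [] is a common Python idiom for avoiding a shared mutable default argument. A minimal sketch contrasting the two styles; the classes Shared and Safe are hypothetical and only illustrate the idiom:

```python
class Shared:
    # Anti-pattern: the default list is created once, when the method is defined,
    # and is shared by every instance that relies on the default.
    def __init__(self, patterns=[]):
        self.patterns = patterns


class Safe:
    # Idiom used by SourceSage: default to None and create a fresh list per call.
    def __init__(self, patterns=None):
        self.patterns = patterns or []


a, b = Shared(), Shared()
a.patterns.append(".git")
print(b.patterns)  # ['.git'] (b sees a's change)

c, d = Safe(), Safe()
c.patterns.append(".git")
print(d.patterns)  # [] (each instance has its own list)
```

Note that patterns or [] also swaps an explicitly passed empty list for a new empty list, which is harmless here.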

def generate_markdown(self):
    with open(self.output_file, 'w', encoding='utf-8') as md_file:
        project_name = os.path.basename(os.path.abspath(self.folders[0]))
        md_file.write(f"# Project: {project_name}\n\n")
        for folder in self.folders:
            markdown_content = self._generate_markdown_for_folder(folder)
            md_file.write(markdown_content + '\n\n')

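The generate_markdown method opens the output file and writes a heading with the project name, taken from the name of the first folder in folders. It then calls _generate_markdown_for_folder for each folder and writes the resulting Markdown content to the output file.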
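generate_markdown applies os.path.abspath before os.path.basename when deriving the project name. The order matters: os.path.basename('./') is an empty string, so the name can only be recovered from the absolute path. A minimal sketch of the same expression; the helper project_name and the temporary directory named my_project are assumptions for illustration:

```python
import os
import tempfile


def project_name(folder):
    # Same expression as in generate_markdown: resolve to an absolute path first,
    # then take the last path component.
    return os.path.basename(os.path.abspath(folder))


# Without abspath, a relative folder such as './' has an empty basename.
print(repr(os.path.basename("./")))  # ''

with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, "my_project")
    os.mkdir(sub)
    name_direct = project_name(sub)
    # Changing into the directory and passing './' gives the same name.
    cwd = os.getcwd()
    os.chdir(sub)
    try:
        name_dot = project_name("./")
    finally:
        os.chdir(cwd)  # restore the working directory before cleanup

print(name_direct, name_dot)  # my_project my_project
```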

def _generate_markdown_for_folder(self, folder_path):
    markdown_content = "```plaintext\n"
    markdown_content += self._display_tree(dir_path=folder_path)
    markdown_content += "\n```\n\n"
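    for root, dirs, files in os.walk(folder_path, topdown=True):
        if self._is_excluded(root):
            dirs[:] = []  # Don't walk into excluded directories
            continue
        dirs.sort()  # visit subdirectories in a stable order
        files.sort()  # list files in a stable order
        relative_path = os.path.relpath(root, folder_path)
        # Depth 0 for folder_path itself, 1 for its direct children, and so on
        depth = 0 if relative_path == '.' else relative_path.count(os.sep) + 1
        header_level = '#' * (depth + 2)  # '##' for the top folder, '###' below it, ...
        markdown_content += f"{header_level} {relative_path}\n\n"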
        for f in files:
            file_path = os.path.join(root, f)
            if self._is_excluded(file_path):
                continue
            relative_file_path = os.path.relpath(file_path, folder_path)
            try:
                with open(file_path, 'r', encoding='utf-8') as file_content:
                    content = file_content.read().strip()
                    markdown_content += f"`{relative_file_path}`\n\n```plaintext\n{content}\n```\n\n"
            except Exception as e:
                markdown_content += f"`{relative_file_path}` - Error reading file: {e}\n\n"
    return markdown_content

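The _generate_markdown_for_folder method generates the Markdown content for one folder.
First, it adds the folder's directory tree, wrapped in a plaintext code block.
Next, it uses os.walk to process every subfolder and file in the folder recursively.
Folders that match an exclusion pattern are skipped, and os.walk is told not to descend into them.
For each subfolder, it computes the relative path and a heading level based on depth, and adds a heading to the Markdown content.
For each file that does not match an exclusion pattern, it reads the file's contents and adds them as a code block under the file's relative path; if a file cannot be read (for example, a binary file that is not valid UTF-8), an error message is written instead.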
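os.walk(..., topdown=True) lets the loop modify dirs in place to control which subdirectories are visited; that is what dirs[:] = [] relies on. In the code above, an excluded directory is still yielded once as root before being skipped. A minimal sketch of an alternative that prunes excluded names at the parent instead, and derives the heading from the relative path; walk_with_headings and the test layout are hypothetical, and matching is done on each directory name:

```python
import os
import tempfile


def walk_with_headings(folder_path, exclude_patterns):
    """Yield (heading, relative_dir) pairs, never entering excluded directories."""
    for root, dirs, files in os.walk(folder_path, topdown=True):
        # Prune in place: os.walk only descends into the names left in dirs.
        dirs[:] = [d for d in dirs if not any(p in d for p in exclude_patterns)]
        dirs.sort()  # deterministic traversal order
        rel = os.path.relpath(root, folder_path)
        # Depth 0 for the folder itself, 1 for its children, and so on.
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        yield "#" * (depth + 2), rel


with tempfile.TemporaryDirectory() as tmp:
    for d in ("src/pkg", ".git/objects", "docs"):
        os.makedirs(os.path.join(tmp, *d.split("/")))
    result = list(walk_with_headings(tmp + os.sep, [".git"]))

for heading, rel in result:
    print(heading, rel)
```

Pruning at the parent means .git/objects is never listed at all, which can save noticeable time in large repositories.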

def _display_tree(self, dir_path='.', string_rep=True, header=True, max_depth=None, show_hidden=False):
    tree_string = ""
    if header:
        tree_string += f"OS: {os.name}\nDirectory: {os.path.abspath(dir_path)}\n\n"
    tree_string += self._build_tree_string(dir_path, max_depth, show_hidden, depth=0)
    if string_rep:
        return tree_string.strip()
    else:
        print(tree_string.strip())

def _build_tree_string(self, dir_path, max_depth, show_hidden, depth=0):
    tree_string = ""
    if depth == max_depth:
        return tree_string
    for item in sorted(os.listdir(dir_path)):  # sorted for a reproducible order
        if not show_hidden and item.startswith('.'):
            continue
        if self._is_excluded(item):
            continue
        item_path = os.path.join(dir_path, item)
        if os.path.isdir(item_path):
            tree_string += '│  ' * depth + '├─ ' + item + '/\n'
            tree_string += self._build_tree_string(item_path, max_depth, show_hidden, depth + 1)
        else:
            tree_string += '│  ' * depth + '├─ ' + item + '\n'
    return tree_string

def _is_excluded(self, path):
    # Plain substring test: True if any pattern occurs anywhere in the path
    return any(pattern in path for pattern in self.exclude_patterns)

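The _display_tree method returns (or, with string_rep=False, prints) the tree structure of the given directory, preceded by the OS name (os.name) and the absolute path when header is True.
The _build_tree_string method builds the string representation of the tree recursively, skipping hidden entries (names starting with '.') unless show_hidden is set.
The _is_excluded method checks whether a path matches an exclusion pattern, that is, whether any pattern appears as a substring of the path.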
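_build_tree_string draws every entry with ├─ , so the last item at each level looks the same as the others. A minimal sketch of a variant that sorts entries and draws the last entry at each level with └─ ; the function build_tree and the sample files are hypothetical, and hidden entries and excluded names are skipped the same way as in the original:

```python
import os
import tempfile


def build_tree(dir_path, exclude_patterns, prefix=""):
    """Return the tree under dir_path as a list of lines."""
    entries = sorted(
        e for e in os.listdir(dir_path)
        if not e.startswith(".") and not any(p in e for p in exclude_patterns)
    )
    lines = []
    for i, name in enumerate(entries):
        last = i == len(entries) - 1
        path = os.path.join(dir_path, name)
        is_dir = os.path.isdir(path)
        lines.append(prefix + ("└─ " if last else "├─ ") + name + ("/" if is_dir else ""))
        if is_dir:
            # Children of the last entry do not need a continuing vertical bar.
            child_prefix = prefix + ("   " if last else "│  ")
            lines.extend(build_tree(path, exclude_patterns, child_prefix))
    return lines


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "src"))
    for f in ("README.md", os.path.join("src", "main.py"), "output.md"):
        with open(os.path.join(tmp, f), "w"):
            pass  # create an empty file
    tree = "\n".join(build_tree(tmp, ["output.md"]))

print(tree)
```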

Finally, here is an example of using SourceSage:

folders = ['./']  # target the current directory
exclude_patterns = ['.git', '__pycache__', 'LICENSE', 'output.md', 'assets', 'Style-Bert-VITS2', 'output', 'streamlit', 'SourceSage.md', 'data', 'SourceSage']  # file/folder patterns to exclude
source_sage = SourceSage(folders, exclude_patterns=exclude_patterns, output_file='SourceSage.md')
source_sage.generate_markdown()
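Because _is_excluded tests pattern in path, each entry in exclude_patterns matches anywhere in a path, not only whole names. With the list above, 'data' also hides a file such as metadata.py, and '.git' also hides a .github/ folder. A minimal sketch comparing that rule with a component-wise glob match built on the standard fnmatch module; is_excluded_component is a hypothetical alternative, not part of SourceSage:

```python
import fnmatch
import os


def is_excluded_substring(path, patterns):
    # Same rule as SourceSage._is_excluded: plain substring test on the whole path.
    return any(pattern in path for pattern in patterns)


def is_excluded_component(path, patterns):
    # Hypothetical alternative: a path is excluded only if one of its components
    # matches a pattern as a whole (glob wildcards such as *.md are allowed).
    parts = os.path.normpath(path).split(os.sep)
    return any(fnmatch.fnmatch(part, pattern) for part in parts for pattern in patterns)


patterns = [".git", "__pycache__", "data", "output"]
for p in ["./src/metadata.py", "./.github/workflows/ci.yml",
          "./data/train.csv", "./output_utils.py"]:
    print(p, is_excluded_substring(p, patterns), is_excluded_component(p, patterns))
```

Substring matching keeps the configuration short, but the component-wise rule is less likely to hide files by accident.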