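[Striking Cost-Performance] Convert LLaMa 3.1 405B to 8B! Easy Implementation in Google Colab

This guide walks step by step through converting a task from LLaMa 3.1 405B to LLaMa 3.1 8B, which can cut costs substantially while preserving output quality. The conversion works at the prompt level: the 405B model writes a system prompt and few-shot examples, and the 8B model then uses them to perform the task. The code runs on OctoAI inference, so you need an OctoAI account and an API key.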

Created by: Matt Shumer (https://twitter.com/mattshumer_) & Ben Hamm (https://www.linkedin.com/in/hammben/)

GitHub repository: https://github.com/mshumer/gpt-prompt-engineer

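Step 1: Install the required library
Step 2: Get and set your OctoAI API key
Step 3: Run the conversion code
Step 4: Set the task, prompt example, and response example
Step 5: Run the conversion and check the result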

Step 1: Install the required library

First, install the required library.

# Install the openai library
!pip install openai
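
The code in Step 3 imports the `OpenAI` client class, which exists only in version 1.0 and later of the openai package. The following is a minimal sketch for checking the installed version; the helper name `supports_client_api` is hypothetical and not part of the original notebook.

```python
from importlib import metadata


def supports_client_api(version):
    """Return True if `version` (e.g. "1.35.0") is openai 1.0 or later.

    Hypothetical helper, not part of the original notebook.
    """
    major = version.split(".")[0]
    return major.isdigit() and int(major) >= 1


try:
    installed = metadata.version("openai")
    status = "OK" if supports_client_api(installed) else "too old, run: pip install -U openai"
    print("openai", installed, status)
except metadata.PackageNotFoundError:
    print("openai is not installed")
```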

Step 2: Get and set your OctoAI API key

Get an API key from the OctoAI website (https://octo.ai), then replace PLACE YOUR KEY HERE in the code below with your key.

import os

# Store the OctoAI API key in an environment variable
os.environ["OCTOAI_API_KEY"] = "PLACE YOUR KEY HERE"
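
A key pasted directly into a cell is easy to leak when the notebook is shared. As a minimal sketch, the key can instead be read from an existing environment variable, with a hidden prompt as the fallback. The helper name `load_api_key` and its parameters are assumptions for illustration and are not part of the original notebook.

```python
import getpass
import os


def load_api_key(env=None, ask=getpass.getpass, name="OCTOAI_API_KEY"):
    """Return the API key stored in `env`, asking for it if it is missing.

    Hypothetical helper, not part of the original notebook.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key or key == "PLACE YOUR KEY HERE":
        # getpass hides the typed value, so the key never appears in cell output
        key = ask(f"Enter {name}: ").strip()
        env[name] = key
    return key
```

Calling `load_api_key()` once before creating the client leaves the key in `os.environ["OCTOAI_API_KEY"]`, which is where the later code reads it from.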

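Step 3: Run the conversion code

Running the code below defines the functions that convert a task from LLaMa 3.1 405B to LLaMa 3.1 8B. The conversion itself starts in Step 5.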

from IPython.display import HTML, display
import re
import json
import os
from openai import OpenAI

# Set CSS so that long lines in the output wrap instead of overflowing
def set_css():
    display(HTML('''
    <style>
        pre {
            white-space: pre-wrap;
        }
    </style>
    '''))

# Run set_css before each cell executes
get_ipython().events.register('pre_run_cell', set_css)

# Initialize the OpenAI client and point it at the OctoAI base URL
client = OpenAI(
    base_url="https://text.octoai.run/v1",
    api_key=os.environ['OCTOAI_API_KEY'],
)

# Define the model names to use
small_model = "meta-llama-3.1-8b-instruct"  # Small model
big_model = "meta-llama-3.1-405b-instruct"  # Large model

# Generate candidate few-shot examples (prompt/response pairs)
def generate_candidate_prompts(task, prompt_example, response_example):
    # Define the system prompt
    system_prompt = """<task>Given an example training sample, create seven additional samples for the same task that are even better. Each example should contain a <prompt> and a <response>.</task>

<rules>
1. Ensure the new examples are diverse and unique from one another.
2. They should all be perfect. If you make a mistake, this system won't work.
</rules>

Respond in this format:
<response_format>
<example_one>
<prompt>
PUT_PROMPT_HERE
</prompt>
<response>
PUT_RESPONSE_HERE
</response>
</example_one>

<example_two>
<prompt>
PUT_PROMPT_HERE
</prompt>
<response>
PUT_RESPONSE_HERE
</response>
</example_two>

...
</response_format>"""

    # Define the user content
    user_content = f"""<training_task>{task}</training_task>

<prompt_example>
{prompt_example}
</prompt_example>

<response_example>
{response_example}
</response_example>"""

    # Generate a response with the large model
    response = client.chat.completions.create(
        model=big_model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content}
        ],
        max_tokens=4000,  # Maximum number of tokens to generate
        temperature=0.5  # Controls the randomness of the generated text
    )

    response_text = response.choices[0].message.content

    # Parse the prompts and responses
    prompts_and_responses = []
    examples = re.findall(r'<example_\w+>(.*?)</example_\w+>', response_text, re.DOTALL)
    for example in examples:
        prompt = re.findall(r'<prompt>(.*?)</prompt>', example, re.DOTALL)[0].strip()
        # Use a separate name so the API response object above is not shadowed
        example_response = re.findall(r'<response>(.*?)</response>', example, re.DOTALL)[0].strip()
        prompts_and_responses.append({'prompt': prompt, 'response': example_response})

    # Return the parsed examples
    return prompts_and_responses

# Generate the system prompt
def generate_system_prompt(task, prompt_examples):
    # Define the system prompt
    system_prompt = """<your_role>Given a user-description of their <task> a set of prompt / response pairs (it'll be in JSON for easy reading) for the types of outputs we want to generate given inputs, write a fantastic system prompt that describes the task to be done perfectly.</your_role>

<rules>
1. Do this perfectly.
2. Respond only with the system prompt, and nothing else. No other text will be allowed.
</rules>

Respond in this format:
<system_prompt>
WRITE_SYSTEM_PROMPT_HERE
</system_prompt>"""

    # Define the user content
    user_content = f"""<task>{task}</task>

<prompt_response_examples>
{json.dumps(prompt_examples, indent=2, ensure_ascii=False)}
</prompt_response_examples>"""

    # Generate a response with the large model
    response = client.chat.completions.create(
        model=big_model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content}
        ],
        max_tokens=1000,  # Maximum number of tokens to generate
        temperature=0.5  # Controls the randomness of the generated text
    )

    response_text = response.choices[0].message.content

    # Extract the system prompt from between the tags
    generated_system_prompt = response_text.split('<system_prompt>')[1].split('</system_prompt>')[0].strip()

    # Return the extracted system prompt
    return generated_system_prompt

# Test the small model
def test_small_model(generated_examples, prompt_example, system_prompt):
    messages = [{"role": "system", "content": system_prompt}]

    # Add the generated examples as few-shot user/assistant turns
    for example in generated_examples:
        messages.append({"role": "user", "content": example['prompt']})
        messages.append({"role": "assistant", "content": example['response']})

    messages.append({"role": "user", "content": prompt_example.strip()})

    # Generate a response with the small model
    response = client.chat.completions.create(
        model=small_model,
        messages=messages,
        max_tokens=2000,  # Maximum number of tokens to generate
        temperature=0.5  # Controls the randomness of the generated text
    )

    response_text = response.choices[0].message.content

    # Return the response text
    return response_text

# Run the whole conversion process
def run_conversion_process(task, prompt_example, response_example):
    print('Generating the prompts and responses...')
    # Generate candidate examples with the large model
    generated_examples = generate_candidate_prompts(task, prompt_example, response_example)

    print('Prompts and responses generated. Now generating the system prompt...')

    # Generate the system prompt
    system_prompt = generate_system_prompt(task, generated_examples)

    print('System prompt generated:', system_prompt)

    print(f'\n\nTesting the new prompt on {small_model}, using your input example...')
    # Test the small model with the generated examples and system prompt
    small_model_response = test_small_model(generated_examples, prompt_example, system_prompt)

    print(f'{small_model} responded with:')
    print(small_model_response)

    # Collect all the relevant information in a dictionary
    result = {
        "task": task,
        "initial_prompt_example": prompt_example,
        "initial_response_example": response_example,
        "generated_examples": generated_examples,
        "system_prompt": system_prompt,
        "small_model_response": small_model_response
    }

    # Save the small model's prompt to a Python file.
    # repr() (via !r) keeps the file valid Python even when the text
    # contains triple quotes or backslashes, which is common in code samples.
    with open("small_model_prompt.py", "w", encoding="utf-8") as file:
        file.write(f'system_prompt = {system_prompt!r}\n\n')

        file.write('messages = [\n')
        for example in generated_examples:
            file.write(f'    {{"role": "user", "content": {example["prompt"]!r}}},\n')
            file.write(f'    {{"role": "assistant", "content": {example["response"]!r}}},\n')

        file.write(f'    {{"role": "user", "content": {prompt_example.strip()!r}}}\n')
        file.write(']\n')

    # Announce the saved file only after it has been written
    print('\n\n!! Check the file directory. The prompt has been saved there !!')

    # Return the result
    return result
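
The parsing step in `generate_candidate_prompts` indexes `[0]` on each `re.findall` result, so a single example block that lacks a `<prompt>` or `<response>` tag raises an `IndexError` and discards the whole run. The following is a minimal sketch of a more tolerant parser for the same tag format. The function name `parse_examples` is hypothetical, and skipping incomplete examples is an assumption about the desired behavior rather than what the original notebook does.

```python
import re

EXAMPLE_RE = re.compile(r"<example_\w+>(.*?)</example_\w+>", re.DOTALL)
PROMPT_RE = re.compile(r"<prompt>(.*?)</prompt>", re.DOTALL)
RESPONSE_RE = re.compile(r"<response>(.*?)</response>", re.DOTALL)


def parse_examples(response_text):
    """Extract prompt/response pairs, skipping examples that miss a tag.

    Hypothetical helper, not part of the original notebook.
    """
    pairs = []
    for block in EXAMPLE_RE.findall(response_text):
        prompt = PROMPT_RE.search(block)
        response = RESPONSE_RE.search(block)
        if prompt is None or response is None:
            continue  # the model left out a tag, so drop this example
        pairs.append({
            "prompt": prompt.group(1).strip(),
            "response": response.group(1).strip(),
        })
    return pairs


sample = (
    "<example_one><prompt>p1</prompt><response>r1</response></example_one>\n"
    "<example_two><prompt>p2</prompt></example_two>"
)
print(parse_examples(sample))
```

Only the first example in `sample` is complete, so it is the only pair returned.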

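Step 4: Set the task, prompt example, and response example

Change task, prompt_example, and response_example in the code below to match the task you want to convert.

# Set the task, prompt example, and response example
task = "Refactoring complex code"

# Example input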
prompt_example = """def calculate_total(prices, tax, discount, shipping_fee, gift_wrap_fee, membership_discount):
    total = 0
    for i in range(len(prices)):
        total += prices[i]
    if membership_discount != 0:
        total = total - (total * (membership_discount / 100))
    if discount != 0:
        total = total - (total * (discount / 100))
    total = total + (total * (tax / 100))
    if total < 50:
        total += shipping_fee
    else:
        total += shipping_fee / 2
    if gift_wrap_fee != 0:
        total += gift_wrap_fee * len(prices)
    if total > 1000:
        total -= 50
    elif total > 500:
        total -= 25
    total = round(total, 2)
    if total < 0:
        total = 0
    return total"""

# Example of the expected output
response_example = """def calculate_total(prices, tax_rate, discount_rate, shipping_fee, gift_wrap_fee, membership_discount_rate):
    def apply_percentage_discount(amount, percentage):
        return amount * (1 - percentage / 100)

    def calculate_shipping_fee(total):
        return shipping_fee if total < 50 else shipping_fee / 2

    def apply_tier_discount(total):
        if total > 1000:
            return total - 50
        elif total > 500:
            return total - 25
        return total

    subtotal = sum(prices)
    subtotal = apply_percentage_discount(subtotal, membership_discount_rate)
    subtotal = apply_percentage_discount(subtotal, discount_rate)
    total = subtotal * (1 + tax_rate / 100)
    total += calculate_shipping_fee(total)
    total += gift_wrap_fee * len(prices)
    total = apply_tier_discount(total)
    total = max(0, round(total, 2))
    return total"""
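
Because the large model is asked to imitate the response example, an example that silently changes behavior would be copied into every generated sample. The following is a minimal sketch of a sanity check that the refactored example returns the same totals as the original. The names `original`, `refactored`, and `find_mismatches` are hypothetical, and the inputs are chosen so the floating-point arithmetic is exact; with arbitrary inputs the two forms can differ in the last bits, so a tolerance would be needed.

```python
from itertools import product


def original(prices, tax, discount, shipping_fee, gift_wrap_fee, membership_discount):
    total = 0
    for i in range(len(prices)):
        total += prices[i]
    if membership_discount != 0:
        total = total - (total * (membership_discount / 100))
    if discount != 0:
        total = total - (total * (discount / 100))
    total = total + (total * (tax / 100))
    if total < 50:
        total += shipping_fee
    else:
        total += shipping_fee / 2
    if gift_wrap_fee != 0:
        total += gift_wrap_fee * len(prices)
    if total > 1000:
        total -= 50
    elif total > 500:
        total -= 25
    total = round(total, 2)
    if total < 0:
        total = 0
    return total


def refactored(prices, tax_rate, discount_rate, shipping_fee, gift_wrap_fee, membership_discount_rate):
    def apply_percentage_discount(amount, percentage):
        return amount * (1 - percentage / 100)

    def calculate_shipping_fee(total):
        return shipping_fee if total < 50 else shipping_fee / 2

    def apply_tier_discount(total):
        if total > 1000:
            return total - 50
        elif total > 500:
            return total - 25
        return total

    subtotal = sum(prices)
    subtotal = apply_percentage_discount(subtotal, membership_discount_rate)
    subtotal = apply_percentage_discount(subtotal, discount_rate)
    total = subtotal * (1 + tax_rate / 100)
    total += calculate_shipping_fee(total)
    total += gift_wrap_fee * len(prices)
    total = apply_tier_discount(total)
    return max(0, round(total, 2))


def find_mismatches():
    """Return every argument tuple on which the two versions disagree."""
    # Rates of 0, 25, and 50 percent keep every intermediate value exact in binary
    cases = product(
        [[], [10], [20, 30], [400, 300], [900, 600]],  # prices
        [0, 25],   # tax
        [0, 50],   # discount
        [0, 10],   # shipping_fee
        [0, 2],    # gift_wrap_fee
        [0, 25],   # membership_discount
    )
    return [args for args in cases if original(*args) != refactored(*args)]


print("mismatches:", find_mismatches())
```

The grid covers both shipping branches and all three tiers of the final discount, so an empty list gives reasonable confidence in the example before it is sent to the large model.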

Step 5: Run the conversion and check the result

Running the code below generates a new prompt for LLaMa 3.1 8B and saves it to a file named small_model_prompt.py.

# Run the conversion process
result = run_conversion_process(task, prompt_example, response_example)

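Generating the prompts and responses...
Prompts and responses generated. Now generating the system prompt...
System prompt generated: Develop a system that refactors complex code. Analyze the given code and rewrite it to be more efficient and readable. Eliminate redundancy in the code, improve performance, and improve error handling. The development language is Python.

Testing the new prompt on meta-llama-3.1-8b-instruct, using your input example...
meta-llama-3.1-8b-instruct responded with: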
    ```python
    def calculate_total(prices, tax, discount, shipping_fee, gift_wrap_fee, membership_discount):
        total = sum(prices)

        if membership_discount:
            total *= (1 - membership_discount / 100)

        if discount:
            total *= (1 - discount / 100)

        total *= (1 + tax / 100)

        if total < 50:
            total += shipping_fee
        else:
            total += shipping_fee / 2

        if gift_wrap_fee:
            total += gift_wrap_fee * len(prices)

        if total > 1000:
            total -= 50
        elif total > 500:
            total -= 25

        total = round(total, 2)

        return max(total, 0)
    ```

!! Check the file directory. The prompt has been saved there !!
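
The saved prompt ends with the original input example, so using it on new code means swapping that last user message. The following is a minimal sketch of that step, mirroring how `test_small_model` assembles its messages. The helper name `build_messages` is hypothetical and not part of the original notebook.

```python
def build_messages(system_prompt, examples, new_input):
    """Assemble chat messages: system prompt, few-shot pairs, then the new input.

    Hypothetical helper, not part of the original notebook.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for example in examples:
        messages.append({"role": "user", "content": example["prompt"]})
        messages.append({"role": "assistant", "content": example["response"]})
    messages.append({"role": "user", "content": new_input.strip()})
    return messages


# Stand-in values; in the notebook these would come from `result`
demo = build_messages(
    "Refactor the given code.",
    [{"prompt": "def f(a): return a+0", "response": "def f(a): return a"}],
    "  def g(x): return x*1  ",
)
print([m["role"] for m in demo])

# With the notebook's objects, the call would look like this (needs an API key):
# messages = build_messages(result["system_prompt"], result["generated_examples"], new_code)
# client.chat.completions.create(model=small_model, messages=messages,
#                                max_tokens=2000, temperature=0.5)
```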

With this prompt, you can cut costs by running the task on LLaMa 3.1 8B while maintaining high output quality.