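Introduction
Predicting housing prices is a classic machine learning example. This article shows how to predict California housing prices with Open Interpreter, a tool that lets a language model write and run code interactively in your terminal. Using Python's scikit-learn library, we build a simple linear regression model and evaluate its performance. The prompt also asks for a visualization of the results, but the recorded session ends after the evaluation step.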
Demo video
@maki.sunwood.ai.labs Open Interpreter RTA: predicting California housing prices with Open Interpreter #OpenInterpreter #RTA #データサイエンス #住宅価格 #Python #chatbot #LLM #codellama
Setting up Open Interpreter
First, set up Open Interpreter. The steps are as follows:
- Open a terminal and move to the directory that contains the Docker files for Open Interpreter.
E:\Prj\open-interpreter-docker
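- Run the following command to open a shell inside the already running open-interpreter container. From that shell, the interpreter command starts Open Interpreter: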
docker-compose exec open-interpreter /bin/bash
- A prompt for choosing the model appears. This tutorial uses Code Llama with 7B parameters and the Medium quality setting.
Full terminal session
Microsoft Windows [Version 10.0.22621.2283]
(c) Microsoft Corporation. All rights reserved.
E:\Prj\open-interpreter-docker>docker-compose exec open-interpreter /bin/bash
root@1e361d1eb084:~# interpreter
●
Welcome to Open Interpreter.
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
▌ OpenAI API key not found
To use GPT-4 (recommended) please provide an OpenAI API key.
To use Code-Llama (free but less capable) press enter.
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
OpenAI API key:
▌ Switching to Code-Llama...
Tip: Run interpreter --local to automatically use Code-Llama.
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Open Interpreter will use Code Llama for local execution. Use your arrow keys to set up the model.
[?] Parameter count (smaller is faster, larger is more capable): 7B
> 7B
13B
34B
[?] Quality (smaller is faster, larger is more capable): Medium | Size: 3.8 GB, Estimated RAM usage: 6.3 GB
Small | Size: 2.6 GB, Estimated RAM usage: 5.1 GB
> Medium | Size: 3.8 GB, Estimated RAM usage: 6.3 GB
Large | Size: 6.7 GB, Estimated RAM usage: 9.2 GB
See More
[?] Use GPU? (Large models might crash on GPU, but will run more quickly) (Y/n): Y
Model found at /root/.local/share/Open Interpreter/models/codellama-7b-instruct.Q4_K_M.gguf
▌ Model set to TheBloke/CodeLlama-7B-Instruct-GGUF
Open Interpreter will require approval before running code. Use interpreter -y to bypass this.
Press CTRL-C to exit.
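> `from sklearn.datasets import fetch_california_housing` Using this, load the California housing prices dataset, then run a regression to predict housing prices and visualize the results for the training and test data, step by step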
^C^C^C^C^C^CError response from daemon: No such exec instance: 9feca4b49a551e5f658f68544fad06ec151aec8f6fe38b28dbf0059cc2ac82b0
E:\Prj\open-interpreter-docker>docker-compose exec open-interpreter /bin/bash
root@1e361d1eb084:~# interpreter
●
Welcome to Open Interpreter.
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
▌ OpenAI API key not found
To use GPT-4 (recommended) please provide an OpenAI API key.
To use Code-Llama (free but less capable) press enter.
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
OpenAI API key:
▌ Switching to Code-Llama...
Tip: Run interpreter --local to automatically use Code-Llama.
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Open Interpreter will use Code Llama for local execution. Use your arrow keys to set up the model.
[?] Parameter count (smaller is faster, larger is more capable): 7B
> 7B
13B
34B
[?] Quality (smaller is faster, larger is more capable): Medium | Size: 3.8 GB, Estimated RAM usage: 6.3 GB
Small | Size: 2.6 GB, Estimated RAM usage: 5.1 GB
> Medium | Size: 3.8 GB, Estimated RAM usage: 6.3 GB
Large | Size: 6.7 GB, Estimated RAM usage: 9.2 GB
See More
[?] Use GPU? (Large models might crash on GPU, but will run more quickly) (Y/n): n
Model found at /root/.local/share/Open Interpreter/models/codellama-7b-instruct.Q4_K_M.gguf
▌ Model set to TheBloke/CodeLlama-7B-Instruct-GGUF
Open Interpreter will require approval before running code. Use interpreter -y to bypass this.
Press CTRL-C to exit.
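> `from sklearn.datasets import fetch_california_housing` Using this, load the California housing prices dataset, then run a regression to predict housing prices and visualize the results for the training and test data, step by step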
from sklearn.datasets import fetch_california_housing
Would you like to run this code? (y/n)
> y
from sklearn.datasets import fetch_california_housing
Would you like to run this code? (y/n)
y
from sklearn.datasets import fetch_california_housing
I will now proceed to run the next line of code.
housing = fetch_california_housing()
Would you like to run this code? (y/n)
y
housing = fetch_california_housing()
I will now proceed to run the next line of code.
from sklearn.model_selection import train_test_split
Would you like to run this code? (y/n)
y
from sklearn.model_selection import train_test_split
I will now proceed to run the next line of code.
X = housing.data
y = housing.target
Would you like to run this code? (y/n)
y
X = housing.data
y = housing.target
I will now proceed to run the next line of code.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Would you like to run this code? (y/n)
y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
I will now proceed to run the next line of code.
from sklearn.linear_model import LinearRegression
Would you like to run this code? (y/n)
y
from sklearn.linear_model import LinearRegression
I will now proceed to run the next line of code.
reg = LinearRegression()
Would you like to run this code? (y/n)
y
reg = LinearRegression()
I will now proceed to run the next line of code.
reg.fit(X_train, y_train)
Would you like to run this code? (y/n)
y
reg.fit(X_train, y_train)
LinearRegression()
LinearRegression()
Would you like to run this code? (y/n)
y
LinearRegression()
LinearRegression()
I will now proceed to run the next line of code.
y_pred = reg.predict(X_test)
Would you like to run this code? (y/n)
y
y_pred = reg.predict(X_test)
I will now proceed to run the next line of code.
from sklearn.metrics import mean_squared_error
Would you like to run this code? (y/n)
y
from sklearn.metrics import mean_squared_error
I will now proceed to run the next line of code.
mse = mean_squared_error(y_test, y_pred)
Would you like to run this code? (y/n)
y
mse = mean_squared_error(y_test, y_pred)
I will now proceed to run the next line of code.
print("Mean Squared Error:", mse)
Would you like to run this code? (y/n)
y
print("Mean Squared Error:", mse)
Mean Squared Error: 0.5558915986952427
0.5558915986952427, it appears that the mean squared error between the predicted and actual
values is approximately 0.56. This suggests that the linear regression model is not a very good
fit for the data.
To improve the performance of the model, you could try using different algorithms or features
to represent the data. You could also try increasing the size of the training set or decreasing
the regularization parameter to see if that improves the model's performance.
What would you like to do next?
Walkthrough
Step 1: Import the data
First, import the California housing dataset from scikit-learn.
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
This code fetches the dataset and stores it in a variable named housing.
Step 2: Split the data
Next, split the dataset into training and test data: 80% for training and 20% for testing.
from sklearn.model_selection import train_test_split
X = housing.data
y = housing.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
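To make the effect of test_size=0.2 and random_state=42 concrete, here is a minimal sketch. It assumes placeholder arrays with the same shape as the California housing data (documented by scikit-learn as 20,640 rows and 8 features) rather than the real values, so it runs without downloading anything.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays shaped like the California housing data
# (20,640 rows, 8 features); the values themselves are irrelevant here.
X = np.arange(20640 * 8).reshape(20640, 8)
y = np.arange(20640)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (16512, 8) (4128, 8)

# Reusing the same random_state reproduces exactly the same split.
_, _, _, y_test_again = train_test_split(X, y, test_size=0.2, random_state=42)
print(np.array_equal(y_test, y_test_again))  # True
```

Fixing random_state is what makes a reported score such as the MSE below reproducible from run to run.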
Step 3: Train the model
Next, train a linear regression model on the training data.
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(X_train, y_train)
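As a rough illustration of what fit learns, the sketch below trains LinearRegression on synthetic data with known coefficients and reads them back. The data and the names X_demo, y_demo, and demo_reg are made up for this example and are not part of the original session.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (not the housing data): y = 3*x0 - 2*x1 + 1, with no noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 1.0

demo_reg = LinearRegression().fit(X_demo, y_demo)
print(demo_reg.coef_)       # close to [ 3. -2.]
print(demo_reg.intercept_)  # close to 1.0
```

For the model in this article, reg.coef_ would hold eight values, one per feature, which can be paired with housing.feature_names to see how each feature contributes.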
Step 4: Make predictions
Once the model is trained, use it to make predictions on the test set.
y_pred = reg.predict(X_test)
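The prompt asked for a visualization of the training and test results, which the recorded session never reached. One possible way to do it is a predicted-versus-actual scatter plot for each split. The sketch below is an assumption about how that could look, not output from the session: plot_predicted_vs_actual is a hypothetical helper, and it is exercised with stand-in random numbers so that it runs on its own.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np


def plot_predicted_vs_actual(y_train, pred_train, y_test, pred_test, path):
    """Save side-by-side predicted-vs-actual scatter plots (hypothetical helper)."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
    panels = [
        ("Training data", y_train, pred_train),
        ("Test data", y_test, pred_test),
    ]
    for ax, (title, actual, predicted) in zip(axes, panels):
        ax.scatter(actual, predicted, s=5, alpha=0.3)
        low = min(np.min(actual), np.min(predicted))
        high = max(np.max(actual), np.max(predicted))
        ax.plot([low, high], [low, high], color="red")  # perfect-prediction line
        ax.set_title(title)
        ax.set_xlabel("Actual value")
        ax.set_ylabel("Predicted value")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return Path(path)


# Stand-in numbers (not the housing data) so the sketch is self-contained.
rng = np.random.default_rng(0)
actual_demo = rng.uniform(0.5, 5.0, size=300)
predicted_demo = actual_demo + rng.normal(scale=0.7, size=300)
saved = plot_predicted_vs_actual(
    actual_demo[:240], predicted_demo[:240],
    actual_demo[240:], predicted_demo[240:],
    "predicted_vs_actual.png",
)
```

With the variables from this walkthrough, the call would be plot_predicted_vs_actual(y_train, reg.predict(X_train), y_test, y_pred, "housing_fit.png"). Points far from the red diagonal are the districts the model predicts poorly.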
Step 5: Evaluate the model
Evaluate the model's performance with the mean squared error (MSE).
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
The MSE is about 0.56, which suggests that the linear regression model does not fit the data very well.
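An MSE on its own is hard to interpret because it is in squared units. Taking the square root gives the RMSE in the units of the target, which scikit-learn documents as the median house value in units of $100,000. The sketch below converts the MSE reported in the session and, on toy numbers that are not from the housing data, checks that mean_squared_error is simply the mean of the squared residuals. It also shows r2_score as one additional metric that could be reported.

```python
import math

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# MSE reported in the session above.
mse_reported = 0.5558915986952427
rmse = math.sqrt(mse_reported)
print(f"RMSE: {rmse:.4f}")  # RMSE: 0.7456

# Toy numbers (not the housing data) to show how the metrics are defined.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 1.5, 3.5, 3.0])
manual_mse = float(np.mean((y_true - y_hat) ** 2))
print(f"{manual_mse:.4f} {mean_squared_error(y_true, y_hat):.4f}")  # 0.4375 0.4375
print(f"R^2: {r2_score(y_true, y_hat):.2f}")  # R^2: 0.65
```

An RMSE of about 0.75 means that a typical prediction is off by roughly $75,000. For the model in this article, r2_score(y_test, y_pred) would report the share of variance in the test targets that the model explains.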
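Summary
This article showed how to predict California housing prices using Open Interpreter and Python's scikit-learn library. The simple linear regression model did not perform especially well, but it is a reasonable first step toward more sophisticated modeling.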