Generative AI Programming 3 == Work in progress ==

Based on the results verified so far, we write generative AI programs in Python.

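※ Last updated: 2025/07/08

Getting started with Stable Diffusion using diffusers (Advanced, Part 2)

Generating an image from an image: img2img & ControlNet

Reference site: "instruct-pix2pixで画像を指示した通り変更したり" (changing images as instructed with instruct-pix2pix)

Operating environment

This project runs in the following Anaconda virtual environment and project folder: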
```
(base) PS > conda activate sd_test
(sd_test) PS > cd workspace_3/sd_test
```
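
Before running the scripts, it can help to confirm that the packages they import are installed in the active environment. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the package list is simply taken from the import statements of the scripts on this page.

```python
import importlib.util

# Import names used by the scripts on this page
REQUIRED = ["torch", "diffusers", "PIL", "translate", "matplotlib"]

def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Running this inside the `sd_test` environment lists any import name that still needs to be installed.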

Step 40: Transforming an image with "instruct-pix2pix"

SD1.5 version: "雪の中の場面にする" (make it a scene in the snow)

"sd_040.py": source image (right) sd_040_test.png, generated image (left) image_040.png

## sd_040.py [SD1.5]  Image-to-image generation (instruct-pix2pix) sample source code
##      https://qiita.com/phyblas/items/28c342740c2ed00250b8
##      Model: https://huggingface.co/timbrooks/instruct-pix2pix
##      Ver. 0.00   2025/07/05

import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline, logging
from translate import Translator

logging.set_verbosity_error()

# Paths
model_path = "timbrooks/instruct-pix2pix"                       # model
image_path = "images/sd_040_test.png"                           # source image

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value
seed = 0

# Create the pipeline (default precision on CPU, float16 on GPU)
if device == 'cpu':
    pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_path).to(device)
else:
    pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained(
                    model_path,
                    torch_dtype = torch.float16,
                    ).to(device)

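# Prompt (translated from Japanese to English)
trans = Translator('en','ja').translate                         # to English, from Japanese
prompt_jp = '雪の中の場面にする'                                # prompt (Japanese)
prompt = trans(prompt_jp)
src_image = Image.open(image_path)

# Create the Generator object (fixes the seed so the result is reproducible)
generator = torch.Generator(device).manual_seed(seed)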

print(f'Seed: {seed}, Model: {model_path}')
print(f'source_image: {image_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate the image
image = pipeline(
                    prompt = prompt,
                    image = src_image,
                    num_inference_steps = 20,
                    image_guidance_scale = 1.5,
                    generator = generator
                    ).images[0]

image.save("results/image_040.png")                            # generated image
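
The script relies on `Translator(...).translate` to turn the Japanese prompt into English, and that call may fail or return an empty string, for example if the translation backend is unreachable. The following is a minimal sketch of a fallback. The helper `to_english` is hypothetical, and the two stand-in translators exist only for illustration; they are not part of the `translate` package.

```python
def to_english(prompt_jp, translate_fn, fallback):
    """Translate prompt_jp with translate_fn; return fallback if that fails or yields nothing."""
    try:
        result = translate_fn(prompt_jp)
    except Exception:
        return fallback
    if not isinstance(result, str) or not result.strip():
        return fallback
    return result.strip()

# Stand-in translators for illustration only
def working_translator(text):
    return "Make it a scene in the snow"

def failing_translator(text):
    raise ConnectionError("translation service unreachable")

print(to_english("雪の中の場面にする", working_translator, "make it snowy"))
print(to_english("雪の中の場面にする", failing_translator, "make it snowy"))
```

In sd_040.py the same helper could be called as `to_english(prompt_jp, trans, 'Make it a scene in the snow')`, with the hand-written English prompt serving as the fallback.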

Run the program (run time: about 3 seconds on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_040.py
Loading pipeline components...: 100%|████████████| 7/7 [00:02<00:00,  2.48it/s]
Seed: 0, Model: timbrooks/instruct-pix2pix
source_image: images/sd_040_test.png
prompt : 雪の中の場面にする → Make it a scene in the snow
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 16.78it/s]

The image file "image_040.png" is generated.

SDXL version: "雪の中の場面にする" (make it a scene in the snow)

"sd_040a.py": same source image sd_040_test.png, generated image (left) image_040a.png

## sd_040a.py [SDXL]  Image-to-image generation (instruct-pix2pix) sample source code
##      https://qiita.com/phyblas/items/28c342740c2ed00250b8
##      Model: https://huggingface.co/diffusers/sdxl-instructpix2pix-768
##      Ver. 0.00   2025/07/07

import torch
from diffusers import StableDiffusionXLInstructPix2PixPipeline, logging
from diffusers.utils import load_image
from translate import Translator

logging.set_verbosity_error()

# Paths
model_path = "diffusers/sdxl-instructpix2pix-768"              # model
image_path = "images/sd_040_test.png"                          # source image

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value
seed = 0

# Image size
resolution = 768

# Create the pipeline
pipeline = StableDiffusionXLInstructPix2PixPipeline.from_pretrained(
                    model_path,
                    torch_dtype = torch.float16,
                    ).to(device)

# Prompt (translated from Japanese to English)
trans = Translator('en','ja').translate                         # to English, from Japanese
prompt_jp = '雪の中の場面にする'                                # prompt (Japanese)
prompt = trans(prompt_jp)

# Load the source image and resize it to resolution x resolution
src_image = load_image(image_path).resize((resolution, resolution))

# Create the Generator object (fixes the seed so the result is reproducible)
generator = torch.Generator(device).manual_seed(seed)

print(f'Seed: {seed}, Model: {model_path}')
print(f'source_image: {image_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate the image
image = pipeline(
                    prompt = prompt,
                    image = src_image,
                    height = resolution,
                    width = resolution,
                    guidance_scale=3.0,
                    image_guidance_scale = 1.5,
                    num_inference_steps = 20,
                    generator = generator
                    ).images[0]

image.save("results/image_040a.png")                            # generated image

Run the program (run time: about 8 seconds on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_040a.py
Loading pipeline components...: 100%|████████████| 7/7 [00:05<00:00,  1.38it/s]
Seed: 0, Model: diffusers/sdxl-instructpix2pix-768
source_image: images/sd_040_test.png
prompt : 雪の中の場面にする → Make it a scene in the snow
100%|██████████████████████████████████████████| 20/20 [00:03<00:00,  5.30it/s]

The image file "image_040a.png" is generated.
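
sd_040a.py resizes the source image directly to `resolution` x `resolution`, which stretches a source image that is not square. One possible alternative, shown here only as a sketch, is to crop the largest centered square first and then resize. The helper `center_square_box` is hypothetical; it computes only the crop box, in the (left, upper, right, lower) order that PIL's `Image.crop` accepts.

```python
def center_square_box(width, height):
    """Return (left, upper, right, lower) of the largest centered square in a width x height image."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)

print(center_square_box(1024, 768))  # (128, 0, 896, 768)
print(center_square_box(768, 768))   # (0, 0, 768, 768)
```

With a PIL image this could be applied as `src_image.crop(center_square_box(*src_image.size)).resize((resolution, resolution))`.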

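Comparison of images generated by the SD1.5 and SDXL models

Prompts (columns): 雪の中の場面にする (make it a scene in the snow) / 春の場面にする (make it a spring scene) / 夏の場面にする (make it a summer scene) / 秋の場面にする (make it an autumn scene) / 冬の場面にする (make it a winter scene)

Models (rows): SD1.5 / SDXL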

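Step 41: Observing the effect of the "instruct-pix2pix" image_guidance_scale parameter

image_guidance_scale
・Controls how strongly the output is tied to the source image: higher values keep the result closer to the source, lower values allow larger changes
・Must be 1 or greater (default: 1.5)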
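
To see how `image_guidance_scale` interacts with `guidance_scale`, the sketch below reproduces the guidance combination described for InstructPix2Pix, using plain numbers in place of noise-prediction tensors. It is an illustration under that assumption, not a copy of the pipeline's code. The function name `combine_guidance` and the sample values are made up.

```python
def combine_guidance(uncond, image_cond, text_cond, guidance_scale, image_guidance_scale):
    """Combine three noise predictions with text guidance and image guidance.

    uncond:     prediction with neither text nor image conditioning
    image_cond: prediction conditioned on the source image only
    text_cond:  prediction conditioned on both the text and the source image
    """
    return (uncond
            + guidance_scale * (text_cond - image_cond)
            + image_guidance_scale * (image_cond - uncond))

# Made-up scalar predictions for illustration
uncond, image_cond, text_cond = 0.0, 1.0, 2.0
for s_img in (1.0, 1.5, 2.0):
    print(s_img, combine_guidance(uncond, image_cond, text_cond, 3.0, s_img))
```

A larger `image_guidance_scale` gives more weight to the term that pulls the result toward the source image, while `guidance_scale` weights the term that follows the text instruction.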

SD1.5 version

Run the program (run time: about 12 seconds on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_041.py
Seed: 12345678, Model: timbrooks/instruct-pix2pix
source_image: images/sd_040_test.png
prompt : 雪の中の場面にする → Make it a scene in the snow
** image_guidance_scale 1.0 ～ 1.5 **
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  5.17it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 17.95it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  5.85it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 18.39it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  6.01it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 17.88it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  5.57it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 18.38it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  6.14it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 18.45it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  6.13it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 18.44it/s]

The image file "image_041.png" is generated.
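
The run log for sd_041.py shows "Loading pipeline components" once per generated image, because `image_generation` creates the pipeline on every call. If the reload is not intended, one option is to load the pipeline once and reuse it. The sketch below shows the caching pattern with `functools.lru_cache`. `load_pipeline` is a hypothetical stand-in that returns a dummy object where the real script would call `from_pretrained(...)`.

```python
from functools import lru_cache

load_count = 0  # counts how many times the stand-in pipeline is actually built

@lru_cache(maxsize=None)
def load_pipeline(model_path, device):
    """Build the pipeline once per (model_path, device); later calls return the cached object."""
    global load_count
    load_count += 1
    # Stand-in for StableDiffusionInstructPix2PixPipeline.from_pretrained(model_path).to(device)
    return {"model": model_path, "device": device}

def image_generation(ig_scale):
    pipeline = load_pipeline("timbrooks/instruct-pix2pix", "cpu")
    # The real script would run the pipeline here; this stub only reports its inputs
    return (pipeline["model"], ig_scale)

results = [image_generation(1.0 + 0.1 * i) for i in range(6)]
print(load_count)  # 1
```

Six calls are made, but the loader body runs only once.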

Module source code

▼ "sd_041.py"

## sd_041.py [SD1.5]  Image-to-image generation (instruct-pix2pix) sample source code
## === Examine the image guidance scale ===
##      https://qiita.com/phyblas/items/28c342740c2ed00250b8
##      Model: https://huggingface.co/timbrooks/instruct-pix2pix
##      Ver. 0.00   2025/07/05

import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline, logging
from translate import Translator
import matplotlib.pyplot as plt

logging.set_verbosity_error()

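# Generate one image with the given image_guidance_scale
def image_generation(ig_scale):
    # Create the pipeline (default precision on CPU, float16 on GPU)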
    if device == 'cpu':
        pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_path).to(device)
    else:
        pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained(
                    model_path,
                    torch_dtype = torch.float16,
                    ).to(device)

    # Create the Generator object (same seed for every image)
    generator = torch.Generator(device).manual_seed(seed)

    # Generate the image
    img = pipeline(
                    prompt = prompt,
                    image = src_image,
                    num_inference_steps = 20,
                    image_guidance_scale = ig_scale,
                    generator = generator
                    ).images[0]
    return img

# Paths
model_path = "timbrooks/instruct-pix2pix"                      # model
image_path = "images/sd_040_test.png"                          # source image

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value
seed = 12345678

# Prompt (translated from Japanese to English)
trans = Translator('en','ja').translate                        # to English, from Japanese
prompt_jp = '雪の中の場面にする'                               # prompt (Japanese)
prompt = trans(prompt_jp)
src_image = Image.open(image_path)

print(f'Seed: {seed}, Model: {model_path}')
print(f'source_image: {image_path}')
print(f'prompt : {prompt_jp} → {prompt}')
print('** image_guidance_scale 1.0 ～ 1.5 **')


# Generate six images (image_guidance_scale 1.0 to 1.5) and place them in a 3 x 2 grid
plt.figure(figsize=[6, 9.5], dpi = 100)
for i in range(6):
    ig_scale = 1 + 0.1 * i
    img = image_generation(ig_scale)
    plt.subplot(3, 2, i + 1, title = 'image_guidance_scale = %.1f'%ig_scale)
    plt.imshow(img)
    plt.axis('off')

    # Free cached device memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()

plt.tight_layout()
plt.savefig('results/image_041.png')
plt.close()

SDXL version

Run the program (run time: about 12 seconds on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_041a.py
Seed: 0, Model: diffusers/sdxl-instructpix2pix-768
source_image: images/sd_040_test.png
prompt : 雪の中の場面にする → Make it a scene in the snow
** image_guidance_scale 1.0 ～ 1.5 **
Loading pipeline components...: 100%|████████████| 7/7 [00:02<00:00,  2.95it/s]
100%|██████████████████████████████████████████| 30/30 [00:05<00:00,  5.40it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:02<00:00,  3.10it/s]
100%|██████████████████████████████████████████| 30/30 [00:05<00:00,  5.41it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:02<00:00,  2.97it/s]
100%|██████████████████████████████████████████| 30/30 [00:05<00:00,  5.40it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:02<00:00,  3.10it/s]
100%|██████████████████████████████████████████| 30/30 [00:05<00:00,  5.41it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:02<00:00,  3.00it/s]
100%|██████████████████████████████████████████| 30/30 [00:05<00:00,  5.38it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:02<00:00,  3.09it/s]
100%|██████████████████████████████████████████| 30/30 [00:05<00:00,  5.39it/s]

The image file "image_041a.png" is generated.

Module source code

▼ "sd_041a.py"

## sd_041a.py [SDXL]  Image-to-image generation (instruct-pix2pix) sample source code
## === Examine the image guidance scale ===
##      https://qiita.com/phyblas/items/28c342740c2ed00250b8
##      Model: https://huggingface.co/diffusers/sdxl-instructpix2pix-768
##      Ver. 0.00   2025/07/07

import torch
from PIL import Image
from diffusers import StableDiffusionXLInstructPix2PixPipeline, logging
from translate import Translator
from diffusers.utils import load_image
import matplotlib.pyplot as plt

logging.set_verbosity_error()

# Generate one image with the given image_guidance_scale
def image_generation(ig_scale):
    # Create the pipeline
    pipeline = StableDiffusionXLInstructPix2PixPipeline.from_pretrained(
                    model_path,
                    torch_dtype = torch.float16,
                    ).to(device)

    # Create the Generator object (same seed for every image)
    generator = torch.Generator(device).manual_seed(seed)

    # Generate the image
    img = pipeline(
                    prompt = prompt,
                    image = src_image,
                    height = resolution,
                    width = resolution,
                    guidance_scale=3.0,
                    image_guidance_scale = ig_scale,
                    num_inference_steps = 30,
                    generator = generator
                    ).images[0]
    return img

# Paths
model_path = "diffusers/sdxl-instructpix2pix-768"               # model
image_path = "images/sd_040_test.png"                           # source image

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value
seed = 0

# Image size
resolution = 768

# Prompt (translated from Japanese to English)
trans = Translator('en','ja').translate                         # to English, from Japanese
prompt_jp = '雪の中の場面にする'                                # prompt (Japanese)
prompt = trans(prompt_jp)
src_image = load_image(image_path).resize((resolution, resolution))

print(f'Seed: {seed}, Model: {model_path}')
print(f'source_image: {image_path}')
print(f'prompt : {prompt_jp} → {prompt}')
print('** image_guidance_scale 1.0 ～ 1.5 **')


# Generate six images (image_guidance_scale 1.0 to 1.5) and place them in a 3 x 2 grid
plt.figure(figsize=[6, 9.5], dpi = 100)
for i in range(6):
    ig_scale = 1.0 + 0.1 * i
    img = image_generation(ig_scale)
    plt.subplot(3, 2, i + 1, title = 'image_guidance_scale = %.1f'%ig_scale)
    plt.imshow(img)
    plt.axis('off')

    # Free cached device memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()

plt.tight_layout()
plt.savefig('results/image_041a.png')
plt.close()
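
Both sweep scripts hard-code six scale values and a 3 x 2 subplot grid, so changing the number of values means changing the grid as well. The following small sketch derives both from a start value, a step, and a count. The helpers `scale_values` and `grid_shape` are hypothetical and are not part of the original scripts.

```python
import math

def scale_values(start, step, count):
    """Return count values start, start + step, ..., rounded to one decimal place."""
    return [round(start + step * i, 1) for i in range(count)]

def grid_shape(n_images, cols=2):
    """Return (rows, cols) large enough to hold n_images subplots."""
    return (math.ceil(n_images / cols), cols)

values = scale_values(1.0, 0.1, 6)
print(values)                   # [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
print(grid_shape(len(values)))  # (3, 2)
```

In the plotting loop, the result could be used as `rows, cols = grid_shape(len(values))` followed by `plt.subplot(rows, cols, i + 1, ...)`.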

Memorandum

Update history

2025/07/05 First version

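References

Stable Diffusion

"【Stable Diffusion】画像を修正する「Inpaint」機能や「txt2mask」について解説" (an explanation of the "Inpaint" feature for retouching images and of "txt2mask")

Books and magazines
- Nikkei Software, July 2025 issue: "ローカル生成AIプログラミング" (Local Generative AI Programming)
- Interface, March 2025 issue: "画像による異常検出＆ローカルLLM作り - 仕事のための生成AI" (Image-based anomaly detection & building a local LLM: generative AI for work)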