AI Deep Dive, Chapter 4. Deep Learning: What We Want to Know, 02. Backpropagation: Training Deep Artificial Neural Networks

AI Deep Dive Note

Chapter 4 - 02. Backpropagation: Training Deep Artificial Neural Networks

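How do we obtain the derivatives needed for gradient descent?

We have to take the partial derivative of the loss with respect to every parameter in the model.
How many times? Once for each w and b, 17 in total.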

ch0402_1

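We need to compute a 17x1 gradient.
- Multiply it by the learning rate and use the result to update the parameters.
Computing these partial derivatives is what backpropagation is for.
- It is the chain rule, but applied from the back of the network toward the front, which is why it is called backpropagation.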
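The update rule above can be sketched in a few lines of plain Python. This is only an illustration: the quadratic loss, the `target` vector, and the helper names `loss`, `grad`, and `gd_step` are assumptions made for the sketch, not the 17-parameter network from the figure. It shows a 17-element gradient being scaled by the learning rate and subtracted from the parameters.

```python
# Minimal gradient-descent sketch on a 17-element parameter vector.
# The loss below is a hypothetical stand-in, not the network's loss.

def loss(theta, target):
    # Sum of squared distances between parameters and their targets.
    return sum((t - g) ** 2 for t, g in zip(theta, target))

def grad(theta, target):
    # One partial derivative per parameter: d/dt (t - g)^2 = 2 (t - g).
    return [2.0 * (t - g) for t, g in zip(theta, target)]

def gd_step(theta, g, lr):
    # theta <- theta - learning_rate * gradient
    return [t - lr * gi for t, gi in zip(theta, g)]

target = [float(i) for i in range(17)]  # hypothetical optimum
theta = [0.0] * 17                      # 17 parameters, all starting at zero
lr = 0.1

history = [loss(theta, target)]
for _ in range(50):
    theta = gd_step(theta, grad(theta, target), lr)
    history.append(loss(theta, target))

print(f"loss before: {history[0]:.4f}, loss after: {history[-1]:.2e}")
```

In a real network the `grad` function is the part that backpropagation provides; the update step itself stays this simple.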

ch0402_2

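Let's start by writing out the outermost layer, the output layer (shown in red):
- \(d_2\): the value going in
- \(n_2\): the value coming out
- \(w_1\): weight; let's find the partial derivative of the Loss with respect to it
- \(b_2\): bias
- \(f_2\): activation function
- \(\hat{y_1}\): final output, estimated
- \(y_1\): true value
- \(\hat{y_2}\): final output, estimated
- \(y_2\): true value
- Loss: assume MSE for now, the squared difference between the true value and the estimate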

ch0402_7

ch0402_9
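As a rough worked version of the output-layer case, the chain rule can be written out as follows. The indices are dropped here because the exact labeling in the figures is not reproduced in the text. The sketch assumes a single output \(\hat{y} = f(d)\) with \(d = w\,n + b\), where \(n\) is the value arriving from the previous layer, and a squared-error loss for that one output.

\[
L = (\hat{y} - y)^2, \qquad \hat{y} = f(d), \qquad d = w\,n + b
\]

\[
\frac{\partial L}{\partial w}
= \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial d} \cdot \frac{\partial d}{\partial w}
= 2(\hat{y} - y) \cdot f'(d) \cdot n
\]

Under the same assumptions, the bias only differs in the last factor: \(\partial d / \partial b = 1\), so \(\partial L / \partial b = 2(\hat{y} - y)\, f'(d)\).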

Now let's find the partial derivative with respect to \(w_2\).
- path 1
  
  output term * act (derivative of the activation) * weight * act (derivative of the activation) * n (input)
- path 2

ch0402_4

ch0402_5
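The two-path product can be checked numerically. The sketch below is a hypothetical miniature, not the network in the figures: one input `n0`, one hidden unit with weight `w` and bias `b`, and two output units with weights `v` and biases `c`. It assumes sigmoid activations and a loss that sums the two squared errors. The gradient with respect to `w` is the sum over both paths of "output term * activation derivative * weight * activation derivative * input", and it is compared against a central finite difference.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dsigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward(w, b, v, c, n0):
    d_h = w * n0 + b                                 # hidden pre-activation
    n_h = sigmoid(d_h)                               # hidden output
    d_out = [v[k] * n_h + c[k] for k in range(2)]    # output pre-activations
    y_hat = [sigmoid(d) for d in d_out]              # two estimates
    return d_h, n_h, d_out, y_hat

def loss(y_hat, y):
    # Assumed loss: sum of squared errors over the two outputs.
    return sum((yh - yt) ** 2 for yh, yt in zip(y_hat, y))

def grad_w(w, b, v, c, n0, y):
    d_h, n_h, d_out, y_hat = forward(w, b, v, c, n0)
    total = 0.0
    for k in range(2):  # path 1 and path 2
        total += (2.0 * (y_hat[k] - y[k])   # output term
                  * dsigmoid(d_out[k])      # activation derivative (output)
                  * v[k]                    # weight on the path
                  * dsigmoid(d_h)           # activation derivative (hidden)
                  * n0)                     # input to the weight w
    return total

# Hypothetical values chosen only for the check.
w, b, v, c, n0, y = 0.5, -0.1, [0.8, -0.3], [0.05, 0.2], 1.5, [1.0, 0.0]

analytic = grad_w(w, b, v, c, n0, y)
eps = 1e-6
numeric = (loss(forward(w + eps, b, v, c, n0)[3], y)
           - loss(forward(w - eps, b, v, c, n0)[3], y)) / (2 * eps)

print(f"analytic: {analytic:.8f}, numeric: {numeric:.8f}")
```

The point of the check is that the contributions of the two paths add: dropping either term from the loop makes the analytic value disagree with the finite difference.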

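To summarize: run forward propagation once to compute the intermediate values, then run backward propagation to obtain the derivatives.

What does that mean?

To get \(d_1, d_2\), you have to actually feed data into the network.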
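A small sketch of why the forward pass has to come first: the backward formula reads values that only exist once data has been pushed through. The single `tanh` neuron, the `cache` dictionary, and the function names below are assumptions for illustration. The names `d` (value entering the activation) and `n` (value leaving it) follow one reading of the notation above.

```python
import math

def forward(w, b, x):
    d = w * x + b        # value entering the activation
    n = math.tanh(d)     # value leaving the activation
    return {"x": x, "d": d, "n": n}   # stored for the backward pass

def backward(cache, y):
    # dL/dw for L = (n - y)^2, using tanh'(d) = 1 - tanh(d)^2.
    dL_dn = 2.0 * (cache["n"] - y)
    dn_dd = 1.0 - cache["n"] ** 2
    dd_dw = cache["x"]
    return dL_dn * dn_dd * dd_dw

w, b = 0.3, 0.1  # same parameters for both calls

# Same weights, different inputs: the cached d and n differ,
# so the gradient with respect to w differs too.
g1 = backward(forward(w, b, 1.0), y=0.5)
g2 = backward(forward(w, b, 2.0), y=0.5)

print(f"gradient for x=1.0: {g1:.5f}")
print(f"gradient for x=2.0: {g2:.5f}")
```

Without the cached `d` and `n`, `backward` has nothing to evaluate the activation derivative at, which is the sense in which the derivatives depend on the data.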

ch0402_6

Additional example

import torch
import time
from memory_profiler import profile

# Record the start time
start_time = time.time()

@profile
def main():
    # Create the dataset
    x = torch.tensor([[1.0], [2.0], [3.0], [4.0]], dtype=torch.float32)
    y = torch.tensor([[3.0], [5.0], [7.0], [9.0]], dtype=torch.float32)

    # Initialize the model parameters
    w = torch.tensor([[0.0]], requires_grad=True, dtype=torch.float32)
    b = torch.tensor([[0.0]], requires_grad=True, dtype=torch.float32)

    # Set the learning rate
    learning_rate = 0.01

    # Training loop
    num_epochs = 1000
    for epoch in range(num_epochs):
        # Forward pass
        predictions = x.mm(w) + b
        
        # Compute the loss
        loss = ((predictions - y) ** 2).mean()

        # Compute the gradients by hand (backpropagation)
        dw = 2 * x.t().mm(predictions - y) / x.size(0)
        db = 2 * (predictions - y).sum() / x.size(0)

        # Update the weights
        with torch.no_grad():
            w -= learning_rate * dw
            b -= learning_rate * db
        
        # Print a log line
        if (epoch + 1) % 100 == 0:
            print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')

    # Print the final learned parameters
    print("Final learned weight w:", w.item())
    print("Final learned bias b:", b.item())

if __name__ == "__main__":
    main()

# Record the end time
end_time = time.time()

# Compute the execution time
execution_time = end_time - start_time

print(f"Code execution time: {execution_time:.4f} s")
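The `dw` and `db` lines in the listing above are the hand-derived gradients of the mean squared error for the linear model \(\hat{y}_i = w x_i + b\). Written out, with \(N\) standing for `x.size(0)`:

\[
L(w, b) = \frac{1}{N} \sum_{i=1}^{N} (w x_i + b - y_i)^2
\]

\[
\frac{\partial L}{\partial w} = \frac{2}{N} \sum_{i=1}^{N} x_i \,(w x_i + b - y_i),
\qquad
\frac{\partial L}{\partial b} = \frac{2}{N} \sum_{i=1}^{N} (w x_i + b - y_i)
\]

As a check on the first iteration, where \(w = b = 0\): the residuals are \(-3, -5, -7, -9\), so \(\partial L / \partial w = \tfrac{2}{4}(-3 - 10 - 21 - 36) = -35\) and \(\partial L / \partial b = \tfrac{2}{4}(-24) = -12\). With a learning rate of 0.01, the first update therefore moves \(w\) to 0.35 and \(b\) to 0.12.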

Output

Epoch [100/1000], Loss: 0.0076
Epoch [200/1000], Loss: 0.0042
Epoch [300/1000], Loss: 0.0023
Epoch [400/1000], Loss: 0.0013
Epoch [500/1000], Loss: 0.0007
Epoch [600/1000], Loss: 0.0004
Epoch [700/1000], Loss: 0.0002
Epoch [800/1000], Loss: 0.0001
Epoch [900/1000], Loss: 0.0001
Epoch [1000/1000], Loss: 0.0000
Final learned weight w: 2.0048611164093018
Final learned bias b: 0.9857079386711121
Filename: bp.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
  290.3 MiB    290.3 MiB           1   @profile
                                       def main():
                                           # Create the dataset
  290.3 MiB      0.0 MiB           1       x = torch.tensor([[1.0], [2.0], [3.0], [4.0]], dtype=torch.float32)
  290.3 MiB      0.0 MiB           1       y = torch.tensor([[3.0], [5.0], [7.0], [9.0]], dtype=torch.float32)
                                       
                                           # Initialize the model parameters
  290.3 MiB      0.0 MiB           1       w = torch.tensor([[0.0]], requires_grad=True, dtype=torch.float32)
  290.3 MiB      0.0 MiB           1       b = torch.tensor([[0.0]], requires_grad=True, dtype=torch.float32)
                                       
                                           # Set the learning rate
  290.3 MiB      0.0 MiB           1       learning_rate = 0.01
                                       
                                           # Training loop
  290.3 MiB      0.0 MiB           1       num_epochs = 1000
  297.6 MiB      0.0 MiB        1001       for epoch in range(num_epochs):
                                               # Forward pass
  297.6 MiB      1.5 MiB        1000           predictions = x.mm(w) + b
                                               
                                               # Compute the loss
  297.6 MiB      2.8 MiB        1000           loss = ((predictions - y) ** 2).mean()
                                       
                                               # Compute the gradients by hand (backpropagation)
  297.6 MiB      0.0 MiB        1000           dw = 2 * x.t().mm(predictions - y) / x.size(0)
  297.6 MiB      0.0 MiB        1000           db = 2 * (predictions - y).sum() / x.size(0)
                                       
                                               # Update the weights
  297.6 MiB      0.0 MiB        1000           with torch.no_grad():
  297.6 MiB      0.0 MiB        1000               w -= learning_rate * dw
  297.6 MiB      0.0 MiB        1000               b -= learning_rate * db
                                               
                                               # Print a log line
  297.6 MiB      0.0 MiB        1000           if (epoch + 1) % 100 == 0:
  297.6 MiB      3.0 MiB          10               print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')
                                       
                                           # Print the final learned parameters
  297.6 MiB      0.0 MiB           1       print("Final learned weight w:", w.item())
  297.6 MiB      0.0 MiB           1       print("Final learned bias b:", b.item())


Code execution time: 0.8926 s

PyTorch

import torch
import time
from memory_profiler import profile

# Record the start time
start_time = time.time()

@profile
def main():
    # Create the dataset
    x = torch.tensor([[1.0], [2.0], [3.0], [4.0]], dtype=torch.float32)
    y = torch.tensor([[3.0], [5.0], [7.0], [9.0]], dtype=torch.float32)

    # Initialize the model parameters
    w = torch.tensor([[0.0]], requires_grad=True, dtype=torch.float32)
    b = torch.tensor([[0.0]], requires_grad=True, dtype=torch.float32)

    # Set the learning rate
    learning_rate = 0.01

    # Define the loss function (MSE loss)
    loss_fn = torch.nn.MSELoss()

    # Define the optimizer (stochastic gradient descent, SGD)
    optimizer = torch.optim.SGD([w, b], lr=learning_rate)

    # Training loop
    num_epochs = 1000
    for epoch in range(num_epochs):
        # Forward pass
        predictions = x.mm(w) + b

        # Compute the loss
        loss = loss_fn(predictions, y)

        # Compute the gradients via backpropagation
        optimizer.zero_grad()  # reset the gradients
        loss.backward()  # backpropagation
        optimizer.step()  # update the weights

        # Print a log line
        if (epoch + 1) % 100 == 0:
            print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')

    # Print the final learned parameters
    print("Final learned weight w:", w.item())
    print("Final learned bias b:", b.item())

if __name__ == "__main__":
    main()

# Record the end time
end_time = time.time()

# Compute the execution time
execution_time = end_time - start_time

print(f"Code execution time: {execution_time:.4f} s")

Results and performance check

Filename: bp_pytorch.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
  292.9 MiB    292.9 MiB           1   @profile
                                       def main():
                                           # Create the dataset
  292.9 MiB      0.0 MiB           1       x = torch.tensor([[1.0], [2.0], [3.0], [4.0]], dtype=torch.float32)
  292.9 MiB      0.0 MiB           1       y = torch.tensor([[3.0], [5.0], [7.0], [9.0]], dtype=torch.float32)
                                       
                                           # Initialize the model parameters
  292.9 MiB      0.0 MiB           1       w = torch.tensor([[0.0]], requires_grad=True, dtype=torch.float32)
  292.9 MiB      0.0 MiB           1       b = torch.tensor([[0.0]], requires_grad=True, dtype=torch.float32)
                                       
                                           # Set the learning rate
  292.9 MiB      0.0 MiB           1       learning_rate = 0.01
                                       
                                           # Define the loss function (MSE loss)
  292.9 MiB      0.0 MiB           1       loss_fn = torch.nn.MSELoss()
                                       
                                           # Define the optimizer (stochastic gradient descent, SGD)
  292.9 MiB      0.0 MiB           1       optimizer = torch.optim.SGD([w, b], lr=learning_rate)
                                       
                                           # Training loop
  292.9 MiB      0.0 MiB           1       num_epochs = 1000
  301.9 MiB      0.0 MiB        1001       for epoch in range(num_epochs):
                                               # Forward pass
  301.9 MiB      1.3 MiB        1000           predictions = x.mm(w) + b
                                       
                                               # Compute the loss
  301.9 MiB      2.8 MiB        1000           loss = loss_fn(predictions, y)
                                       
                                               # Compute the gradients via backpropagation
  301.9 MiB      0.0 MiB        1000           optimizer.zero_grad()  # reset the gradients
  301.9 MiB      5.0 MiB        1000           loss.backward()  # backpropagation
  301.9 MiB      0.0 MiB        1000           optimizer.step()  # update the weights
                                       
                                               # Print a log line
  301.9 MiB      0.0 MiB        1000           if (epoch + 1) % 100 == 0:
  301.9 MiB      0.0 MiB          10               print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')
                                       
                                           # Print the final learned parameters
  301.9 MiB      0.0 MiB           1       print("Final learned weight w:", w.item())
  301.9 MiB      0.0 MiB           1       print("Final learned bias b:", b.item())


Code execution time: 1.1261 s
