연암과 다산 사이 :: Deep Learning from Scratch (2)

Deep Learning from Scratch (2) - Neural Network

호모파베르 2018. 10. 24. 17:42

☞ http://ya-n-ds.tistory.com/3230 : Deep Learning from Scratch (1)

Reference : Deep Learning from Scratch (Saito Koki, Hanbit Media)

< Chap 3. Neural Networks >

Neural network : learns its weight parameters on its own (from data)
- Input layer + hidden layer + output layer

3.1.1 Perceptron with Bias
y = h(b + w1x1 + w2x2)
h(x) = 0 (x<=0), 1 (x>0)

3.1.3 Activation Function
Processing in Neuron
Step 1. a = b + w1x1 + w2x2 ;; Bias + weighted input
Step 2. y = h(a) ;; Calculation of output
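The two steps above can be sketched in NumPy as follows. The weights and bias (0.5, 0.5, -0.7) are illustrative assumptions, chosen so that the neuron behaves like an AND gate; they do not come from these notes.

```python
import numpy as np

def step(a):
    # Step activation: 1 if a > 0, otherwise 0
    return 1 if a > 0 else 0

def neuron(x, w, b):
    a = b + np.sum(w * x)   # Step 1: bias + weighted input
    return step(a)          # Step 2: apply the activation function

# Hypothetical parameters that make the neuron act as an AND gate
w = np.array([0.5, 0.5])
b = -0.7

inputs = [(0, 0), (1, 0), (0, 1), (1, 1)]
outputs = [neuron(np.array(x), w, b) for x in inputs]
print(outputs)  # [0, 0, 0, 1]
```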

3.2 Activation Function
- Step function : the output switches at a threshold

3.2.1 Sigmoid Function
h(x) = 1/(1+exp(-x))

3.2.2 Implementation of Step function
def step_function(x): # x is float value, cannot be array
     if x > 0:
         return 1
     else:
         return 0

import numpy as np
def step_function(x): # x is a NumPy array
    y = x > 0 # y is a bool-type array
    return y.astype(int) # astype : change the array's data type (np.int was removed in NumPy 1.24, so use int)

3.2.3 Graph of Step function
import numpy as np
import matplotlib.pyplot as plt

def step_function(x):
    return np.array(x > 0, dtype=int) # dtype : data type of the resulting array

x = np.arange(-5.0, 5.0, 0.1)
y = step_function(x)
plt.plot(x, y)
plt.ylim(-0.1, 1.1) # set y-axis limit
plt.show()

3.2.4 Implementation of Sigmoid function
def sigmoid(x):
    return 1/(1+np.exp(-x))

x = np.array([-1.0, 1.0, 2.0])
sigmoid(x)

x = np.arange(-5.0, 5.0, 0.1)
y = sigmoid(x)
plt.plot(x,y)
plt.ylim(-0.1, 1.1)
plt.show()

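3.2.6 Nonlinear Functions
- Problem with linear functions : no matter how deep the layers are stacked, a 'network without hidden layers' can do exactly the same thing
e.g. h(x)=cx -> y(x) = h(h(h(x))) -> y(x)=c*c*c*x == y(x)=ax (a=c^3)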
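The collapse of stacked linear layers can be checked numerically. In this minimal sketch the slope c = 2.0 and the input values are arbitrary assumptions.

```python
import numpy as np

c = 2.0  # hypothetical slope of the linear "activation"

def h(x):
    # Linear function: adds no nonlinearity
    return c * x

x = np.array([-1.0, 0.5, 3.0])
three_layers = h(h(h(x)))     # three stacked linear layers
one_layer = (c ** 3) * x      # a single layer with a = c^3

print(np.allclose(three_layers, one_layer))  # True
```

Because the two results are identical, the extra layers add no expressive power, which is why the activation function must be nonlinear.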

3.2.7 ReLU (Rectified Linear Unit) Function
h(x) = x (if x>0), 0 (if x<=0)

def relu(x):
    return np.maximum(0, x) # element-wise: return the larger of 0 and x
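A short usage sketch for ReLU, with arbitrary sample inputs. Note that np.maximum compares element by element, unlike np.max, which reduces an array to its single largest value.

```python
import numpy as np

def relu(x):
    # Element-wise maximum of 0 and x
    return np.maximum(0, x)

x = np.array([-2.0, 0.0, 3.5])
y = relu(x)
print(y)  # negative inputs become 0, positive inputs pass through unchanged
```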

3.3 Multidimensional Array Operations
3.3.1 Multidimensional Arrays
## 1-dimensional array
import numpy as np
A = np.array([1,2,3,4])
print(A) # -> [1 2 3 4]
np.ndim(A) # -> 1
A.shape # -> (4,) : tuple
A.shape[0] # -> 4 : size of the first dimension (axis 0)

## 2-dimensional array ( matrix )
import numpy as np
B = np.array([[1,2],[3,4],[5,6]]) # 3x2 array
print(B)
# -> [[1 2]
#     [3 4]
#     [5 6]]
np.ndim(B) # -> 2
B.shape # -> (3,2) : tuple
B.shape[0] # -> 3 : size of the first dimension (number of rows)

3.3.2 Matrix Product (Dot Product)
A = np.array([[1,2],[3,4]])
A.shape # -> (2,2)
B = np.array([[5,6],[7,8]])
np.dot(A,B)
# -> array([[19, 22],
#           [43, 50]])

A = np.array([[1,2,3],[4,5,6]])
A.shape # -> (2,3)

B = np.array([[1,2],[3,4],[5,6]])
B.shape # -> (3,2)
np.dot(A,B) # size of dimension 1 of A must equal size of dimension 0 of B
# -> array([[22, 28],
#           [49, 64]])
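To see the shape rule in action, the sketch below multiplies a (2,3) array by a (2,2) array. The array C is a made-up example whose first dimension does not match, so np.dot raises a ValueError.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
C = np.array([[1, 2], [3, 4]])         # shape (2, 2), hypothetical mismatching operand

try:
    np.dot(A, C)        # 3 (columns of A) != 2 (rows of C)
    mismatch = False
except ValueError:
    mismatch = True     # NumPy reports that the shapes are not aligned

print(mismatch)  # True
```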

3.3.3 Dot Product in a Neural Network
x1, x2 -> w1, ..., w6 -> y1, y2, y3

X = np.array([1,2]) # X.shape -> (2,)
W = np.array([[1,3,5], [2,4,6]]) # W.shape -> (2,3)
Y = np.dot(X,W) # -> [ 5 11 17]

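3.4 Implementing a 3-Layer Neural Network
Input layer (layer 0): 2 neurons + 1st hidden layer (layer 1): 3 neurons + 2nd hidden layer (layer 2): 2 neurons + output layer (layer 3): 2 neurons
H1(1x3) = X(1x2) x W1(2x3) -> H2(1x2) = H1(1x3) x W2(3x2) -> Y(1x2) = H2(1x2) x W3(2x2)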
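The shape chain above can be verified without any meaningful weights. This sketch fills the arrays with random numbers (an assumption made purely for illustration) and checks only the resulting shapes; activation functions and biases are left out.

```python
import numpy as np

rng = np.random.default_rng(0)   # random values; only the shapes matter here
X = rng.random((1, 2))
W1 = rng.random((2, 3))
W2 = rng.random((3, 2))
W3 = rng.random((2, 2))

H1 = np.dot(X, W1)    # (1,2) x (2,3) -> (1,3)
H2 = np.dot(H1, W2)   # (1,3) x (3,2) -> (1,2)
Y = np.dot(H2, W3)    # (1,2) x (2,2) -> (1,2)

print(H1.shape, H2.shape, Y.shape)  # (1, 3) (1, 2) (1, 2)
```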

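3.4.2 Implementing Signal Propagation in Each Layer
# Input layer -> 1st layer
a_1^{(1)} = w_{11}^{(1)} x_1 + w_{12}^{(1)} x_2 + b_1^{(1)}   (a_2^{(1)} and a_3^{(1)} are computed the same way)
A^{(1)} = X W^{(1)} + B^{(1)}
   A^{(1)} = ( a_1^{(1)}  a_2^{(1)}  a_3^{(1)} ),  X = ( x_1  x_2 ),  B^{(1)} = ( b_1^{(1)}  b_2^{(1)}  b_3^{(1)} )
   W^{(1)} = ( w_{11}^{(1)}  w_{21}^{(1)}  w_{31}^{(1)}
               w_{12}^{(1)}  w_{22}^{(1)}  w_{32}^{(1)} )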

X = np.array([1.0, 0.5])
W1 = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
B1 = np.array([0.1, 0.2, 0.3])
A1 = np.dot(X,W1) + B1

Z1 = sigmoid(A1)

# 1st layer (3 neurons) -> 2nd layer (2 neurons)
W2 = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
B2 = np.array([0.1, 0.2])

A2 = np.dot(Z1,W2) + B2
Z2 = sigmoid(A2)

# 2nd layer (2 neurons) -> Output layer (2 neurons)
def identity_function(x):
    return x

W3 = np.array([[0.1, 0.3], [0.2, 0.4]])
B3 = np.array([0.1, 0.2])

A3 = np.dot(Z2,W3) + B3
Y = identity_function(A3) # Y=A3

3.4.3 Implementation Summary
def init_network():
    network = {} # dictionary declaration
    network['W1'] = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
    network['b1'] = np.array([0.1, 0.2, 0.3])
    network['W2'] = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
    network['b2'] = np.array([0.1, 0.2])
    network['W3'] = np.array([[0.1, 0.3], [0.2, 0.4]])
    network['b3'] = np.array([0.1, 0.2])

    return network

def forward(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']

    a1 = np.dot(x,W1) + b1
    z1 = sigmoid(a1)

    a2 = np.dot(z1,W2) + b2
    z2 = sigmoid(a2)

    a3 = np.dot(z2,W3) + b3
    y = identity_function(a3)

    return y

network = init_network()
x = np.array([1.0, 0.5])
y = forward(network, x)

cf. Dictionary : dic= {'Key_Name':'Data'}
=> dic['Key_Name'] -> Data
=> dic.get('Key_Name') -> Data
cf. dic['Key_Name_Added'] = 'Data_Added' adds an entry -> dic == {'Key_Name':'Data', 'Key_Name_Added':'Data_Added'}
cf. dic.keys(), dic.values(), dic.items()
cf. dic.clear()
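The dictionary operations listed above, as a small runnable sketch. The key 'Missing' is a made-up name used only to show how get() differs from indexing.

```python
dic = {'Key_Name': 'Data'}

print(dic['Key_Name'])        # Data
print(dic.get('Key_Name'))    # Data
print(dic.get('Missing'))     # None: get() does not raise KeyError for an absent key

dic['Key_Name_Added'] = 'Data_Added'   # add a new key-value pair
keys = list(dic.keys())
values = list(dic.values())
items = list(dic.items())
print(keys, values, items)

dic.clear()                   # remove every entry
print(dic)                    # {}
```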

3.5 Designing the Output Layer
- Classification : softmax function -> distinguishing, recognizing
- Regression : identity function -> predicting a (continuous) numeric value from the input data

3.5.1 Softmax function Implementation
yk = exp(ak)/Sum(exp(ai)) # i=1~n

def softmax(a):
    exp_a = np.exp(a)
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a

    return y

a = np.array([0.3, 2.9, 4.0])
y = softmax(a)

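cf. Overflow prevention for Softmax function
yk = exp(ak)/Sum(exp(ai)) = exp(ak+C')/Sum(exp(ai+C')) # holds for any constant C'; choose C' = -max(a)

def softmax(a):
    C = np.max(a)
    exp_a = np.exp(a - C) # subtract the maximum so the largest exponent is 0
    sum_exp_a = np.sum(exp_a) # sum the shifted exponentials (do not subtract C again)
    y = exp_a / sum_exp_a

    return y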
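The effect of subtracting the maximum can be checked with deliberately large inputs. The names softmax_naive and softmax_stable and the scores 1010, 1000, 990 are assumptions made for this sketch. np.errstate only silences the overflow warnings so that the nan result is easy to see.

```python
import numpy as np

def softmax_naive(a):
    exp_a = np.exp(a)
    return exp_a / np.sum(exp_a)

def softmax_stable(a):
    exp_a = np.exp(a - np.max(a))   # largest exponent becomes exp(0) = 1
    return exp_a / np.sum(exp_a)

a = np.array([1010.0, 1000.0, 990.0])   # hypothetical large scores

with np.errstate(over='ignore', invalid='ignore'):
    naive = softmax_naive(a)    # exp overflows to inf, and inf/inf gives nan

stable = softmax_stable(a)
print(naive)    # every element is nan
print(stable)   # valid probabilities that sum to 1
```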

3.5.3 Characteristics of Softmax function
- Output value range : 0~1
- Sum of outputs = 1
- Probability

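cf. Training : softmax is used in the output layer; Inference : softmax in the output layer is usually omitted (it does not change which neuron has the largest output)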
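A quick check of the properties listed above, and of why softmax can be skipped at inference time: exp() is monotonically increasing, so the index of the largest input is also the index of the largest output. This is a minimal sketch that reuses the example scores from 3.5.1.

```python
import numpy as np

def softmax(a):
    exp_a = np.exp(a - np.max(a))
    return exp_a / np.sum(exp_a)

a = np.array([0.3, 2.9, 4.0])
y = softmax(a)

in_range = bool(np.all((y > 0) & (y < 1)))       # each output lies between 0 and 1
total = np.sum(y)                                # outputs sum to 1 (up to rounding)
same_winner = np.argmax(a) == np.argmax(y)       # ordering is preserved

print(in_range, total, same_winner)
```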

3.5.4 Choosing the Number of Output Neurons

- Classification : the number of classes to distinguish

3.6 Handwritten Digit Recognition
cf. Learn the weight parameters (w/ training data) -> inference
3.6.1 MNIST Data set ( images of digits 0~9, 28x28 grayscale images, 0~255/Pixel )
- Training images : 60,000
- Test images : 10,000

- MNIST dataset conversion script ( @GitHub repository, dataset/mnist.py ) -> work directory : one of ch01, ch02, ..., ch08

## Display MNIST image
import sys, os # import modules cf. module unit : '*.py' file
sys.path.append(os.pardir) # add the parent directory to the module search path
import numpy as np
from dataset.mnist import load_mnist # import the load_mnist function from the 'mnist' module
from PIL import Image # PIL : Python Imaging Library

def img_show(img):
    pil_img = Image.fromarray(np.uint8(img)) # convert the NumPy array into a PIL image object; uint8 for 8-bit pixel data
    pil_img.show()

(x_train, t_train), (x_test, t_test) = load_mnist(flatten=True, normalize=False) # flatten=True -> each image becomes a 1-dimensional array

print(x_train.shape) # Training images (60000, 784) cf. 784 = 28x28
print(t_train.shape) # Training labels (60000,)
print(x_test.shape) # Test images (10000, 784) cf. 784 = 28x28
print(t_test.shape) # Test labels (10000,)

img = x_train[0]
label = t_train[0]
print(label) # 5

print(img.shape) # (784, ) - flattened

img = img.reshape(28,28) # Restore 28x28 array
print(img.shape) # (28, 28)

img_show(img)

3.6.2 Inference with the Neural Network
- Input layer: 784 (28x28) -> Output layer: 10 (0~9) via 1st hidden layer (50 neurons), 2nd hidden layer (100 neurons)

def get_data():
     (x_train, t_train), (x_test, t_test) =\
         load_mnist(normalize=True, flatten=True, one_hot_label=False)
     return x_test, t_test

import pickle

def init_network():
    with open("sample_weight.pkl", 'rb') as f:
        network = pickle.load(f) # weights and biases are stored as a dictionary
    return network

def predict(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']

    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3
    y = softmax(a3) # 1-dimensional array with 10 elements

    return y

x, t = get_data() # x : test images, t : test labels
network = init_network()

accuracy_cnt = 0
for i in range(len(x)):
     y = predict(network, x[i])
     p = np.argmax(y) # get the index whose value is the greatest
     if p == t[i]:
         accuracy_cnt += 1

print( "Accuracy:" +str( float(accuracy_cnt)/len(x) ) )

3.6.3 Batch Processing
x, t = get_data()
network = init_network()
W1, W2, W3 = network['W1'], network['W2'], network['W3']

x.shape # (10000, 784)
x[0].shape # (784,)
W1.shape # (784, 50)
W2.shape # (50, 100)
W3.shape # (100, 10)

// Shapes of the input, weights, and output for a single image

X      W1       W2        W3        Y
784    784x50   50x100    100x10    10

=> for batch size = 100
X         W1       W2        W3        Y
100x784   784x50   50x100    100x10    100x10
          a1 100x50   a2 100x100   a3 100x10
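The batch shapes can be confirmed with stand-in arrays. The sketch below uses zero-filled arrays in place of the real MNIST images and trained weights (an assumption, since only the shapes are of interest) and runs the same three matrix products.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Zero-filled stand-ins for a batch of 100 images and the trained parameters
x_batch = np.zeros((100, 784))
W1, b1 = np.zeros((784, 50)), np.zeros(50)
W2, b2 = np.zeros((50, 100)), np.zeros(100)
W3, b3 = np.zeros((100, 10)), np.zeros(10)

z1 = sigmoid(np.dot(x_batch, W1) + b1)   # (100,784) x (784,50)  -> (100,50)
z2 = sigmoid(np.dot(z1, W2) + b2)        # (100,50)  x (50,100)  -> (100,100)
a3 = np.dot(z2, W3) + b3                 # (100,100) x (100,10)  -> (100,10)

print(z1.shape, z2.shape, a3.shape)  # (100, 50) (100, 100) (100, 10)
```

Each bias vector is broadcast across the 100 rows, so the same parameters serve both the single-image and the batched case.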

x, t = get_data()
network = init_network()

batch_size = 100
accuracy_cnt = 0

for i in range(0, len(x), batch_size):
    # range(start, end, step) : i = 0, batch_size, 2*batch_size, ...
    x_batch = x[i:i+batch_size]
    y_batch = predict(network, x_batch)
    p = np.argmax(y_batch, axis=1) # index of the largest value in each row
    # axis usage ☞ http://gomguard.tistory.com/145
    accuracy_cnt += np.sum(p==t[i:i+batch_size])

print( "Accuracy:" +str( float(accuracy_cnt)/len(x) ) )

cf. np.sum() function
>>> y = np.array([1,2,1,0])
>>> t = np.array([1,2,0,0])
>>> print(y==t)
[ True  True False  True]
>>> np.sum(y==t)
3
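Combining np.argmax with axis=1 and np.sum gives the per-batch count of correct predictions used in the loop above. The scores and labels in this sketch are made-up values for 3 samples and 4 classes.

```python
import numpy as np

# Hypothetical network outputs: one row per sample, one column per class
y_batch = np.array([[0.1, 0.8, 0.05, 0.05],
                    [0.3, 0.1, 0.5, 0.1],
                    [0.6, 0.2, 0.1, 0.1]])
t = np.array([1, 2, 3])          # hypothetical correct labels

p = np.argmax(y_batch, axis=1)   # index of the largest value in each row
correct = np.sum(p == t)         # True counts as 1, False as 0

print(p, correct)  # [1 2 0] 2
```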

------------

License : Attribution, NonCommercial, NoDerivatives

Posted by 명랑만화
