☞ http://ya-n-ds.tistory.com/3230 : Deep Learning from Scratch (1)
Reference : Deep Learning from Scratch ( Koki Saito, Hanbit Media )
< Chap 3. Neural Networks >
Neural network : learns its weight parameters on its own
- Input layer + Hidden layer(s) + Output layer
3.1.1 Perceptron with Bias
y = h(b + w1x1 + w2x2)
h(x) = 0 (x<=0), 1 (x>0)
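A minimal NumPy sketch of this biased perceptron ( the weights 0.5, 0.5 and bias -0.7 are example values, which happen to realize an AND gate ):
import numpy as np
def AND(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])  # example weights
    b = -0.7                  # example bias
    a = b + np.sum(w * x)     # a = b + w1*x1 + w2*x2
    return 1 if a > 0 else 0  # step function h
AND(0,0), AND(1,0), AND(0,1), AND(1,1) # -> (0, 0, 0, 1)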
3.1.3 Activation Function
Processing in Neuron
Step 1. a = b + w1x1 + w2x2 ;; Bias + Weighted input
Step 2. y = h(a) ;; Calculation of output
3.2 Activation Function
- Step function : the output switches when the input crosses a threshold
3.2.1 Sigmoid Function
h(x) = 1/(1+exp(-x))
3.2.2 Implementation of Step function
def step_function(x): # x is a scalar float ; cannot accept an array
    if x > 0:
        return 1
    else:
        return 0
import numpy as np
def step_function(x): # x is an array
    y = x > 0 # y is a bool-type array
    return y.astype(int) # astype : change the array's data type ( int instead of the removed np.int )
3.2.3 Graph of Step function
import numpy as np
import matplotlib.pylab as plt
def step_function(x):
    return np.array(x > 0, dtype=int) # dtype : set the data type
x = np.arange(-5.0, 5.0, 0.1)
y = step_function(x)
plt.plot(x, y)
plt.ylim(-0.1, 1.1) # set y-axis limit
plt.show()
3.2.4 Implementation of Sigmoid function
def sigmoid(x):
    return 1/(1+np.exp(-x))
x = np.array([-1.0, 1.0, 2.0])
sigmoid(x)
x = np.arange(-5.0, 5.0, 0.1)
y = sigmoid(x)
plt.plot(x,y)
plt.ylim(-0.1, 1.1)
plt.show()
3.2.6 Nonlinear Functions
- Problem with linear activation functions : no matter how deep the stack, the same behavior is possible with a 'network without hidden layers'
e.g. h(x) = cx -> y(x) = h(h(h(x))) = c*c*c*x, i.e. y(x) = ax with a = c^3
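A quick numeric check of this collapse ( a minimal sketch ; c = 2.0 is an arbitrary example value ):
c = 2.0
h = lambda x: c * x  # linear 'activation'
y = h(h(h(3.0)))     # three stacked linear layers
print(y)             # -> 24.0
print((c**3) * 3.0)  # -> 24.0 : identical to the single linear layer y = a*x with a = c^3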
3.2.7 ReLU (Rectified Linear Unit) Function
h(x) = x (if x>0), 0 (if x<=0)
def relu(x):
    return np.maximum(0, x) # element-wise maximum : returns the larger of 0 and x
3.3 Operations on Multidimensional Arrays
3.3.1 Multidimensional Arrays
## 1-dimensional array
import numpy as np
A = np.array([1,2,3,4])
print(A) # -> [1 2 3 4]
np.ndim(A) # -> 1
A.shape # -> (4,) : Tuple format
A.shape[0] # -> 4 : size of the 0th dimension
## 2-dimensional array ( matrix )
import numpy as np
B = np.array([[1,2],[3,4],[5,6]]) # 3x2 array
print(B)
# -> [[1 2]
#     [3 4]
#     [5 6]]
np.ndim(B) # -> 2
B.shape # -> (3, 2) : Tuple format
B.shape[0] # -> 3 : size of the 0th dimension ( number of rows )
3.3.2 Matrix Multiplication (Dot Product)
A = np.array([[1,2],[3,4]])
A.shape # -> (2, 2)
B = np.array([[5,6],[7,8]])
np.dot(A,B)
# -> array([[19, 22],
#           [43, 50]])
A = np.array([[1,2,3],[4,5,6]])
A.shape # -> (2, 3)
B = np.array([[1,2],[3,4],[5,6]])
B.shape # -> (3, 2)
np.dot(A,B) # size of A's 1st dimension ( columns ) must equal size of B's 0th dimension ( rows )
# -> array([[22, 28],
#           [49, 64]])
3.3.3 Dot Product in a Neural Network
x1, x2 -> w1, ..., w6 -> y1, y2, y3
X = np.array([1,2]) # X.shape -> (2,)
W = np.array([[1,3,5], [2,4,6]]) # W.shape -> (2, 3)
Y = np.dot(X,W) # -> [ 5 11 17]
3.4 Implementing a 3-Layer Neural Network
Input layer (layer 0) : 2 neurons + 1st hidden layer (layer 1) : 3 + 2nd hidden layer (layer 2) : 2 + Output layer (layer 3) : 2
H1(1x3) = X(1x2) x W1(2x3) -> H2(1x2) = H1(1x3) x W2(3x2) -> Y(1x2) = H2(1x2) x W3(2x2)
3.4.2 Implementing Signal Transmission Between Layers
# Input layer -> 1st layer
a1(1) = w11(1)*x1 + w12(1)*x2 + b1(1)  ( likewise a2(1), a3(1) ; superscript (1) = layer 1 )
In matrix form : A(1) = X•W(1) + B(1), where
A(1) = ( a1(1) a2(1) a3(1) ), X = ( x1 x2 ), B(1) = ( b1(1) b2(1) b3(1) )
W(1) = ( w11(1) w21(1) w31(1)
         w12(1) w22(1) w32(1) )
X = np.array([1.0, 0.5])
W1 = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
B1 = np.array([0.1, 0.2, 0.3])
A1 = np.dot(X,W1) + B1
Z1 = sigmoid(A1)
# 1st layer (3 neurons) -> 2nd layer (2 neurons)
W2 = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
B2 = np.array([0.1, 0.2])
A2 = np.dot(Z1,W2) + B2
Z2 = sigmoid(A2)
# 2nd layer (2 neurons) -> Output layer (2 neurons)
def identity_function(x):
    return x
W3 = np.array([[0.1, 0.3], [0.2, 0.4]])
B3 = np.array([0.1, 0.2])
A3 = np.dot(Z2,W3) + B3
Y = identity_function(A3) # Y = A3
3.4.3 Implementation Summary
def init_network():
    network = {} # dictionary declaration
    network['W1'] = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
    network['b1'] = np.array([0.1, 0.2, 0.3])
    network['W2'] = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
    network['b2'] = np.array([0.1, 0.2])
    network['W3'] = np.array([[0.1, 0.3], [0.2, 0.4]])
    network['b3'] = np.array([0.1, 0.2])
    return network
def forward(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']
    a1 = np.dot(x,W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1,W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2,W3) + b3
    y = identity_function(a3)
    return y
network = init_network()
x = np.array([1.0, 0.5])
y = forward(network, x)
cf. Dictionary : dic = {'Key_Name':'Data'}
=> dic['Key_Name'] -> 'Data'
=> dic.get('Key_Name') -> 'Data' ( get() is a method : parentheses, not square brackets )
cf. dic['Key_Name_Added'] = 'Data_Added' adds a new key : dic = {'Key_Name':'Data', 'Key_Name_Added':'Data_Added'}
cf. dic.keys(), dic.values(), dic.items()
cf. dic.clear()
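A short runnable recap of these dictionary operations ( key names here are arbitrary examples ):
dic = {'W1': 0.1}
print(dic['W1'])     # -> 0.1
print(dic.get('W1')) # -> 0.1 : get() is called with parentheses
dic['b1'] = 0.2      # adds a new key
print(dic.keys())    # -> dict_keys(['W1', 'b1'])
print(dic.items())   # -> dict_items([('W1', 0.1), ('b1', 0.2)])
dic.clear()          # dic -> {}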
3.5 Designing the Output Layer
- Classification : softmax function -> distinguishing / recognizing categories
- Regression : identity function -> predicting a ( continuous ) numeric value from the input data
3.5.1 Softmax function Implementation
yk = exp(ak)/Sum(exp(ai)) # i=1~n
def softmax(a):
    exp_a = np.exp(a)
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y
a = np.array([0.3, 2.9, 4.0])
y = softmax(a)
cf. Overflow prevention for Softmax function
yk = exp(ak)/Sum(exp(ai)) = exp(ak+C)/Sum(exp(ai+C)) for any constant C ; typically C = -max(a)
def softmax(a):
    C = np.max(a)
    exp_a = np.exp(a - C) # subtract the maximum to prevent overflow
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y
3.5.3 Characteristics of Softmax function
- Output value range : 0~1
- Sum of outputs = 1
- Outputs can be interpreted as probabilities
cf. Training : softmax is used in the output layer ; Inference : softmax may be omitted ( exp is monotonic, so the largest output stays the largest )
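A small check of these properties, using the softmax() defined above ( the values in a are arbitrary examples ):
a = np.array([0.3, 2.9, 4.0])
y = softmax(a)
print(y)         # -> [ 0.01821127 0.24519181 0.73659691 ] : each value in 0~1
print(np.sum(y)) # -> 1.0
print(np.argmax(y) == np.argmax(a)) # -> True : softmax never changes the argmax, so it can be skipped at inference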
3.5.4 Choosing the Number of Output Neurons
- Classification : the number of classes to be distinguished
3.6 Handwritten Digit Recognition
cf. Learn the weight parameters ( with training data ) -> Inference
3.6.1 MNIST Data set ( digit images 0~9, 28x28 grayscale images, 0~255 per pixel )
- Training images : 60,000
- Test images : 10,000
- MNIST dataset conversion script ( in the GitHub repository, dataset/mnist.py ) -> working directory : one of ch01, ch02, ..., ch08
## Display MNIST image
import sys, os # import modules cf. module unit : '*.py' file
sys.path.append(os.pardir) # add search path
import numpy as np
from dataset.mnist import load_mnist # import the load_mnist function from the 'mnist' module
from PIL import Image # PIL : Python Imaging Library module
def img_show(img):
    pil_img = Image.fromarray(np.uint8(img)) # convert the NumPy array into a PIL image object ; uint8 for 8-bit pixel data
    pil_img.show()
(x_train, t_train), (x_test, t_test) = load_mnist(flatten=True, normalize=False) # flatten=True -> 1-dimensional array
print(x_train.shape) # Training image (60000, 784) cf. 784 = 28x28
print(t_train.shape) # Training label (60000, )
print(x_test.shape) # Test image (10000, 784) cf. 784 = 28x28
print(t_test.shape) # Test label (10000, )
img = x_train[0]
label = t_train[0]
print(label) # -> 5
print(img.shape) # (784, ) - flattened
img = img.reshape(28,28) # Restore 28x28 array
print(img.shape) # (28, 28)
img_show(img)
3.6.2 Neural Network Inference
- Input layer : 784 (28x28) -> Output layer : 10 (digits 0~9), via 1st hidden layer ( 50 neurons ) and 2nd hidden layer ( 100 neurons )
import pickle
def get_data():
    (x_train, t_train), (x_test, t_test) = \
        load_mnist(normalize=True, flatten=True, one_hot_label=False)
    return x_test, t_test
def init_network():
    with open("sample_weight.pkl", 'rb') as f:
        network = pickle.load(f) # weights and biases are stored as a dictionary
    return network
def predict(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']
    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3
    y = softmax(a3) # 1-dimensional array with 10 elements
    return y
x, t = get_data() # x : test images, t : test labels
network = init_network() # note : call the function ( with parentheses )
accuracy_cnt = 0
for i in range(len(x)):
    y = predict(network, x[i])
    p = np.argmax(y) # index of the greatest value
    if p == t[i]:
        accuracy_cnt += 1
print( "Accuracy:" + str( float(accuracy_cnt)/len(x) ) )
3.6.3 Batch Processing
x, t = get_data()
network = init_network()
W1, W2, W3 = network['W1'], network['W2'], network['W3']
x.shape # (10000, 784)
x[0].shape # (784,)
W1.shape # (784, 50)
W2.shape # (50, 100)
W3.shape # (100, 10)
# Shapes of input, weights, and output for a single image
X         W1       W2       W3       Y
784       784x50   50x100   100x10   10
=> for batch size = 100 :
X         W1       W2       W3       Y
100x784   784x50   50x100   100x10   100x10
( intermediate shapes : a1 100x50, a2 100x100, a3 100x10 )
x, t = get_data()
network = init_network()
batch_size = 100
accuracy_cnt = 0
for i in range(0, len(x), batch_size):
    # range(start, end, step) : i = 0, batch_size, 2*batch_size, ...
    x_batch = x[i:i+batch_size]
    y_batch = predict(network, x_batch)
    p = np.argmax(y_batch, axis=1) # index of the max value along each row
    # axis usage ☞ http://gomguard.tistory.com/145
    accuracy_cnt += np.sum(p == t[i:i+batch_size])
print( "Accuracy:" + str( float(accuracy_cnt)/len(x) ) )
cf. np.sum() function
>>> y = np.array([1,2,1,0])
>>> t = np.array([1,2,0,0])
>>> print(y==t)
[ True  True False  True]
>>> np.sum(y==t)
3
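cf. np.argmax() with axis=1, as used in the batch loop above ( values are arbitrary examples )
>>> y_batch = np.array([[0.1, 0.8, 0.1], [0.3, 0.1, 0.6], [0.2, 0.5, 0.3]])
>>> np.argmax(y_batch, axis=1) # index of the max along each row ( axis=1 )
array([1, 2, 1])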
------------