http://ya-n-ds.tistory.com/3230 : Deep Learning from Scratch (1)


Reference : Deep Learning from Scratch ( Koki Saito, Hanbit Media )


< Chap 3. Neural Networks

Neural network: learns its weight parameters by itself (from data)
- Input layer + hidden layer(s) + output layer


3.1.1 Perceptron with Bias
 y = h(b + w1x1 + w2x2)
 h(x) = 0 (x<=0), 1 (x>0)


3.1.3 Activation Function
Processing in a neuron
 Step 1. a = b + w1x1 + w2x2  ;; Bias + weighted input
 Step 2. y = h(a)  ;; Calculation of output
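The two steps above can be sketched for a single neuron as follows. The weights w1 = w2 = 0.5 and the bias b = -0.7 are hypothetical values chosen only for illustration (with them the neuron happens to behave like an AND gate); h is the step function from 3.1.1.

```python
import numpy as np

def step(a):
    # h(a) from 3.1.1: 1 if a > 0, else 0
    return 1 if a > 0 else 0

def neuron(x, w, b):
    a = b + np.sum(w * x)  # Step 1: bias + weighted input
    return step(a)         # Step 2: output y = h(a)

# Hypothetical parameters, for illustration only
w = np.array([0.5, 0.5])
b = -0.7

for x in ([0, 0], [1, 0], [0, 1], [1, 1]):
    print(x, neuron(np.array(x), w, b))  # only [1, 1] gives a > 0, so only it outputs 1
```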


3.2 Activation Function
- Step function: the output switches (0 -> 1) when the input crosses a threshold

3.2.1 Sigmoid Function
 h(x) = 1/(1+exp(-x))


3.2.2 Implementation of Step function
 def step_function(x):  # x must be a single number; NumPy arrays are not supported
     if x > 0:
         return 1
     else:
         return 0

 import numpy as np
 def step_function(x):  # x is a NumPy array
     y = x > 0  # y is a bool-type array
     return y.astype(int)  # astype: convert the array to another data type (np.int was removed in NumPy 1.24)


3.2.3 Graph of Step function
 import numpy as np
 import matplotlib.pylab as plt

 def step_function(x):
     return np.array(x > 0, dtype=int)  # dtype: data type of the resulting array

 x = np.arange(-5.0, 5.0, 0.1)
 y = step_function(x)
 plt.plot(x, y)
 plt.ylim(-0.1, 1.1)  # set y-axis limits
 plt.show()


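3.2.4 Implementation of Sigmoid function
 def sigmoid(x):
     return 1 / (1 + np.exp(-x))  # works element-wise on NumPy arrays


 x = np.array([-1.0, 1.0, 2.0])
 sigmoid(x)  # -> array([0.26894142, 0.73105858, 0.88079708])

 x = np.arange(-5.0, 5.0, 0.1)
 y = sigmoid(x)
 plt.plot(x, y)
 plt.ylim(-0.1, 1.1)
 plt.show()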


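3.2.6 Nonlinear functions
- Problem with linear activation functions: no matter how deep the network is, a 'network with no hidden layer' can do exactly the same thing
 e.g. h(x) = cx -> y(x) = h(h(h(x))) = c*c*c*x, which is just y(x) = ax with a = c^3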
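A quick numerical check of this claim, as a sketch: three stacked layers with the linear activation h(x) = x produce the same output as one weight matrix. The weights are random placeholders, and biases are left out for brevity (with biases the composition is still a single affine map).

```python
import numpy as np

# Random placeholder weights for three layers with linear activation h(x) = x
rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 3))
W2 = rng.standard_normal((3, 4))
W3 = rng.standard_normal((4, 2))

def deep_linear(x):
    # Three stacked layers, no nonlinearity in between
    return np.dot(np.dot(np.dot(x, W1), W2), W3)

# The same mapping collapses into one weight matrix: a network with no hidden layer
W_single = np.dot(np.dot(W1, W2), W3)  # shape (2, 2)

x = np.array([1.0, 0.5])
print(np.allclose(deep_linear(x), np.dot(x, W_single)))  # True (equal up to rounding)
```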


3.2.7 ReLU (Rectified Linear Unit) function
 h(x) = x (if x>0), 0 (if x<=0)

 def relu(x):
     return np.maximum(0, x)  # element-wise: the larger of 0 and x
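For comparison, a small sketch that applies the three activation functions of this chapter to the same inputs. The function bodies follow the ones above, using astype(int) since np.int no longer exists in recent NumPy.

```python
import numpy as np

def step_function(x):
    return (x > 0).astype(int)   # jumps from 0 to 1 at x = 0

def sigmoid(x):
    return 1 / (1 + np.exp(-x))  # smooth curve between 0 and 1

def relu(x):
    return np.maximum(0, x)      # 0 for x <= 0, x itself for x > 0

x = np.array([-2.0, 0.0, 2.0])
print(step_function(x).tolist())     # [0, 0, 1]
print(sigmoid(x).round(3).tolist())  # [0.119, 0.5, 0.881]
print(relu(x).tolist())              # [0.0, 0.0, 2.0]
```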


3.3 Multidimensional array operations
3.3.1 Multidimensional arrays
 ## 1-D array
 import numpy as np
 A = np.array([1,2,3,4])
 print(A)  # -> [1 2 3 4]
 np.ndim(A)  # -> 1 : number of dimensions
 A.shape  # -> (4,) : Tuple-format
 A.shape[0]  # -> 4 : size of the 0th dimension


 ## 2-D array (matrix)
 import numpy as np
 B = np.array([[1,2],[3,4],[5,6]])  # 3x2 array
 print(B)  # -> [[1 2]
           #     [3 4]
           #     [5 6]]
 np.ndim(B)  # -> 2
 B.shape  # -> (3, 2) : Tuple-format
 B.shape[0]  # -> 3 : size of the 0th dimension (rows); B.shape[1] -> 2 (columns)


3.3.2 Matrix product (dot product)
 A = np.array([[1,2],[3,4]])
 A.shape  # -> (2, 2)
 B = np.array([[5,6],[7,8]])
 B.shape  # -> (2, 2)
 np.dot(A,B)
  # -> array([[19, 22],
  #           [43, 50]])


 A = np.array([[1,2,3],[4,5,6]])
 A.shape  # -> (2, 3)


 B = np.array([[1,2],[3,4],[5,6]])
 B.shape  # -> (3, 2)
 np.dot(A,B)  # size of A's 1st dimension (A.shape[1]) must equal size of B's 0th dimension (B.shape[0])
  # -> array([[22, 28],
  #           [49, 64]])
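To see the dimension rule above in action, here is a short sketch where the inner dimensions do not match; the matrix C is a made-up example. NumPy raises a ValueError, and swapping the operand order makes the product defined again.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
C = np.array([[1, 2], [3, 4]])        # shape (2, 2)

# A.shape[1] (3) != C.shape[0] (2), so np.dot(A, C) is undefined
mismatch_error = None
try:
    np.dot(A, C)
except ValueError as e:
    mismatch_error = e
    print("ValueError:", e)

# Swapping the order works: C.shape[1] (2) == A.shape[0] (2)
print(np.dot(C, A))  # [[ 9 12 15]
                     #  [19 26 33]]
```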


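3.3.3 Matrix products in a neural network
 x1, x2 -> w1, ..., w6 -> y1, y2, y3  (2 inputs, 6 weights, 3 outputs; bias and activation omitted)

 X = np.array([1,2])  # X.shape -> (2,)
 W = np.array([[1,3,5], [2,4,6]])  # W.shape -> (2,3)
 Y = np.dot(X,W)  # -> array([ 5, 11, 17])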
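
3.4 Implementing a 3-layer neural network
 Input layer (layer 0): 2 neurons + 1st hidden layer (layer 1): 3 neurons + 2nd hidden layer (layer 2): 2 neurons + output layer (layer 3): 2 neurons
 H1(1x3) = X(1x2) x W1(2x3) -> H2(1x2) = H1(1x3) x W2(3x2) -> Y(1x2) = H2(1x2) x W3(2x2)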
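The shape chain above can be checked with a short sketch. The weights are random placeholders (only their shapes matter here), and biases and activation functions are left out because they do not change the shapes.

```python
import numpy as np

# Random placeholder values; only the shapes follow the 2-3-2-2 network above
rng = np.random.default_rng(42)
X = rng.standard_normal((1, 2))   # 1x2 input (x1, x2)
W1 = rng.standard_normal((2, 3))  # layer 0 -> layer 1
W2 = rng.standard_normal((3, 2))  # layer 1 -> layer 2
W3 = rng.standard_normal((2, 2))  # layer 2 -> layer 3

H1 = np.dot(X, W1)
H2 = np.dot(H1, W2)
Y = np.dot(H2, W3)
print(H1.shape, H2.shape, Y.shape)  # (1, 3) (1, 2) (1, 2)
```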


3.4.2 Implementing signal transmission between layers
# Input layer -> 1st layer
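 $a^{(1)}_1 = w^{(1)}_{11} x_1 + w^{(1)}_{12} x_2 + b^{(1)}_1$  (likewise $a^{(1)}_2$, $a^{(1)}_3$)
 $A^{(1)} = X W^{(1)} + B^{(1)}$
   $A^{(1)} = \begin{pmatrix} a^{(1)}_1 & a^{(1)}_2 & a^{(1)}_3 \end{pmatrix}$, $X = \begin{pmatrix} x_1 & x_2 \end{pmatrix}$, $B^{(1)} = \begin{pmatrix} b^{(1)}_1 & b^{(1)}_2 & b^{(1)}_3 \end{pmatrix}$
   $W^{(1)} = \begin{pmatrix} w^{(1)}_{11} & w^{(1)}_{21} & w^{(1)}_{31} \\ w^{(1)}_{12} & w^{(1)}_{22} & w^{(1)}_{32} \end{pmatrix}$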


 X = np.array([1.0, 0.5])
 W1 = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
 B1 = np.array([0.1, 0.2, 0.3])
 A1 = np.dot(X,W1) + B1

 Z1 = sigmoid(A1)
# 1st layer (3 neurons) -> 2nd layer (2 neurons)
 W2 = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
 B2 = np.array([0.1, 0.2])

 A2 = np.dot(Z1,W2) + B2
 Z2 = sigmoid(A2)


# 2nd layer (2 neurons) -> Output layer (2 neurons)
 def identity_function(x):
     return x  # output-layer activation: returns the input unchanged

 W3 = np.array([[0.1, 0.3], [0.2, 0.4]])
 B3 = np.array([0.1, 0.2])

 A3 = np.dot(Z2,W3) + B3
 Y = identity_function(A3)  # Y = A3


3.4.3 Implementation summary
 def init_network():
     network = {}  # dictionary declaration
     network['W1'] = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
     network['b1'] = np.array([0.1, 0.2, 0.3])
     network['W2'] = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
     network['b2'] = np.array([0.1, 0.2])
     network['W3'] = np.array([[0.1, 0.3], [0.2, 0.4]])
     network['b3'] = np.array([0.1, 0.2])

     return network

 def forward(network, x):
     W1, W2, W3 = network['W1'], network['W2'], network['W3']
     b1, b2, b3 = network['b1'], network['b2'], network['b3'] 


     a1 = np.dot(x,W1) + b1
     z1 = sigmoid(a1)

     a2 = np.dot(z1,W2) + b2
     z2 = sigmoid(a2)

     a3 = np.dot(z2,W3) + b3
     y = identity_function(a3)

     return y 


 network = init_network()
 x = np.array([1.0, 0.5])
 y = forward(network, x)
 print(y)  # -> [0.31682708 0.69627909]


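cf. Dictionary : dic = {'Key_Name':'Data'}
 => dic['Key_Name'] -> 'Data'  (raises KeyError if the key does not exist)
 => dic.get('Key_Name') -> 'Data'  (returns None if the key does not exist)
cf. dic['Key_Name_Added'] = 'Data_Added' -> adds a new key: dic == {'Key_Name':'Data', 'Key_Name_Added':'Data_Added'}
cf. dic.keys(), dic.values(), dic.items() : the keys, the values, and the (key, value) pairs
cf. dic.clear() : removes all items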
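A short sketch of the dictionary operations listed above, using a hypothetical dictionary with the same key names as init_network(); plain lists stand in for the NumPy arrays.

```python
# Hypothetical dictionary using the key names from init_network()
network = {}
network['W1'] = [[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]]
network['b1'] = [0.1, 0.2, 0.3]

print(network['b1'])             # [0.1, 0.2, 0.3]
print(network.get('b2'))         # None: missing key, but no KeyError
print(network.get('b2', 'n/a'))  # n/a: get() also accepts a default value

keys = list(network.keys())
print(keys)                      # ['W1', 'b1'] (insertion order, Python 3.7+)

for key, value in network.items():
    print(key, value)

network.clear()
print(network)                   # {}
```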

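3.5 Designing the output layer
- Classification: softmax function -> deciding which class the input belongs to (discrimination, recognition)
- Regression: identity function -> predicting a (continuous) numerical value from the input data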


3.5.1 Softmax function Implementation
 yk = exp(ak)/Sum(exp(ai))  # i=1~n 


 def softmax(a):
     exp_a = np.exp(a)
     sum_exp_a = np.sum(exp_a)
     y = exp_a / sum_exp_a

     return y 


 a = np.array([0.3, 2.9, 4.0])
 y = softmax(a)

cf. Overflow prevention for Softmax function
 yk = exp(ak)/Sum(exp(ai)) = exp(ak - C)/Sum(exp(ai - C))  # holds for any constant C; choosing C = max(a) keeps every exponent <= 0


 def softmax(a):
     C = np.max(a)  # subtract the largest input to prevent overflow in exp
     exp_a = np.exp(a - C)
     sum_exp_a = np.sum(exp_a)  # sum of exp_a itself (not exp_a - C)
     y = exp_a / sum_exp_a

     return y
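To see why the max subtraction matters, a sketch comparing the naive formula with the overflow-safe one on large inputs; the input values are made up for the demonstration.

```python
import numpy as np

def softmax_naive(a):
    exp_a = np.exp(a)
    return exp_a / np.sum(exp_a)

def softmax(a):
    C = np.max(a)
    exp_a = np.exp(a - C)  # the largest exponent becomes exp(0) = 1
    return exp_a / np.sum(exp_a)

a = np.array([1010.0, 1000.0, 990.0])

with np.errstate(over='ignore', invalid='ignore'):  # silence the overflow warnings for this demo
    y_naive = softmax_naive(a)
y_safe = softmax(a)

print(y_naive)  # [nan nan nan]: exp(1010) overflows to inf, and inf / inf is nan
print(y_safe)   # finite values that sum to 1
```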


3.5.3 Characteristics of Softmax function
- Each output value lies between 0 and 1
- The outputs sum to 1
- So the outputs can be interpreted as probabilities
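A sketch checking these properties on the example input [0.3, 2.9, 4.0] from 3.5.1, using the overflow-safe softmax; read as probabilities, class 2 gets about 74%.

```python
import numpy as np

def softmax(a):
    C = np.max(a)
    exp_a = np.exp(a - C)  # overflow-safe version
    return exp_a / np.sum(exp_a)

y = softmax(np.array([0.3, 2.9, 4.0]))
print(y)          # approx. [0.018 0.245 0.737]
print(np.sum(y))  # 1.0, up to floating-point rounding
```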


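cf. Training: the softmax function is used in the output layer. Inference: the softmax function in the output layer is usually omitted, because exp() is monotonically increasing, so softmax never changes which output is the largest.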
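A minimal sketch of why softmax can be skipped at inference time: the index of the largest value is the same before and after softmax.

```python
import numpy as np

def softmax(a):
    exp_a = np.exp(a - np.max(a))
    return exp_a / np.sum(exp_a)

a = np.array([0.3, 2.9, 4.0])  # raw output-layer values (scores)
y = softmax(a)

# exp() is monotonically increasing, so the largest score stays the largest
print(np.argmax(a), np.argmax(y))  # 2 2
```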


3.5.4 Choosing the number of output-layer neurons

 - Classification: the number of classes to distinguish


3.6 Handwritten digit recognition
 cf. Learn the weight parameters (with training data) -> inference
3.6.1 MNIST dataset ( images of the digits 0-9, 28x28 grayscale images, pixel values 0-255 )
 - Training images: 60,000
 - Test images: 10,000


 - MNIST dataset conversion script ( in the book's GitHub repository, dataset/mnist.py ) -> working directory: one of ch01, ch02, ..., ch08


 ## Display MNIST image
 import sys, os  # import modules (cf. a module is a single '*.py' file)
 sys.path.append(os.pardir)  # add the parent directory to the module search path so dataset/ can be found
 import numpy as np
 from dataset.mnist import load_mnist  # import the load_mnist function from the 'mnist' module
 from PIL import Image  # PIL: Python Imaging Library (installed today as the Pillow package)


 def img_show(img):
     pil_img = Image.fromarray(np.uint8(img))  # convert the NumPy data into a PIL image object; uint8 for 8-bit pixel data
     pil_img.show()


 (x_train, t_train), (x_test, t_test) = load_mnist(flatten=True, normalize=False)  # flatten=True -> each image is a 1-dimensional array
 print(x_train.shape)  # Training images (60000, 784)  cf. 784 = 28x28
 print(t_train.shape)  # Training labels (60000,)
 print(x_test.shape)  # Test images (10000, 784)
 print(t_test.shape)  # Test labels (10000,)
 img = x_train[0]
 label = t_train[0]
 print(label)  # 5

 print(img.shape)  # (784,) - flattened


 img = img.reshape(28, 28)  # restore the original 28x28 shape
 print(img.shape)  # (28, 28)
 img_show(img)




3.6.2 Inference with the neural network
- Input layer: 784 (28x28) neurons -> output layer: 10 neurons (digits 0-9), via the 1st hidden layer (50 neurons) and the 2nd hidden layer (100 neurons)


 def get_data():
     (x_train, t_train), (x_test, t_test) =\
         load_mnist(normalize=True, flatten=True, one_hot_label=False)
     return x_test, t_test 


 import pickle

 def init_network():
     with open("sample_weight.pkl", 'rb') as f:
         network = pickle.load(f)  # trained weights and biases, saved as a dictionary
     return network


 def predict(network, x):
     W1, W2, W3 = network['W1'], network['W2'], network['W3']
     b1, b2, b3 = network['b1'], network['b2'], network['b3'] 


     a1 = np.dot(x, W1) + b1
     z1 = sigmoid(a1)
     a2 = np.dot(z1, W2) + b2
     z2 = sigmoid(a2)
     a3 = np.dot(z2, W3) + b3
     y = softmax(a3)  # 1-dimensional array with 10 elements
     return y


 x, t = get_data()  # x: array of test images
 network = init_network()
 accuracy_cnt = 0
 for i in range(len(x)):
     y = predict(network, x[i])
     p = np.argmax(y)  # get the index whose value is the greatest
     if p == t[i]:
         accuracy_cnt += 1

 print( "Accuracy:" +str( float(accuracy_cnt)/len(x) ) )



3.6.3 Batch processing
 x, t = get_data()
 network = init_network()
 W1, W2, W3 = network['W1'], network['W2'], network['W3'] 


 x.shape  # (10000, 784)
 x[0].shape  # (784,)
 W1.shape  # (784, 50)
 W2.shape  # (50, 100)
 W3.shape  # (100, 10) 


// Shape of Weight, Input, Output for Batch

 X         W1        W2        W3        Y
 784       784x50    50x100    100x10    10

=> for batch size = 100
 X         W1        W2        W3        Y
 100x784   784x50    50x100    100x10    100x10
 (intermediate results: a1 100x50, a2 100x100, a3 100x10)
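The batch shapes above can be verified with a sketch that uses random placeholder parameters of the same shapes as sample_weight.pkl (these are NOT the trained values) and a random stand-in for x[0:100]. It also shows that a bias of shape (50,) is broadcast to every row of a (100, 50) result.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Random placeholders with the shapes listed above, not the trained weights
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((784, 50)), np.zeros(50)
W2, b2 = rng.standard_normal((50, 100)), np.zeros(100)
W3, b3 = rng.standard_normal((100, 10)), np.zeros(10)

x_batch = rng.random((100, 784))  # stand-in for x[0:100]

a1 = np.dot(x_batch, W1) + b1  # (100, 50); b1 of shape (50,) is added to every row
z1 = sigmoid(a1)
a2 = np.dot(z1, W2) + b2       # (100, 100)
z2 = sigmoid(a2)
a3 = np.dot(z2, W3) + b3       # (100, 10): one row of 10 scores per image
print(a1.shape, a2.shape, a3.shape)  # (100, 50) (100, 100) (100, 10)
```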


 x, t = get_data()
 network = init_network() 


 batch_size = 100
 accuracy_cnt = 0 


 for i in range(0, len(x), batch_size):
     # range(start, end, step): i = 0, batch_size, 2*batch_size, ...
     x_batch = x[i:i+batch_size]
     y_batch = predict(network, x_batch)
     p = np.argmax(y_batch, axis=1)  # index of the largest value in each row (one row per image)
       # axis usage ☞ http://gomguard.tistory.com/145
     accuracy_cnt += np.sum(p == t[i:i+batch_size])


 print( "Accuracy:" +str( float(accuracy_cnt)/len(x) ) ) 


cf. np.sum() function
 >>> y = np.array([1,2,1,0])
 >>> t = np.array([1,2,0,0])
 >>> print(y==t)
 [ True  True False  True]
 >>> print(np.sum(y==t))  # True counts as 1, False as 0
 3



