연암과 다산 사이 :: Deep Learning from Scratch (3)

Deep Learning from Scratch (3) - Neural Network Learning

호모파베르 2018. 11. 7. 13:21

☞ http://ya-n-ds.tistory.com/3230 : Deep Learning from Scratch (1) - Python, Perceptron

☞ http://ya-n-ds.tistory.com/3231 : Deep Learning from Scratch (2) - Neural Network

# Reference : Deep Learning from Scratch ( Koki Saitoh (사이토 고키), Hanbit Media (한빛미디어) )

Chap 4. Neural Network Learning
cf. Perceptron : can learn linearly separable problems

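4.1.1 Data-driven learning
- Exclude human intervention as far as possible : an advantage in pattern recognition
- Feature extraction (converter, vector form) -> learning
- Machine learning approaches
. Human-designed algorithm (e.g. Perceptron) -> result
. Human-designed features (SIFT, HOG, etc.) -> machine learning (SVM, KNN, etc.) -> result
. Neural network (deep learning) -> result // End-to-end machine learning; every problem is approached in the same way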

4.1.2 Training data and test data
- Training data is used to find the optimal parameters; test data is used to evaluate the learned parameters
- Overfitting : the model is optimized too closely for one specific dataset
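A minimal sketch of holding out test data, assuming a hypothetical toy dataset (the array names and the 80/20 ratio are illustrative choices, not taken from the book):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: 100 samples with 4 features and an integer label each.
x = rng.normal(size=(100, 4))
t = rng.integers(0, 3, size=100)

# Shuffle the indices once, then hold out the first 20% of them as test data.
indices = rng.permutation(len(x))
n_test = len(x) // 5
test_idx, train_idx = indices[:n_test], indices[n_test:]

x_train, t_train = x[train_idx], t[train_idx]
x_test, t_test = x[test_idx], t[test_idx]

print(x_train.shape, x_test.shape)
```

The parameters are fitted on x_train only; x_test is touched only when measuring how well they generalize.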

4.2 Loss function ( or Cost function )
- The metric used as the guide when searching for the optimal parameters

4.2.1 Mean Squared Error(MSE)
E = Sum_k((y_k-t_k)^2)/2

e.g. y_k (network output for class k), t_k (label for class k)
y = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0]) # One-hot encoding

def mean_squared_error(y, t):
    return 0.5 * np.sum((y-t)**2)
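As a quick check of the formula, the sketch below compares two outputs against the same one-hot label: one that puts the highest probability on the correct class (index 2) and one that puts it on a wrong class (index 7). The names y_good and y_bad are introduced here for illustration only.

```python
import numpy as np

def mean_squared_error(y, t):
    return 0.5 * np.sum((y - t) ** 2)

t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])  # correct class is index 2

y_good = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
y_bad = np.array([0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.6, 0.0, 0.0])

mse_good = mean_squared_error(y_good, t)
mse_bad = mean_squared_error(y_bad, t)
print(mse_good)  # about 0.0975
print(mse_bad)   # about 0.5975
```

The smaller loss belongs to the output that agrees with the label.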

4.2.2 Cross Entropy Error(CEE)
E = -sum_k(t_k*log(y_k))

def cross_entropy_error(y, t):
    delta = 1e-7 # Prevention of log(0)
    return -np.sum(t * np.log(y+delta))
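With a one-hot label only the term for the correct class survives, so the CEE is just -log of the probability assigned to that class. A small sketch with illustrative arrays (y_good and y_bad are names made up for this example):

```python
import numpy as np

def cross_entropy_error(y, t):
    delta = 1e-7  # keeps log() away from log(0)
    return -np.sum(t * np.log(y + delta))

t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])  # correct class is index 2

y_good = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
y_bad = np.array([0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.6, 0.0, 0.0])

cee_good = cross_entropy_error(y_good, t)
cee_bad = cross_entropy_error(y_bad, t)
print(cee_good)  # about 0.51, i.e. -log(0.6)
print(cee_bad)   # about 2.30, i.e. -log(0.1)
```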

4.2.3 Mini-batch learning
E = -sum_n( sum_k(t_nk*log(y_nk)) )/N // mean loss over N samples -> independent of the number of data

- Random selection of data for mini-batch
import sys, os
sys.path.append(os.pardir)
import numpy as np
from dataset.mnist import load_mnist # dataset/mnist.py from the book's sample code

(x_train, t_train), (x_test, t_test) = \
    load_mnist(normalize=True, one_hot_label=True)

print(x_train.shape) # (60000, 784)
print(t_train.shape) # (60000, 10)

train_size = x_train.shape[0] # size of 1st dimension (60000)
batch_size = 10

batch_mask = np.random.choice(train_size, batch_size) # random 'batch_size' selection out of 'train_size' data
x_batch = x_train[batch_mask]
t_batch = t_train[batch_mask]
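The same index-mask selection can be tried without the MNIST loader. The sketch below uses random stand-in arrays with MNIST-like shapes (an assumption for illustration, not real images). Note that np.random.choice samples with replacement by default, so a batch may contain duplicates; passing replace=False, as done here, avoids that.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in arrays with MNIST-like shapes (random values, not real images).
x_train = rng.random((1000, 784))
t_train = np.eye(10)[rng.integers(0, 10, size=1000)]  # one-hot labels

train_size = x_train.shape[0]
batch_size = 10

# replace=False guarantees that no sample appears twice in one batch.
batch_mask = rng.choice(train_size, batch_size, replace=False)
x_batch = x_train[batch_mask]
t_batch = t_train[batch_mask]

print(batch_mask)
print(x_batch.shape, t_batch.shape)
```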

4.2.4 CEE for batch data
def cross_entropy_error(y, t):
    if y.ndim == 1:
        t = t.reshape(1, t.size) # same as np.reshape(t, (1, t.size)) : a single sample becomes a batch of 1
        y = y.reshape(1, y.size)

    delta = 1e-7 # Prevention of log(0)
    batch_size = y.shape[0]
    return -np.sum(t * np.log(y+delta)) / batch_size
    # When 't' holds integer class labels instead of one-hot vectors:
    # return -np.sum( np.log(y[np.arange(batch_size), t] +delta) ) / batch_size
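The one-hot form and the integer-label form give the same value, because y[np.arange(batch_size), t] picks exactly the entries that the one-hot mask keeps. A small sketch to confirm this (the two function names and the sample arrays are made up for this example):

```python
import numpy as np

def cee_onehot(y, t, delta=1e-7):
    # t : one-hot labels, same shape as y
    return -np.sum(t * np.log(y + delta)) / y.shape[0]

def cee_label(y, t, delta=1e-7):
    # t : integer class labels, shape (batch_size,)
    batch_size = y.shape[0]
    return -np.sum(np.log(y[np.arange(batch_size), t] + delta)) / batch_size

y = np.array([[0.1, 0.7, 0.2],
              [0.3, 0.3, 0.4]])   # softmax outputs for 2 samples
t_label = np.array([1, 2])        # correct classes
t_onehot = np.eye(3)[t_label]     # the same labels, one-hot encoded

loss_onehot = cee_onehot(y, t_onehot)
loss_label = cee_label(y, t_label)
print(loss_onehot, loss_label)
```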

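4.2.5 Why Loss Function?
- Learning searches for the parameters (weight, bias) that make the loss function value as small as possible
- The derivative (gradient) of the loss function with respect to the parameters guides this search

- Derivative of the loss function with respect to a parameter
If < 0 -> increase the parameter to reduce the loss
If > 0 -> decrease the parameter to reduce the loss
If == 0 -> the loss does not change whichever way the parameter moves, so the update stops there

- 'Accuracy' as the metric : its derivative with respect to the parameters is 0 almost everywhere and its value changes in discontinuous jumps -> the neural network cannot learn from it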
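The point about accuracy can be seen numerically. The sketch below is an illustration under made-up assumptions (a random single-layer softmax classifier on random data; all names are hypothetical): nudging one weight by a tiny amount changes the loss, while the accuracy stays exactly the same, so accuracy gives no direction for the update.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def loss_and_accuracy(W, x, t):
    y = softmax(x @ W)
    loss = -np.mean(np.log(y[np.arange(len(t)), t] + 1e-7))
    acc = np.mean(np.argmax(y, axis=1) == t)
    return loss, acc

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))      # 100 random samples, 2 features
t = rng.integers(0, 3, size=100)   # random integer labels, 3 classes
W = rng.normal(size=(2, 3))

W_nudged = W.copy()
W_nudged[0, 0] += 1e-6             # tiny change to a single weight

loss0, acc0 = loss_and_accuracy(W, x, t)
loss1, acc1 = loss_and_accuracy(W_nudged, x, t)

print(loss1 - loss0)  # small but non-zero
print(acc1 - acc0)    # expected to be exactly 0.0 for such a tiny nudge
```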

4.3 Numerical differentiation
def numerical_diff(f, x):
    h = 1e-4 # a much smaller value underflows, e.g. np.float32(1e-50) -> 0.0 (rounding error)
    return (f(x+h) - f(x-h))/(2*h) # central difference <-> forward difference (f(x+h)-f(x))/h
cf. dy/dx : analytic differentiation
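To see why the central difference is preferred, the sketch below compares both schemes against the analytic derivative of f(x) = 0.01x^2 + 0.1x at x = 5 (the helper names forward_diff and central_diff are introduced here for illustration):

```python
def f(x):
    return 0.01 * x ** 2 + 0.1 * x

def forward_diff(f, x, h=1e-4):
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h=1e-4):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 5.0
exact = 0.02 * x + 0.1  # analytic derivative df/dx = 0.02x + 0.1

err_forward = abs(forward_diff(f, x) - exact)
err_central = abs(central_diff(f, x) - exact)
print(err_forward, err_central)
```

For this quadratic the forward difference is off by roughly 0.01*h, while the central difference is limited only by floating-point rounding.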

4.3.2 Example of numerical differentiation
import numpy as np
import matplotlib.pylab as plt

def function_1(x):
    return 0.01*x**2 + 0.1*x

def tangent_line(f, x):
    d = numerical_diff(f, x)
    print(d)
    y = f(x) - d*x # y intercept
    return lambda t: d*t + y # 'lambda' - similar to 'def'

x = np.arange(0.0, 20.0, 0.1)
y = function_1(x)

numerical_diff(function_1, 5)
numerical_diff(function_1, 10)

tf = tangent_line(function_1, 5)
y2 = tf(x)

tf = tangent_line(function_1, 10)
y3 = tf(x)

plt.xlabel("x")
plt.ylabel("f(x)")
plt.plot(x, y)
plt.plot(x, y2)
plt.plot(x, y3)
plt.show()

4.3.3 Partial derivatives
f(x0, x1) = x0^2 + x1^2

def function_2(x, y):
    return x**2 + y**2 # for a vector v = [x, y], same as np.sum(v**2)

grid = np.arange(-3.0, 3.0, 0.1)

x0, x1 = np.meshgrid(grid, grid)

z = function_2(x0, x1)

fig = plt.figure()
ax = fig.add_subplot(projection='3d') # 3D axes; fig.gca(projection='3d') no longer works in recent matplotlib
ax.plot_surface(x0, x1, z)
plt.show()

# Partial derivative with respect to x0 at x0=3, x1=4
def function_tmp1(x0): # x1 fixed at 4.0
    return x0*x0 + 4.0**2.0

numerical_diff(function_tmp1, 3.0) # at x0 = 3.0, approximately 6.0

# Partial derivative with respect to x1 at x0=3, x1=4
def function_tmp2(x1): # x0 fixed at 3.0
    return 3.0**2 + x1*x1

numerical_diff(function_tmp2, 4.0) # at x1 = 4.0, approximately 8.0

4.4 Gradient
- gradient : the vector that collects the partial derivatives with respect to every variable

def function_2(x):
     if x.ndim == 1:
         return np.sum(x**2)
     else:
         return np.sum(x**2, axis=1)

def numerical_gradient(f, x): # x : 1-D float array
    h = 1e-4
    grad = np.zeros_like(x) # array of zeros with the same shape and dtype as 'x'

    for idx in range(x.size):
        tmp_val = x[idx]
        x[idx] = tmp_val + h # shift only the idx-th variable by +h
        fxh1 = f(x) # f(x+h), e.g. idx=0 : (x0+h)^2 + x1^2

        x[idx] = tmp_val - h # shift only the idx-th variable by -h
        fxh2 = f(x) # f(x-h), e.g. idx=0 : (x0-h)^2 + x1^2

        grad[idx] = (fxh1 - fxh2)/(2*h) # df/dx_idx
        x[idx] = tmp_val # restore the original value

    return grad

numerical_gradient(function_2, np.array([3.0, 4.0])) # array([6., 8.])
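For f(x0, x1) = x0^2 + x1^2 the analytic gradient is (2*x0, 2*x1), which makes a convenient check for the numerical version. A self-contained sketch (the list of test points is an arbitrary choice for illustration):

```python
import numpy as np

def function_2(x):
    return np.sum(x ** 2)

def numerical_gradient(f, x, h=1e-4):
    grad = np.zeros_like(x)
    for idx in range(x.size):
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val  # restore the original value
    return grad

points = [np.array([3.0, 4.0]), np.array([0.0, 2.0]), np.array([3.0, 0.0])]
results = [numerical_gradient(function_2, p) for p in points]

for p, g in zip(points, results):
    print(p, g, 2 * p)  # point, numerical gradient, analytic gradient
```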

x0 = np.arange(-2, 2.5, 0.25)
x1 = np.arange(-2, 2.5, 0.25)
X, Y = np.meshgrid(x0, x1)

X = X.flatten()
Y = Y.flatten()

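# Evaluate the gradient point by point; each (x0, x1) pair is passed as a 1-D input
grad = np.array([numerical_gradient(function_2, np.array([px, py])) for px, py in zip(X, Y)]).T # shape (2, N)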

plt.figure()
plt.quiver(X, Y, -grad[0], -grad[1], angles="xy",color="#666666")
plt.xlim([-2, 2])
plt.ylim([-2, 2])
plt.xlabel('x0')
plt.ylabel('x1')
plt.grid()
plt.legend()
plt.draw()
plt.show()

4.4.1 Gradient method
- The optimal parameters are those where the loss function takes its minimum value
cf. Gradient = 0 at a global minimum/maximum, a local minimum/maximum, or a saddle point
cf. plateau : a flat region where learning makes almost no progress

- Learning rate : too large (diverges), too small (too many iterations are required)
x0 = x0 - η(df/dx0)
x1 = x1 - η(df/dx1)

def gradient_descent(f, init_x, lr=0.01, step_num=100):
    x = init_x

    for i in range(step_num):
        grad = numerical_gradient(f, x)
        x -= lr*grad
    return x

def function_2(x):
    return x[0]**2 + x[1]**2

init_x = np.array([-3.0, 4.0])
gradient_descent(function_2, init_x=init_x, lr=0.01, step_num=100)
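The effect of the learning rate can be tried directly. The sketch below is self-contained and differs from the code above in one labeled detail: it copies init_x, so the caller's array is not modified in place and the same start point can be reused. The three learning rates are illustrative choices.

```python
import numpy as np

def function_2(x):
    return x[0] ** 2 + x[1] ** 2

def numerical_gradient(f, x, h=1e-4):
    grad = np.zeros_like(x)
    for idx in range(x.size):
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val
    return grad

def gradient_descent(f, init_x, lr=0.01, step_num=100):
    x = init_x.copy()  # copy so the caller's array is left untouched
    for _ in range(step_num):
        x -= lr * numerical_gradient(f, x)
    return x

init_x = np.array([-3.0, 4.0])

x_good = gradient_descent(function_2, init_x, lr=0.1)     # ends very close to (0, 0)
x_large = gradient_descent(function_2, init_x, lr=10.0)   # too large: blows up
x_small = gradient_descent(function_2, init_x, lr=1e-10)  # too small: barely moves

print(x_good)
print(x_large)
print(x_small)
```

With lr=0.01 and 100 steps, as in the call above, each step shrinks x by a factor of 0.98, so the result is still noticeably away from the minimum.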

4.4.2 Gradient in a neural network
e.g. weight W (shape 2x3), loss function L -> dL/dW has the same 2x3 shape
dL/dw11, dL/dw12, dL/dw13
dL/dw21, dL/dw22, dL/dw23

import sys, os
sys.path.append(os.pardir)
import numpy as np
from common.functions import softmax, cross_entropy_error
from common.gradient import numerical_gradient

class simpleNet:
    def __init__(self):
        self.W = np.random.randn(2,3) # Standard Normal Distribution ( mean=0, std=1 )

    def predict(self, x):
        return np.dot(x, self.W) # x : shape (2,) or Nx2(batch)

    def loss(self, x, t):
        z = self.predict(x)
        y = softmax(z) # y_k = exp(a_k)/sum(exp(a_i)), with prevention of Overflow
        loss = cross_entropy_error(y, t) # E = -sum(t_k*log(y_k))
        return loss

net = simpleNet()
print(net.W)

x = np.array([0.6, 0.9])
p = net.predict(x)

np.argmax(p) # Index of Max element
t = np.array([0, 0, 1]) # Label of correct value

net.loss(x,t)

def f(W): # 'W' is a dummy argument : numerical_gradient perturbs net.W in place and f re-evaluates the loss
    return net.loss(x,t)
# f = lambda w: net.loss(x,t)

dW = numerical_gradient(f, net.W)
print(dW)
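The same computation can be sketched without the book's common package. Everything below is an assumption for illustration: the helper functions are minimal stand-ins rather than the book's implementations, the weights are fixed example values, and np.ndindex is used so the numerical gradient works on a 2-D array. For softmax followed by cross entropy, the analytic gradient is outer(x, y - t), which serves as a check.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))  # subtract the max to avoid overflow
    return e / np.sum(e)

def cross_entropy_error(y, t, delta=1e-7):
    return -np.sum(t * np.log(y + delta))

def numerical_gradient(f, x, h=1e-4):
    """Central-difference gradient for an array of any shape."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val  # restore the original value
    return grad

# Fixed example weights (arbitrary values) so the run is reproducible.
W = np.array([[0.47355232, 0.9977393, 0.84668094],
              [0.85557411, 0.03563661, 0.69422093]])
x = np.array([0.6, 0.9])
t = np.array([0, 0, 1])

def loss(W):
    return cross_entropy_error(softmax(x @ W), t)

dW_numerical = numerical_gradient(loss, W)
dW_analytic = np.outer(x, softmax(x @ W) - t)

print(dW_numerical)
print(dW_analytic)
```

A positive entry in dW means that increasing that weight increases the loss, so the update should decrease it; a negative entry means the opposite.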

------

License : Attribution, NonCommercial, NoDerivatives

Posted by 명랑만화
