Deep Learning from Scratch - Ch3. Neural Network
[review]
Perceptron
- pros: a wide variety of complex processes can be represented with perceptrons.
- cons: the user has to choose the parameters (weights, bias) by hand.
A neural network solves this problem: it can automatically learn the proper parameters from the data itself.
3.1 Perceptron to Neural Network
A perceptron and a neural network look similar.
- The main difference is the activation function.
- A perceptron uses the step function as its activation function.
- A neural network uses sigmoid, ReLU, tanh, etc.
3.2 Activation Function
- We can split the perceptron's equation into the weighted sum and the function applied to it:
\[a = b + w_1 x_1 + w_2 x_2\]
\[y = h(a)\]
activation function
- $h(x)$ above is called the activation function.
- It converts the sum of all the input signals into the output signal.
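To make the two steps concrete, here is a minimal sketch (the input, weight, and bias values are made up for illustration): the weighted sum is computed first, and the activation function then turns that sum into the output signal.
import numpy as np
def h(a):
    # step activation: fires only when the sum crosses 0
    if a > 0:
        return 1
    else:
        return 0
x = np.array([1.0, 0.5])    # input signals
w = np.array([0.5, 0.5])    # weights
b = -0.5                    # bias
a = np.sum(w * x) + b       # 1) weighted sum of the input signals
y = h(a)                    # 2) activation converts the sum into the output
print(a, y)
0.25 1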
3.2.1 Step Function
- In the example above ($h(x)$), the output switches once the input crosses a threshold. Such a function is called a step function.
exercise: elementwise comparison works on a NumPy array but not on a plain list
import numpy as np
# cf) an ndarray supports elementwise comparison with a scalar, but a plain list does not
a = np.array([1.2, 0.2, -0.1])
a > 0
array([ True, True, False])
b = [1.2, 0.2, -0.1]
b > 0
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-2322d757a24f> in <module>
1 b = [1.2, 0.2, -0.1]
----> 2 b > 0
TypeError: '>' not supported between instances of 'list' and 'int'
step function
def step_function(x):
    # cf) x must be a single float; this version does not accept a NumPy array
    if x > 0:
        return 1
    else:
        return 0
print(step_function(1.2))
print(step_function(-0.7))
1
0
def step_function(x):
    y = x > 0
    # use the astype method to convert the boolean array to integers
    # (np.int was removed from recent NumPy; plain int works the same here)
    return y.astype(int)
step_function(np.array([-1.0, 1.0, 2.0]))
array([0, 1, 1])
3.2.2 Step Function Graph
import matplotlib.pyplot as plt
def step_function(x):
    return np.array(x > 0, dtype=int)
x = np.arange(-5.0, 5.0, 0.1)
y = step_function(x)
plt.plot(x,y)
plt.ylim(-0.1, 1.1)
plt.show()
3.2.3 Sigmoid Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
x = np.array([-1.0, 1.0, 2.0])
sigmoid(x)
array([0.26894142, 0.73105858, 0.88079708])
x = np.arange(-5, 5, 0.1)
y = sigmoid(x)
plt.plot(x,y)
plt.ylim(-0.1, 1.1)
plt.show()
3.2.4 Step Function VS Sigmoid Function
Both are nonlinear functions; however, their outputs differ.
- The output of the step function is discrete (0 or 1).
- The output of the sigmoid function is continuous (between 0 and 1).
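Drawing both curves on the same axes makes the comparison easy to see; a quick sketch reusing the step_function and sigmoid defined above:
x = np.arange(-5.0, 5.0, 0.1)
plt.plot(x, step_function(x), linestyle='--', label='step')
plt.plot(x, sigmoid(x), label='sigmoid')
plt.ylim(-0.1, 1.1)
plt.legend()
plt.show()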
3.2.5 Nonlinear Function
In a neural network, we should not use a linear function as the activation function.
The problems with a linear function are:
- The composition of linear functions is itself linear.
- As we saw in Chapter 2, we cannot solve complicated problems with a single straight line.
The strength of a neural network comes from stacking hidden layers. However, if the activation function is linear, making the hidden layers deep gains nothing, as the example below shows.
Ex) \(\begin{cases} h(x) = cx\\ y(x) = h(h(h(x))) = c^3x\end{cases}\)
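A quick numerical check of the example above (taking c = 2 as an arbitrary value): applying the linear "activation" three times is indistinguishable from a single linear layer with slope $c^3$.
c = 2.0
def h(x):
    return c * x              # linear "activation"
x = 3.0
print(h(h(h(x))))             # 24.0
print((c ** 3) * x)           # 24.0 -- the same: one linear layer with slope c^3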
3.2.6 ReLU(Rectified Linear Unit)
ReLU
\[h(x) =\begin{cases} x \;\;(x>0)\\ 0 \;\;(x\leq0)\end{cases}\]
def relu(x):
    return np.maximum(0, x)
# np.max([1,2,3,4]) = 4 : the maximum value within a single list/array
# np.maximum(a, b) : elementwise maximum of two arrays of the same shape
#   e.g. np.maximum([1,2,3], [4,5,6]) = [4,5,6]
#   if two scalars are given, it returns the larger one
relu(3)
3
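Because np.maximum broadcasts the scalar 0 against the array, the same relu also works elementwise on an ndarray; a quick check:
relu(np.array([-2.0, 0.0, 3.0]))
array([0., 0., 3.])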
3.3 Ndarray
a = np.array([1,2,3])
a.shape
a.ndim
1
b = np.array([[1,2], [3,4], [5,6]])
b.shape
(3, 2)
For a 1-D ndarray, there is no distinction between a row vector and a column vector.
- It is treated the same way in matrix multiplication, so transposing it changes nothing.
a = np.array([1,2])
b = np.array([[1,2], [3,4], [5,6]])
c = np.array([[1,2,3], [4,5,6]])
print(a.dot(c))
print(b.dot(a), end='\n\n')
print(a.T.dot(c))
print(b.dot(a.T))
[ 9 12 15]
[ 5 11 17]
[ 9 12 15]
[ 5 11 17]
3.4 Neural Network
- $h(x)$ is the activation function of the hidden layers.
- $\sigma()$ is the activation function of the output layer.
- $\sigma()$ differs depending on the problem:
- here, we use the identity function
- regression: identity function
- classification: softmax function
# identity function used as the output-layer activation
def identity_function(x):
    return x
# dictionary of weights and biases
def init_network():
    network = {}
    network['W1'] = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
    network['b1'] = np.array([0.1, 0.2, 0.3])
    network['W2'] = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
    network['b2'] = np.array([0.1, 0.2])
    network['W3'] = np.array([[0.1, 0.3], [0.2, 0.4]])
    network['b3'] = np.array([0.1, 0.2])
    return network
# forward pass through the 3-layer network
def forward(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']
    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3
    y = identity_function(a3)
    return y
network = init_network()
x = np.array([1.0, 0.5])
y = forward(network, x)
print(y)
[0.31682708 0.69627909]
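It can help to confirm that the layer shapes line up; this sketch unrolls forward() for the network above and prints the shape after each layer.
network = init_network()
x = np.array([1.0, 0.5])                                   # input: shape (2,)
a1 = np.dot(x, network['W1']) + network['b1']              # (2,) . (2, 3) -> (3,)
a2 = np.dot(sigmoid(a1), network['W2']) + network['b2']    # (3,) . (3, 2) -> (2,)
a3 = np.dot(sigmoid(a2), network['W3']) + network['b3']    # (2,) . (2, 2) -> (2,)
print(a1.shape, a2.shape, a3.shape)
(3,) (2,) (2,)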
3.5 Output Layer: $\sigma()$
3.5.1 Softmax Function
softmax function
\[y_k = {\exp(a_k)\over \sum_{i=1}^{n}{\exp(a_i)}}\]
- $n$: number of neurons in the output layer
- $y_k$: the k-th output
def softmax(a):
    exp_a = np.exp(a)
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y
a = np.array([0.3, 2.9, 4.0])
print(softmax(a))
[0.01821127 0.24519181 0.73659691]
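The naive version above overflows when the inputs are large (np.exp of a big number becomes inf), which is exactly the RuntimeWarning that appears in section 3.6 below. A numerically stable variant, sketched here, subtracts the maximum element before exponentiating; this shifts every input by a constant and leaves the result unchanged.
def softmax(a):
    c = np.max(a)
    exp_a = np.exp(a - c)          # subtracting the max prevents overflow
    return exp_a / np.sum(exp_a)
print(softmax(np.array([1010, 1000, 990])))
[9.99954600e-01 4.53978686e-05 2.06106005e-09]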
3.5.2 Features of Softmax Function
- Each output is a continuous value between 0 and 1
- The outputs sum to 1
- so each output can be interpreted as a probability
- The neuron with the highest output is selected as the predicted class
- The ordering of the outputs is the same as the ordering of the inputs before softmax
- reason: $e^x$ is a monotonically increasing function
- Since softmax does not change which neuron is largest, it is usually skipped at inference time to save computation
3.5.3 Choosing the Number of Output Neurons
In a classification problem, the number of output neurons is set equal to the number of classes.
3.6 Practice with MNIST Datasets
# Unlike the lecture, I used keras mnist datasets
from tensorflow.keras.datasets import mnist
# python image library
from PIL import Image
def img_show(img):
    pil_img = Image.fromarray(np.uint8(img))
    pil_img.show()
# ((train_img, train_label), (test_img, test_label))
((x_train, y_train), (x_test, y_test)) = mnist.load_data()
print(x_train.shape, x_test.shape)
print(y_train.shape, y_test.shape, end='\n\n')
# preprocessing mnist datasets : flatten
x_train = x_train.reshape(60000, 28 * 28)
x_test = x_test.reshape(10000, 28 * 28)
img = x_train[0]
label = y_train[0]
print(label, end='\n\n')
print(img.shape)
img = img.reshape(28,28)
print(img.shape)
img_show(img)
(60000, 28, 28) (10000, 28, 28)
(60000,) (10000,)
5
(784,)
(28, 28)
3.6.2 Neural Network Prediction
import pickle
# not normalized data
def get_data():
    ((x_train, y_train), (x_test, y_test)) = mnist.load_data()
    x_train = x_train.reshape(60000, 28 * 28)
    x_test = x_test.reshape(10000, 28 * 28)
    return x_test, y_test
# trained parameter dictionary
# pickle file on "asset -> files" directory
def init_network():
    with open('sample_weight.pkl', 'rb') as f:
        network = pickle.load(f)
    return network
def prediction(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']
    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3
    y = softmax(a3)
    return y
x, y = get_data()
network = init_network()
accuracy_cnt = 0
for i in range(len(x)):
    y_pred = prediction(network, x[i])
    # get label with highest probability
    p = np.argmax(y_pred)
    if p == y[i]:
        accuracy_cnt += 1
print('Accuracy : ', float(accuracy_cnt)/len(x))
/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:2: RuntimeWarning: overflow encountered in exp
Accuracy : 0.9207
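The overflow warning comes from pushing raw pixel values (0 to 255) through the exponentials in sigmoid and softmax. A common fix is to scale the inputs to the [0, 1] range during preprocessing; a minimal sketch (the function name get_data_normalized is mine), assuming the same Keras loader. If the pickled weights were trained on normalized inputs, the accuracy may change as well.
# normalized variant of get_data: scale pixels from 0~255 down to 0.0~1.0
def get_data_normalized():
    ((x_train, y_train), (x_test, y_test)) = mnist.load_data()
    x_test = x_test.reshape(10000, 28 * 28).astype('float32') / 255.0
    return x_test, y_test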
3.6.3 Batch
x, y = get_data()
network = init_network()
batch_size = 100
accuracy_cnt = 0
for i in range(0, len(x), batch_size):
    # 0~99 / 100~199 / ...
    x_batch = x[i:i+batch_size]
    y_batch = prediction(network, x_batch)
    p = np.argmax(y_batch, axis=1)
    accuracy_cnt += np.sum(p == y[i:i+batch_size])
print('Accuracy : ', float(accuracy_cnt)/len(x))
Accuracy : 0.9207
/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:2: RuntimeWarning: overflow encountered in exp
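The batch version is faster because 10,000 small matrix-vector products are replaced by 100 larger matrix-matrix products, which NumPy handles far more efficiently. Shape-wise nothing else changes; a quick check of one batch (assuming, as in the run above, that the pickled network maps 784 pixel inputs to 10 class scores):
x_batch = x[:100]                        # shape (100, 784)
y_batch = prediction(network, x_batch)   # shape (100, 10): one row of scores per image
p = np.argmax(y_batch, axis=1)           # best class per row -> shape (100,)
print(x_batch.shape, y_batch.shape, p.shape)
(100, 784) (100, 10) (100,)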