
# Activation Function

## What is an activation function?

In an Artificial Neural Network (ANN), the activation function decides whether a neuron should be activated or not. It defines the output of a node for an input or a set of inputs.

Activation functions are used to introduce non-linear properties to neural networks.

## What are the types of activation functions?

There are various types of activation functions. Here are some of the widely used ones:

### 1. Binary Step Function

The binary step function is also known as the ‘Threshold Function’. It is essentially a threshold-based classifier.

When using the binary step function, if the input to the activation function is higher than the set threshold, then the neuron will be activated. If the input is lower than the threshold, then the neuron is deactivated.

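On its own, this activation function can only be used for binary class problems. However, it can be tweaked to be applied to multi-class problems.

Another limitation is that the gradient (derivative) of the binary step function is zero everywhere except at the threshold, where it is undefined. This hinders backpropagation, because no useful error signal can flow back through the neuron.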
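As a minimal sketch of the behaviour described above, the function below returns 1 when the input reaches the threshold and 0 otherwise. The name `binary_step`, the default threshold of 0, and the choice to treat an input exactly equal to the threshold as "activated" are assumptions made for illustration.

```python
def binary_step(x: float, threshold: float = 0.0) -> int:
    """Return 1 (activated) if x reaches the threshold, else 0 (deactivated)."""
    # Assumption: an input exactly at the threshold counts as activated.
    return 1 if x >= threshold else 0


outputs = [binary_step(x) for x in (-2.0, -0.1, 0.0, 0.3, 5.0)]
print(outputs)  # [0, 0, 1, 1, 1]
```

The output is flat on both sides of the threshold, which is why the gradient gives backpropagation nothing to work with.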

### 2. Linear Activation Function

Linear functions are also known as straight-line functions. Here, the output is proportional to the weighted sum of the inputs. The function can be represented with this equation:

f(x) = ax + c

A major problem with this function is that its derivative is a constant (a) that has no relation to the input. During backpropagation the weights and biases still get updated, but the gradient is the same for every input.

Another issue is that, irrespective of the number of layers in the neural network, the output of the last layer will always be a linear function of the input to the first layer.
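The collapse of stacked linear layers can be checked with a small example. The coefficients below are arbitrary values chosen for illustration, and the helper names are hypothetical. Two layers with f(x) = ax + c, applied one after the other, give exactly the same result as a single linear function.

```python
def linear(x: float, a: float, c: float) -> float:
    """Linear activation f(x) = a*x + c."""
    return a * x + c


def two_layers(x: float) -> float:
    # First layer: 2x + 1, second layer: 3h - 4 (arbitrary example values).
    hidden = linear(x, a=2.0, c=1.0)
    return linear(hidden, a=3.0, c=-4.0)


def collapsed(x: float) -> float:
    # 3 * (2x + 1) - 4 simplifies to 6x - 1, a single linear function.
    return linear(x, a=6.0, c=-1.0)


for x in (-3.0, 0.0, 2.5):
    print(x, two_layers(x), collapsed(x))
```

However many such layers are stacked, the composition stays a straight line, so depth adds no expressive power without a non-linear activation.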

### 3. Sigmoid Activation Function

The sigmoid function takes a real value as input and generates a value between 0 and 1 as output. It maps inputs from the range (-∞, ∞) to the range (0, 1).

The sigmoid or logistic activation function is used widely in classification problems.

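The derivative of a sigmoid function lies between 0 and 0.25. The sigmoid itself is monotonic, but its derivative is not. Because the derivative is so small for large positive or negative inputs, networks that stack sigmoid layers are prone to the ‘vanishing gradient problem’.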
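A short sketch of the sigmoid and its derivative makes the 0 to 0.25 bound concrete. It uses the standard logistic formula 1 / (1 + e^(-x)). The function names and the branch on the sign of x (a common trick to avoid overflow in `math.exp`) are choices made here, not something the article prescribes.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_derivative(x: float) -> float:
    """Derivative expressed through the function itself: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.5f}  derivative={sigmoid_derivative(x):.5f}")
```

The derivative peaks at 0.25 for x = 0 and shrinks towards 0 as the input moves away from 0 in either direction.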

### 4. Tanh Function

This is another non-linear activation function. The derivative of the tanh function can be expressed in terms of the function itself.

The tanh function generates output values between -1 and 1.
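The relationship between tanh and its derivative is the identity tanh'(x) = 1 - tanh(x)². The sketch below uses Python's built-in `math.tanh`. The helper name `tanh_derivative` is an assumption for illustration.

```python
import math


def tanh_derivative(x: float) -> float:
    """Derivative of tanh written in terms of tanh itself."""
    t = math.tanh(x)
    return 1.0 - t * t


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  tanh={math.tanh(x):+.5f}  derivative={tanh_derivative(x):.5f}")
```

Because the derivative needs only the already-computed output, the backward pass can reuse the value from the forward pass.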

### 5. ReLU Function

Even though ReLU stands for Rectified Linear Unit, the function is not linear. Its significant advantage is that it does not activate all the neurons at the same time: a neuron is deactivated only if the output of its linear transformation is less than 0.

The formula is: f(z) = max(0, z)
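The formula translates directly into code. In this sketch the list `pre_activations` is made-up example data that stands in for the outputs of the linear transformation of several neurons.

```python
def relu(z: float) -> float:
    """Rectified Linear Unit: max(0, z)."""
    return max(0.0, z)


# Hypothetical pre-activation values for four neurons.
pre_activations = [-3.0, -0.5, 0.0, 2.0]
activations = [relu(z) for z in pre_activations]
active_count = sum(1 for a in activations if a > 0)

print(activations)   # [0.0, 0.0, 0.0, 2.0]
print(active_count)  # 1
```

Only the neuron with a positive pre-activation produces a non-zero output, which is the sparse activation the text refers to.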

### 6. Leaky ReLU

This is an enhanced version of the ReLU function. These functions try to solve the “dying ReLU” problem.

While a ReLU outputs 0 when z<0, a Leaky ReLU keeps a tiny, non-zero, constant gradient α (usually α=0.01) in that region.
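The difference shows up most clearly in the gradients. The sketch below compares the two for a negative input. The function names are illustrative, and the gradient at exactly z = 0 (where neither function is differentiable) is assigned to the negative branch by assumption.

```python
def relu_grad(z: float) -> float:
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


def leaky_relu(z: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: z for positive inputs, alpha * z otherwise."""
    return z if z > 0 else alpha * z


def leaky_relu_grad(z: float, alpha: float = 0.01) -> float:
    """Gradient of Leaky ReLU: 1 for positive inputs, alpha otherwise."""
    return 1.0 if z > 0 else alpha


z = -2.0
print(relu_grad(z))        # 0.0  -> no update signal for this neuron
print(leaky_relu_grad(z))  # 0.01 -> a small signal still flows back
```

A neuron whose inputs stay negative receives a zero gradient under ReLU and can stop learning, whereas the small slope α keeps it trainable.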

### 7. Parameterised ReLU (PReLU)

A Parameterised ReLU adds a new parameter, α, as the slope in the negative area of the function. Because α is learned during training, the network can choose which slope is best in the negative region.

A Parameterised ReLU turns into a ReLU when α=0 and into a Leaky ReLU when α is fixed at a small positive value such as 0.01.
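The special cases can be verified with a small sketch. Here α is passed in as an ordinary argument. In a real network it would be a trainable parameter, and the helper `prelu_grad_alpha` (the derivative of the output with respect to α) is included only to suggest how such training could adjust it. All names are hypothetical.

```python
def prelu(z: float, alpha: float) -> float:
    """Parameterised ReLU: z for positive inputs, alpha * z otherwise."""
    return z if z > 0 else alpha * z


def prelu_grad_alpha(z: float) -> float:
    """Derivative of the output with respect to alpha (non-zero only for z <= 0)."""
    return 0.0 if z > 0 else z


inputs = [-3.0, -1.0, 0.5, 2.0]

as_relu = [prelu(z, alpha=0.0) for z in inputs]      # behaves like ReLU
as_leaky = [prelu(z, alpha=0.01) for z in inputs]    # behaves like Leaky ReLU
as_learned = [prelu(z, alpha=0.25) for z in inputs]  # some other learned slope

print(as_relu)
print(as_leaky)
print(as_learned)
```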

### 8. Exponential Linear Unit (ELU)

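Networks that use ELUs are reported to converge at a quicker pace and to produce results with greater levels of accuracy than those using ReLU.

ELUs have an additional constant, α, which is a positive number. It sets the value (-α) that the function approaches for large negative inputs.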
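The article does not give the ELU formula. The sketch below uses the commonly cited definition, f(x) = x for x > 0 and α(eˣ - 1) otherwise, with α = 1.0 as an assumed default.

```python
import math


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential Linear Unit with a positive constant alpha."""
    # math.expm1(x) computes e**x - 1 accurately for small x.
    return x if x > 0 else alpha * math.expm1(x)


for x in (-100.0, -2.0, -0.5, 0.0, 2.0):
    print(f"x={x:+.1f}  elu={elu(x):+.5f}")
```

For large negative inputs the output levels off near -α instead of being clipped to 0, so negative inputs still produce a non-zero gradient.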

### 9. Swish

This activation function was proposed by researchers at Google. It is defined as f(x) = x · sigmoid(x), which makes it somewhat costlier to compute than ReLU, but it tends to perform better than ReLU on deeper models.
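Swish is commonly written as f(x) = x · sigmoid(βx), with β = 1 as the usual default. The sketch below assumes that definition and prints it next to ReLU for comparison. The function names and the overflow-safe form of `sigmoid` are choices made for this example.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


def relu(x: float) -> float:
    return max(0.0, x)


for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  swish={swish(x):+.4f}")
```

Unlike ReLU, Swish is smooth and dips slightly below zero for small negative inputs before approaching 0 again.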

### 10. Softmax Function

The Softmax function is often described as a combination of multiple sigmoid functions. It is used to solve multi-class classification problems: it calculates a probability distribution over ‘n’ different classes.

These probabilities will help in figuring out the target class for the inputs.
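A minimal sketch of Softmax, assuming the standard definition exp(zᵢ) / Σⱼ exp(zⱼ). The input scores in `logits` are made-up values, and subtracting the maximum before exponentiating is a common numerical-stability step rather than part of the definition.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # shift by the max so math.exp cannot overflow
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three classes
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)

print([round(p, 3) for p in probs])
print(predicted_class)  # 0, the class with the largest score
```

The class with the highest probability is taken as the target class. With only two classes and one score fixed at 0, the result reduces to the sigmoid of the other score, which is the sense in which Softmax generalises the sigmoid.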


October 14, 2020
