Auto Topic: activation

auto_activation | topic

Coverage Score: 1
Mentioned Chunks: 33
Mentioned Docs: 1

Required Dimensions: definition, pros_cons

Covered Dimensions: definition, pros_cons

Keywords: activation

Relations

Source | Type | Target | Weight
Auto Topic: activation | CO_OCCURS | Auto Topic: relu | 14
Auto Topic: activation | CO_OCCURS | Propositional Logic | 6
Auto Topic: activation | CO_OCCURS | Auto Topic: convolutional | 5
Auto Topic: activation | CO_OCCURS | Auto Topic: kernel | 5
Auto Topic: activation | CO_OCCURS | Auto Topic: residual | 5
Auto Topic: activation | CO_OCCURS | State-Space Search | 4
Auto Topic: activation | CO_OCCURS | Auto Topic: convolution | 3
Auto Topic: activation | CO_OCCURS | Auto Topic: stride | 3
Auto Topic: activation | CO_OCCURS | Auto Topic: pixels | 3

Evidence Chunks

textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.59 | mentions 3
Snippet: e the weight attached to the link from unit i to unit j; then we have $a_j = g_j\bigl(\sum_i w_{i,j}\, a_i\bigr) \equiv g_j(in_j)$, where $g_j$ is a nonlinear activation function associated with unit j and $in_j$ is the weighted sum of the inputs to unit j. As in Section 19.6.3 (page 697 ...
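
The unit equation quoted in this snippet can be sketched directly in NumPy; a minimal version, with tanh and the sample numbers chosen as illustrative assumptions rather than values from the textbook:

```python
import numpy as np

def unit_activation(a, w, g=np.tanh):
    """Compute a_j = g(sum_i w_ij * a_i) = g(in_j) for a single unit."""
    in_j = np.dot(w, a)  # in_j: weighted sum of the unit's inputs
    return g(in_j)

# Example: a unit with three incoming links
a_in = np.array([0.5, -1.0, 2.0])   # activations a_i feeding into unit j
w_in = np.array([0.1, 0.4, -0.2])   # weights w_ij on those links
a_j = unit_activation(a_in, w_in)
```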
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.57 | mentions 2
Snippet: [plot residue: axis ticks omitted] Figure 22.2 Activation functions commonly used in deep learning systems: (a) the logistic or sigmoid function; (b) the ReLU function and the softplus function; (c) the tanh function ...
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.57 | mentions 2
Snippet: ... function with small weights. It is not straightforward to interpret the effect of weight decay in a neural network. In networks with sigmoid activation functions, it is hypothesized that weight decay helps to keep the activations near the linear part of the sigmoid, avoiding the fl ...
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
Snippet: ... HEN rules. (In contrast, it is very difficult for humans to get an intuitive understanding of the result of a matrix multiply followed by an activation function, as is done in some neural network models.) Second, the decision tree was in a sense constructed to be interpretable—t ...
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
Snippet: ... d to interpret the model directly, the best we could come away with would be something like “after processing the convolutional layers, the activation for the dog output in the softmax layer was higher than any other class.” That’s not a very compelling argument. 18 This terminol ...
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
Snippet: ... e for continuous functions, just as sufficiently large decision trees implement a lookup table for Boolean functions. A variety of different activation functions are used. The most common are the following: • The logistic or sigmoid function, which is also used in logistic regressi ...
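
A hedged NumPy sketch of the activation functions named across these snippets (sigmoid, ReLU, softplus, tanh); these are the standard textbook forms, not code from the source:

```python
import numpy as np

def sigmoid(x):
    """Logistic (sigmoid) function: squashes inputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return np.maximum(0.0, x)

def softplus(x):
    """Smooth approximation to ReLU: log(1 + e^x)."""
    return np.log1p(np.exp(x))

# tanh is available directly as np.tanh. All four functions are
# monotonically nondecreasing, so their derivatives are nonnegative.
x = np.linspace(-6, 6, 7)
print(sigmoid(x), relu(x), softplus(x), np.tanh(x))
```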
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
Snippet: ... hem are monotonically nondecreasing, which means that their derivatives g′ are nonnegative. We will have more to say about the choice of activation function in later sections. Coupling multiple units together into a network creates a complex function that is a composition of ...
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
Snippet: We will have more to say about the choice of activation function in later sections. Coupling multiple units together into a network creates a complex function that is a composition of the algebraic expressions represented by the individual units. For example, the network shown ...
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
Snippet: ... s in the first layer ($w_{1,3}$, $w_{1,4}$, etc.) and $W^{(2)}$ denotes the weights in the second layer ($w_{3,5}$ etc.). Finally, let $g^{(1)}$ and $g^{(2)}$ denote the activation functions in the first and second layers. Then the entire network can be written as follows: $h_w(x) = g^{(2)}(W^{(2)} g^{(1)}(W^{(1)} x))$ (22.3) ...
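
Equation (22.3) from this snippet transcribes directly into code; the layer sizes and the choice of tanh for both g(1) and g(2) below are illustrative assumptions:

```python
import numpy as np

def two_layer_network(x, W1, W2, g1=np.tanh, g2=np.tanh):
    """h_w(x) = g2(W2 @ g1(W1 @ x)) -- Equation (22.3)."""
    return g2(W2 @ g1(W1 @ x))

rng = np.random.default_rng(0)
x  = rng.normal(size=3)        # input vector
W1 = rng.normal(size=(4, 3))   # first-layer weights W(1)
W2 = rng.normal(size=(2, 4))   # second-layer weights W(2)
h = two_layer_network(x, W1, W2)
```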
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
Snippet: ... major packages for deep learning provide automatic differentiation, so that users can experiment freely with different network structures, activation functions, loss functions, and forms of composition without having to do lots of calculus to derive a new learning algorithm fo ...
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
Snippet: etwork structures, activation functions, loss functions, and forms of composition without having to do lots of calculus to derive a new learning algorithm for each experiment. This has encouraged an approach called end-to-end learning, in which a complex computational system ...
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
Snippet: ... we need to be able to interpret the output of the network as a probability. For example, if the network has one output unit with a sigmoid activation function and is learning a Boolean classification, we can interpret the output value directly as the probability that the example ...
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
Snippet: ... gression problem, where the target value y is continuous, it is common to use a linear output layer—in other words, $\hat{y}_j = in_j$, without any activation function g—and to interpret this as the mean of a Gaussian prediction with fixed variance. As we noted on page 780, maximizing l ...
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
Snippet: ... han the output layers. For the first 25 years of research with multilayer networks (roughly 1985–2010), internal nodes used sigmoid and tanh activation functions almost exclusively. From around 2010 onwards, the ReLU and softplus became more popular, partly because they are believ ...
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
Snippet: ... ride s = 2. The peak response is centered on the darker (lower intensity) input pixel. The results would usually be fed through a nonlinear activation function (not shown) before going to the next hidden layer. dimensional image, and a vector kernel k of size l. (For simplicity we ...
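
The strided 1-D convolution this snippet describes (a kernel of size l sliding over a one-dimensional image with stride s = 2) might look like the following sketch; the pixel values and kernel are hypothetical:

```python
import numpy as np

def conv1d(x, k, s=2):
    """Slide kernel k (size l) over 1-D image x with stride s; no padding.

    The outputs would normally be fed through a nonlinear
    activation function before the next hidden layer.
    """
    l = len(k)
    return np.array([np.dot(k, x[i:i + l])
                     for i in range(0, len(x) - l + 1, s)])

x = np.array([5.0, 6.0, 6.0, 2.0, 5.0, 6.0, 5.0])  # hypothetical pixel row
k = np.array([1.0, -2.0, 1.0])                     # hypothetical kernel
z = conv1d(x, k, s=2)                              # -> [-1., 7., -2.]
```

With this hypothetical kernel the peak response (7.0) falls on the window centered at the darker pixel (value 2), matching the snippet's description.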
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
Snippet: ... roscience. In those models, the receptive field of a neuron is the portion of the sensory input that can affect that neuron’s activation. In a CNN, the receptive field of a unit in the first hidden layer is small—just the size of the kernel, i.e., l pixels. In the dee ...
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
Snippet: ... st like a convolution layer, with a kernel size l and stride s, but the operation that is applied is fixed rather than learned. Typically, no activation function is associated with the pooling layer. There are two common forms of pooling: • Average-pooling computes the average valu ...
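
A small sketch of the two pooling forms named in this snippet; the window size, stride, and data are assumptions for illustration:

```python
import numpy as np

def pool1d(x, l=2, s=2, op=np.mean):
    """Pooling: like convolution with kernel size l and stride s,
    but the operation op is fixed rather than learned, and no
    activation function is applied afterwards."""
    return np.array([op(x[i:i + l]) for i in range(0, len(x) - l + 1, s)])

x = np.array([1.0, 3.0, 2.0, 8.0, 4.0, 4.0])
avg = pool1d(x, op=np.mean)  # average-pooling -> [2., 5., 4.]
mx  = pool1d(x, op=np.max)   # max-pooling     -> [3., 8., 4.]
```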
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
Snippet: ... s achieved by the following equation for layer i in terms of layer i−1: $z^{(i)} = g^{(i)}_r(z^{(i-1)} + f(z^{(i-1)}))$ (22.10), where $g_r$ denotes the activation functions for the residual layer. Here we think of f as the residual, perturbing the default behavior of passing layer i−1 through ...
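
Equation (22.10) from the last snippet, sketched with a placeholder residual function f; the linear f and the tanh choice for g_r here are assumptions for illustration:

```python
import numpy as np

def residual_layer(z_prev, f, g_r=np.tanh):
    """z(i) = g_r(z(i-1) + f(z(i-1))) -- Equation (22.10).

    f is the learned residual that perturbs the default behavior
    of passing z(i-1) straight through to the next layer.
    """
    return g_r(z_prev + f(z_prev))

V = 0.1 * np.eye(4)                             # hypothetical residual weights
z = residual_layer(np.ones(4), lambda u: V @ u)
```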