Auto Topic: activation
ID: auto_activation | Type: topic
Coverage Score: 1
Mentioned Chunks: 33
Mentioned Docs: 1
Required Dimensions: definition, pros_cons
Covered Dimensions: definition, pros_cons
Keywords: activation
Relations
| Source | Type | Target | Weight |
|---|---|---|---|
| Auto Topic: activation | CO_OCCURS | Auto Topic: relu | 14 |
| Auto Topic: activation | CO_OCCURS | Propositional Logic | 6 |
| Auto Topic: activation | CO_OCCURS | Auto Topic: convolutional | 5 |
| Auto Topic: activation | CO_OCCURS | Auto Topic: kernel | 5 |
| Auto Topic: activation | CO_OCCURS | Auto Topic: residual | 5 |
| Auto Topic: activation | CO_OCCURS | State-Space Search | 4 |
| Auto Topic: activation | CO_OCCURS | Auto Topic: convolution | 3 |
| Auto Topic: activation | CO_OCCURS | Auto Topic: stride | 3 |
| Auto Topic: activation | CO_OCCURS | Auto Topic: pixels | 3 |
Evidence Chunks
| Source | Confidence | Mentions | Snippet |
|---|---|---|---|
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.59 | 3 | ... e the weight attached to the link from unit i to unit j; then we have $a_j = g_j(\sum_i w_{i,j} a_i) \equiv g_j(in_j)$, where $g_j$ is a nonlinear activation function associated with unit j and $in_j$ is the weighted sum of the inputs to unit j. As in Section 19.6.3 (page 697 ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.57 | 2 | ... Figure 22.2 Activation functions commonly used in deep learning systems: (a) the logistic or sigmoid function; (b) the ReLU function and the softplus function; (c) the tanh function ... (see the activation-function sketch after this table) |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.57 | 2 | ... unction with small weights. It is not straightforward to interpret the effect of weight decay in a neural network. In networks with sigmoid activation functions, it is hypothesized that weight decay helps to keep the activations near the linear part of the sigmoid, avoiding the fl ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... HEN rules. (In contrast, it is very difficult for humans to get an intuitive understanding of the result of a matrix multiply followed by an activation function, as is done in some neural network models.) Second, the decision tree was in a sense constructed to be interpretable—t ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... d to interpret the model directly, the best we could come away with would be something like “after processing the convolutional layers, the activation for the dog output in the softmax layer was higher than any other class.” That’s not a very compelling argument. ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... e for continuous functions, just as sufficiently large decision trees implement a lookup table for Boolean functions. A variety of different activation functions are used. The most common are the following: • The logistic or sigmoid function, which is also used in logistic regressi ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... hem are monotonically nondecreasing, which means that their derivatives g′ are nonnegative. We will have more to say about the choice of activation function in later sections. Coupling multiple units together into a network creates a complex function that is a composition of ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | We will have more to say about the choice of activation function in later sections. Coupling multiple units together into a network creates a complex function that is a composition of the algebraic expressions represented by the individual units. For example, the network shown ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... s in the first layer ($w_{1,3}$, $w_{1,4}$, etc.) and $W^{(2)}$ denotes the weights in the second layer ($w_{3,5}$, etc.). Finally, let $g^{(1)}$ and $g^{(2)}$ denote the activation functions in the first and second layers. Then the entire network can be written as follows: $h_w(x) = g^{(2)}(W^{(2)} g^{(1)}(W^{(1)} x))$ (22.3) ... (see the two-layer-network sketch after this table) |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... major packages for deep learning provide automatic differentiation, so that users can experiment freely with different network structures, activation functions, loss functions, and forms of composition without having to do lots of calculus to derive a new learning algorithm fo ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... etwork structures, activation functions, loss functions, and forms of composition without having to do lots of calculus to derive a new learning algorithm for each experiment. This has encouraged an approach called end-to-end learning, in which a complex computational system ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... we need to be able to interpret the output of the network as a probability. For example, if the network has one output unit with a sigmoid activation function and is learning a Boolean classification, we can interpret the output value directly as the probability that the example ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... gression problem, where the target value y is continuous, it is common to use a linear output layer—in other words, $\hat{y}_j = in_j$, without any activation function g—and to interpret this as the mean of a Gaussian prediction with fixed variance. As we noted on page 780, maximizing l ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... han the output layers. For the first 25 years of research with multilayer networks (roughly 1985–2010), internal nodes used sigmoid and tanh activation functions almost exclusively. From around 2010 onwards, the ReLU and softplus became more popular, partly because they are believ ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... ride s = 2. The peak response is centered on the darker (lower intensity) input pixel. The results would usually be fed through a nonlinear activation function (not shown) before going to the next hidden layer. dimensional image, and a vector kernel k of size l. (For simplicity we ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... roscience. In those models, the receptive field of a neuron is the portion of the sensory input that can affect that neuron’s activation. In a CNN, the receptive field of a unit in the first hidden layer is small—just the size of the kernel, i.e., l pixels. In the dee ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... st like a convolution layer, with a kernel size l and stride s, but the operation that is applied is fixed rather than learned. Typically, no activation function is associated with the pooling layer. There are two common forms of pooling: • Average-pooling computes the average valu ... (see the convolution/pooling sketch after this table) |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... s achieved by the following equation for layer i in terms of layer i − 1: $z^{(i)} = g_r^{(i)}(z^{(i-1)} + f(z^{(i-1)}))$ (22.10), where $g_r$ denotes the activation functions for the residual layer. Here we think of f as the residual, perturbing the default behavior of passing layer i − 1 through ... (see the residual-layer sketch after this table) |
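Illustrative Sketches

The definition snippet above gives a unit's activation as $a_j = g_j(\sum_i w_{i,j} a_i) \equiv g_j(in_j)$, and Figure 22.2 lists the activation functions most commonly used in deep learning (sigmoid, ReLU, softplus, tanh). The following is a minimal NumPy sketch of those functions and of a single unit's activation; the function names and shapes are our own, not code from the textbook:

```python
import numpy as np

# Common activation functions from Figure 22.2 (illustrative names, not the book's code).
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # logistic function, output in (0, 1)

def relu(x):
    return np.maximum(0.0, x)         # rectified linear unit

def softplus(x):
    return np.log1p(np.exp(x))        # smooth approximation to ReLU

def tanh(x):
    return np.tanh(x)                 # output in (-1, 1)

def unit_activation(a, w, g=sigmoid):
    """a_j = g(in_j), where in_j = sum_i w_i * a_i is the weighted sum of the inputs."""
    in_j = np.dot(w, a)
    return g(in_j)

# Example: three inputs feeding one unit.
print(unit_activation(np.array([1.0, 0.5, -2.0]), np.array([0.3, -0.1, 0.2])))
```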
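Equation (22.3) in the evidence above composes two layers as $h_w(x) = g^{(2)}(W^{(2)} g^{(1)}(W^{(1)} x))$, and the classification snippet notes that a sigmoid output can be read directly as a probability. A hedged sketch of that composition, with bias terms omitted and layer sizes chosen only for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def two_layer_network(x, W1, W2, g1=np.tanh, g2=sigmoid):
    """h_w(x) = g2(W2 @ g1(W1 @ x)), as in Eq. (22.3); biases omitted for brevity."""
    return g2(W2 @ g1(W1 @ x))

# Example: 3 inputs -> 4 hidden units -> 1 sigmoid output.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(1, 4))
x = np.array([0.5, -1.0, 2.0])
p = two_layer_network(x, W1, W2)
print(p)   # value in (0, 1); for a Boolean classifier this is read as a probability
```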
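The convolution and pooling snippets describe sliding a kernel of size l over the input with stride s, usually feeding the result through a nonlinear activation, and a pooling layer that applies a fixed (not learned) operation with no activation of its own. A rough 1-D sketch under those assumptions; the kernel values and sizes are made up for illustration:

```python
import numpy as np

def conv1d(x, k, stride=2):
    """Slide a kernel of size l over x with the given stride (valid positions only)."""
    l = len(k)
    return np.array([np.dot(x[i:i + l], k)
                     for i in range(0, len(x) - l + 1, stride)])

def max_pool1d(z, size=2, stride=2):
    """Max-pooling: a fixed operation with a kernel size and stride, but no learned weights."""
    return np.array([z[i:i + size].max()
                     for i in range(0, len(z) - size + 1, stride)])

x = np.array([1.0, 0.2, 0.1, 0.9, 0.8, 0.05, 0.3, 0.7])   # tiny 1-D "image"
k = np.array([-1.0, 2.0, -1.0])                           # illustrative kernel of size l = 3
z = np.maximum(0.0, conv1d(x, k, stride=2))               # ReLU applied to the conv output
print(max_pool1d(z))
```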
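Equation (22.10) in the last snippet defines a residual layer as $z^{(i)} = g_r(z^{(i-1)} + f(z^{(i-1)}))$, where f is the learned residual perturbing the identity path. A minimal sketch, assuming f is a single dense sublayer with a ReLU (the specific form of f here is our assumption, not the book's):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_layer(z_prev, W, g_r=relu):
    """z_i = g_r(z_{i-1} + f(z_{i-1})), Eq. (22.10); f is one dense sublayer (an assumption)."""
    f = relu(W @ z_prev)            # the residual term f(z_{i-1})
    return g_r(z_prev + f)

rng = np.random.default_rng(1)
z_prev = rng.normal(size=4)
W = 0.1 * rng.normal(size=(4, 4))   # small weights keep the layer close to the identity map
print(residual_layer(z_prev, W))
```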