Auto Topic: activation
ID: auto_activation | Type: topic
Coverage Score: 1
Mentioned Chunks: 33
Mentioned Docs: 1
Required Dimensions: definition, pros_cons
Covered Dimensions: definition, pros_cons
Keywords: activation
Relations
| Source | Type | Target | Weight |
|---|---|---|---|
| Auto Topic: activation | CO_OCCURS | Auto Topic: relu | 14 |
| Auto Topic: activation | CO_OCCURS | Propositional Logic | 6 |
| Auto Topic: activation | CO_OCCURS | Auto Topic: convolutional | 5 |
| Auto Topic: activation | CO_OCCURS | Auto Topic: kernel | 5 |
| Auto Topic: activation | CO_OCCURS | Auto Topic: residual | 5 |
| Auto Topic: activation | CO_OCCURS | State-Space Search | 4 |
| Auto Topic: activation | CO_OCCURS | Auto Topic: convolution | 3 |
| Auto Topic: activation | CO_OCCURS | Auto Topic: stride | 3 |
| Auto Topic: activation | CO_OCCURS | Auto Topic: pixels | 3 |
Evidence Chunks
| Source | Confidence | Mentions | Snippet |
|---|---|---|---|
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.59 | 3 | ... e the weight attached to the link from unit i to unit j; then we have $a_j = g_j(\sum_i w_{i,j} a_i) \equiv g_j(in_j)$, where $g_j$ is a nonlinear activation function associated with unit j and $in_j$ is the weighted sum of the inputs to unit j. As in Section 19.6.3 (page 697 ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.57 | 2 | ... Figure 22.2 Activation functions commonly used in deep learning systems: (a) the logistic or sigmoid function; (b) the ReLU function and the softplus function; (c) the tanh function ... (see the activation-function sketch after this table) |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.57 | 2 | ... unction with small weights. It is not straightforward to interpret the effect of weight decay in a neural network. In networks with sigmoid activation functions, it is hypothesized that weight decay helps to keep the activations near the linear part of the sigmoid, avoiding the fl ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... HEN rules. (In contrast, it is very difficult for humans to get an intuitive understanding of the result of a matrix multiply followed by an activation function, as is done in some neural network models.) Second, the decision tree was in a sense constructed to be interpretable—t ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... d to interpret the model directly, the best we could come away with would be something like “after processing the convolutional layers, the activation for the dog output in the softmax layer was higher than any other class.” That’s not a very compelling argument. ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... e for continuous functions, just as sufficiently large decision trees implement a lookup table for Boolean functions. A variety of different activation functions are used. The most common are the following: • The logistic or sigmoid function, which is also used in logistic regressi ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... hem are monotonically nondecreasing, which means that their derivatives g′ are nonnegative. We will have more to say about the choice of activation function in later sections. Coupling multiple units together into a network creates a complex function that is a composition of ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | We will have more to say about the choice of activation function in later sections. Coupling multiple units together into a network creates a complex function that is a composition of the algebraic expressions represented by the individual units. For example, the network shown ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... s in the first layer ($w_{1,3}$, $w_{1,4}$, etc.) and $W^{(2)}$ denotes the weights in the second layer ($w_{3,5}$, etc.). Finally, let $g^{(1)}$ and $g^{(2)}$ denote the activation functions in the first and second layers. Then the entire network can be written as follows: $h_w(x) = g^{(2)}(W^{(2)} g^{(1)}(W^{(1)} x))$ (22.3) ... (see the two-layer-network sketch after this table) |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... major packages for deep learning provide automatic differentiation, so that users can experiment freely with different network structures, activation functions, loss functions, and forms of composition without having to do lots of calculus to derive a new learning algorithm fo ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... etwork structures, activation functions, loss functions, and forms of composition without having to do lots of calculus to derive a new learning algorithm for each experiment. This has encouraged an approach called end-to-end learning, in which a complex computational system ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... we need to be able to interpret the output of the network as a probability. For example, if the network has one output unit with a sigmoid activation function and is learning a Boolean classification, we can interpret the output value directly as the probability that the example ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... gression problem, where the target value y is continuous, it is common to use a linear output layer—in other words, $\hat{y}_j = in_j$, without any activation function g—and to interpret this as the mean of a Gaussian prediction with fixed variance. As we noted on page 780, maximizing l ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... han the output layers. For the first 25 years of research with multilayer networks (roughly 1985–2010), internal nodes used sigmoid and tanh activation functions almost exclusively. From around 2010 onwards, the ReLU and softplus became more popular, partly because they are believ ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... ride s = 2. The peak response is centered on the darker (lower intensity) input pixel. The results would usually be fed through a nonlinear activation function (not shown) before going to the next hidden layer. dimensional image, and a vector kernel k of size l. (For simplicity we ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... roscience. In those models, the receptive field of a neuron is the portion of the sensory input that can affect that neuron’s activation. In a CNN, the receptive field of a unit in the first hidden layer is small—just the size of the kernel, i.e., l pixels. In the dee ... |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... st like a convolution layer, with a kernel size l and stride s, but the operation that is applied is fixed rather than learned. Typically, no activation function is associated with the pooling layer. There are two common forms of pooling: • Average-pooling computes the average valu ... (see the convolution/pooling sketch after this table) |
| textbook Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | 0.55 | 1 | ... s achieved by the following equation for layer i in terms of layer i − 1: $z^{(i)} = g_r^{(i)}(z^{(i-1)} + f(z^{(i-1)}))$ (22.10), where $g_r$ denotes the activation functions for the residual layer. Here we think of f as the residual, perturbing the default behavior of passing layer i − 1 through ... (see the residual-layer sketch after this table) |
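Illustrative Sketches

The definition snippet above gives a unit's activation as $a_j = g_j(\sum_i w_{i,j} a_i) \equiv g_j(in_j)$, and Figure 22.2 lists the activation functions most commonly used in deep learning (sigmoid, ReLU, softplus, tanh). The following is a minimal NumPy sketch of those functions and of a single unit's activation; the function names and shapes are our own, not code from the textbook:

```python
import numpy as np

# Common activation functions from Figure 22.2 (illustrative names, not the book's code).
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # logistic function, output in (0, 1)

def relu(x):
    return np.maximum(0.0, x)         # rectified linear unit

def softplus(x):
    return np.log1p(np.exp(x))        # smooth approximation to ReLU

def tanh(x):
    return np.tanh(x)                 # output in (-1, 1)

def unit_activation(a, w, g=sigmoid):
    """a_j = g(in_j), where in_j = sum_i w_i * a_i is the weighted sum of the inputs."""
    in_j = np.dot(w, a)
    return g(in_j)

# Example: three inputs feeding one unit.
print(unit_activation(np.array([1.0, 0.5, -2.0]), np.array([0.3, -0.1, 0.2])))
```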
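Equation (22.3) in the evidence above composes two layers as $h_w(x) = g^{(2)}(W^{(2)} g^{(1)}(W^{(1)} x))$, and the classification snippet notes that a sigmoid output can be read directly as a probability. A hedged sketch of that composition, with bias terms omitted and layer sizes chosen only for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def two_layer_network(x, W1, W2, g1=np.tanh, g2=sigmoid):
    """h_w(x) = g2(W2 @ g1(W1 @ x)), as in Eq. (22.3); biases omitted for brevity."""
    return g2(W2 @ g1(W1 @ x))

# Example: 3 inputs -> 4 hidden units -> 1 sigmoid output.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(1, 4))
x = np.array([0.5, -1.0, 2.0])
p = two_layer_network(x, W1, W2)
print(p)   # value in (0, 1); for a Boolean classifier this is read as a probability
```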
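The convolution and pooling snippets describe sliding a kernel of size l over the input with stride s, usually feeding the result through a nonlinear activation, and a pooling layer that applies a fixed (not learned) operation with no activation of its own. A rough 1-D sketch under those assumptions; the kernel values and sizes are made up for illustration:

```python
import numpy as np

def conv1d(x, k, stride=2):
    """Slide a kernel of size l over x with the given stride (valid positions only)."""
    l = len(k)
    return np.array([np.dot(x[i:i + l], k)
                     for i in range(0, len(x) - l + 1, stride)])

def max_pool1d(z, size=2, stride=2):
    """Max-pooling: a fixed operation with a kernel size and stride, but no learned weights."""
    return np.array([z[i:i + size].max()
                     for i in range(0, len(z) - size + 1, stride)])

x = np.array([1.0, 0.2, 0.1, 0.9, 0.8, 0.05, 0.3, 0.7])   # tiny 1-D "image"
k = np.array([-1.0, 2.0, -1.0])                           # illustrative kernel of size l = 3
z = np.maximum(0.0, conv1d(x, k, stride=2))               # ReLU applied to the conv output
print(max_pool1d(z))
```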
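Equation (22.10) in the last snippet defines a residual layer as $z^{(i)} = g_r(z^{(i-1)} + f(z^{(i-1)}))$, where f is the learned residual perturbing the identity path. A minimal sketch, assuming f is a single dense sublayer with a ReLU (the specific form of f here is our assumption, not the book's):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_layer(z_prev, W, g_r=relu):
    """z_i = g_r(z_{i-1} + f(z_{i-1})), Eq. (22.10); f is one dense sublayer (an assumption)."""
    f = relu(W @ z_prev)            # the residual term f(z_{i-1})
    return g_r(z_prev + f)

rng = np.random.default_rng(1)
z_prev = rng.normal(size=4)
W = 0.1 * rng.normal(size=(4, 4))   # small weights keep the layer close to the identity map
print(residual_layer(z_prev, W))
```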