Auto Topic: minibatch

auto_minibatch | topic

Coverage Score: 1
Mentioned Chunks: 11
Mentioned Docs: 1

Required Dimensions: definition, pros_cons
Covered Dimensions: definition, pros_cons

Keywords: minibatch

Relations

Source | Type | Target | Weight
Auto Topic: minibatch | CO_OCCURS | Propositional Logic | 3

Evidence Chunks

textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.61 | mentions 4
... ing to Equation (19.5). The original version of SGD selected only one training example for each step, but it is now more common to select a minibatch of m out of the N examples. Suppose we have N = 10,000 examples and choose a minibatch of size m = 100. Then on each ste ...
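A minimal sketch of the selection-and-update step this snippet describes, under toy assumptions (linear model, squared-error loss); the data, names, and hyperparameters are illustrative, not the book's:

```python
import numpy as np

rng = np.random.default_rng(0)

N, m = 10_000, 100            # training-set size and minibatch size from the snippet
X = rng.normal(size=(N, 5))   # toy feature matrix
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=N)

w = np.zeros(5)
alpha = 0.01                  # fixed learning rate for the sketch

for step in range(1000):
    idx = rng.choice(N, size=m, replace=False)  # select a minibatch of m out of N examples
    Xb, yb = X[idx], y[idx]
    grad = (2.0 / m) * Xb.T @ (Xb @ w - yb)     # gradient of mean squared error on the minibatch
    w -= alpha * grad                           # one SGD weight-update step

print(w.round(2))  # close to true_w
```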
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.59 | mentions 3
... rate. For standard gradient descent, the loss L is defined with respect to the entire training set. For SGD, it is defined with respect to a minibatch of m examples chosen randomly at each step. As noted in Section 4.2, the literature on optimization methods for high-dimensional c ...
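To make the distinction concrete, a small comparison of the two loss definitions; the toy data and the mse helper are mine, not the book's:

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 10_000, 100
X = rng.normal(size=(N, 3))
y = X @ np.array([1.0, -1.0, 2.0]) + rng.normal(scale=0.1, size=N)
w = np.zeros(3)

def mse(Xs, ys, w):
    """Mean squared error over whichever set of examples is passed in."""
    return float(np.mean((Xs @ w - ys) ** 2))

full_loss = mse(X, y, w)                   # standard gradient descent: entire training set
idx = rng.choice(N, size=m, replace=False)
batch_loss = mse(X[idx], y[idx], w)        # SGD: one randomly chosen minibatch
print(full_loss, batch_loss)  # equal in expectation, but the minibatch estimate is noisy
```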
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.59 | mentions 3
... elps the algorithm escape small local minima in the high-dimensional weight space (as in simulated annealing; see page 132); and the small minibatch size ensures that the computational cost of each weight update step is a small constant, independent of the training set size. • B ...
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.59 | mentions 3
... [Batch normalization] ... nvergence of SGD by rescaling the values generated at the internal layers of the network from the examples within each minibatch. Although the reasons for its effectiveness are not well understood at the time of writing, we include it because it confers signifi ...
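A sketch of the rescaling the snippet refers to, assuming the standard batch-normalization formulation (normalize each unit across the minibatch, then apply a learned scale and shift); the function and names are illustrative:

```python
import numpy as np

def batch_norm(Z, gamma, beta, eps=1e-5):
    """Normalize each internal-layer unit across the minibatch, then rescale."""
    mu = Z.mean(axis=0)                  # per-unit mean over the minibatch
    var = Z.var(axis=0)                  # per-unit variance over the minibatch
    Z_hat = (Z - mu) / np.sqrt(var + eps)
    return gamma * Z_hat + beta          # learned scale and shift

rng = np.random.default_rng(2)
Z = rng.normal(loc=5.0, scale=3.0, size=(100, 8))  # minibatch of 100 examples, 8 units
out = batch_norm(Z, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # roughly 0 and 1 per unit
```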
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.59 | mentions 3
... g a large ensemble of different networks (see Section 19.8). More specifically, let us suppose we are using stochastic gradient descent with minibatch size m. For each minibatch, the dropout algorithm applies the following process to every node in the network: with probability p, ...
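The snippet breaks off at "with probability p"; a sketch of one common completion, inverted dropout, where a fresh mask is drawn for each minibatch. Sources differ on whether p names the drop or the keep probability; here it is the drop probability, and the helper is illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def dropout(activations, p):
    """Zero each node with probability p and rescale survivors by 1/(1 - p)."""
    mask = rng.random(activations.shape) >= p   # one fresh mask per minibatch
    return activations * mask / (1.0 - p)

A = rng.normal(size=(64, 16))   # activations for a minibatch of 64, 16 nodes
A_train = dropout(A, p=0.5)     # applied during training only
print(A.mean().round(3), A_train.mean().round(3))  # agree in expectation
```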
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.57 | mentions 2
... in Section 22.4, this also aligns nicely with the way that the stochastic gradient descent algorithm calculates gradients with respect to a minibatch of training examples. Let us put all this together in the form of an example. Suppose we are training on 256 × 256 RGB images with ...
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.57 | mentions 2
... ients proceeds from right to left. ... may point in entirely the wrong direction, making convergence difficult. One solution is to increase the minibatch size as training proceeds; another is to incorporate the idea of momentum, which keeps a running average of the gradients of past m ...
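A sketch of the momentum idea from the snippet: keep an exponentially weighted running average of past minibatch gradients and step along it. The noisy-gradient stand-in and the coefficients are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
w = np.zeros(5)
v = np.zeros(5)           # running average of past minibatch gradients
alpha, rho = 0.01, 0.9    # learning rate and momentum coefficient

def noisy_grad(w):
    """Stand-in for a minibatch gradient: gradient of ||w - 1||^2 plus noise."""
    return 2.0 * (w - 1.0) + rng.normal(scale=1.0, size=w.shape)

for step in range(2000):
    v = rho * v + (1.0 - rho) * noisy_grad(w)  # averaging damps single-minibatch noise
    w -= alpha * v

print(w.round(2))  # near the optimum at all-ones despite the noisy gradients
```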
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.57 | mentions 2
... a large ensemble of different networks (see Section 19.8). More specifically, let us suppose we are using stochastic gradient descent with minibatch size m. For each minibatch, the dropout ...
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
... ingle example. Within these constraints, we would treat m as a hyperparameter that should be tuned for each learning problem. Convergence of minibatch SGD is not strictly guaranteed; it can oscillate around the minimum without settling down. We will see on page 702 how a schedule ...
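The "schedule" the snippet points to is a sequence of decreasing learning rates; a sketch with one common decay rule, alpha_t = alpha_0 / (1 + k t), chosen for illustration rather than taken from page 702:

```python
import numpy as np

rng = np.random.default_rng(5)
w = 5.0
alpha0, k = 0.5, 0.01

for t in range(5000):
    grad = 2.0 * w + rng.normal(scale=2.0)  # noisy minibatch gradient of w**2
    alpha = alpha0 / (1.0 + k * t)          # decreasing step size tames the oscillation
    w -= alpha * grad

print(round(w, 3))  # settles near the minimum at 0 instead of oscillating around it
```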
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
... on 256 × 256 RGB images with a minibatch size of 64. The input in this case will be a four-dimensional tensor of size 256 × 256 × 3 ... [footnote: The proper mathematical definition of tensors requires that certain invariances hold under a change of basis.]
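A concrete shape check for the input described here; the (batch, height, width, channels) ordering is one common convention and may not match the book's:

```python
import numpy as np

minibatch = np.zeros((64, 256, 256, 3), dtype=np.float32)  # 64 RGB images, 256 x 256 each
print(minibatch.shape)   # (64, 256, 256, 3): a four-dimensional tensor
print(minibatch.nbytes)  # 50331648 bytes, exactly 48 MiB per minibatch at float32
```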
textbook | Artificial-Intelligence-A-Modern-Approach-4th-Edition.pdf | confidence 0.55 | mentions 1
... minibatch, 697 ... [book index page; surrounding unrelated entries omitted]