Gradient: Difference between revisions
imported>Comp.arch m No edit summary |
→Gradient is direction of steepest ascent: Absolute value of a vector is ill-defined. This should be the norm. Someone should check that the rest of the article uses norms correctly. |
| Line 1: | Line 1: | ||
{{Short description|Multivariate derivative (mathematics)}} | {{Short description|Multivariate derivative (mathematics)}} | ||
{{about|a generalized derivative of a multivariate function|another use in mathematics|Slope|a similarly spelled unit of angle|Gradian|other uses}} | {{about|a generalized derivative of a multivariate function|another use in mathematics|Slope|a similarly spelled unit of angle|Gradian|gradients in color science|Color gradient|other uses}} | ||
{{more citations needed|date=January 2018}} | {{more citations needed|date=January 2018}} | ||
[[File:Gradient2.svg|thumb|300px|The gradient, represented by the blue arrows, denotes the direction of greatest change of a scalar function. The values of the function are represented in greyscale and increase in value from white (low) to dark (high).]] | |||
{{Calculus|Vector}} | |||
In [[vector calculus]], the '''gradient''' of a [[scalar-valued function|scalar-valued]] [[differentiable function]] <math>f</math> of | In [[vector calculus]], the '''gradient''' of a [[scalar-valued function|scalar-valued]] [[differentiable function]] <math>f</math> of several variables is the [[vector field]] (or [[vector-valued function]]) <math>\nabla f</math> whose value at a point <math>p</math> gives the direction and the rate of fastest increase. The gradient transforms like a vector under change of basis of the space of variables of <math>f</math>. If the gradient of a function is non-zero at a point <math>p</math>, the direction of the gradient is the direction in which the function increases most quickly from <math>p</math>, and the [[magnitude (mathematics)|magnitude]] of the gradient is the rate of increase in that direction, the greatest [[absolute value|absolute]] [[directional derivative]].<ref> | ||
*{{harvtxt|Bachman|2007|p=77}} | *{{harvtxt|Bachman|2007|p=77}} | ||
*{{harvtxt|Downing|2010|pp=316–317}} | *{{harvtxt|Downing|2010|pp=316–317}} | ||
| Line 11: | Line 12: | ||
*{{harvtxt|Moise|1967|p=684}} | *{{harvtxt|Moise|1967|p=684}} | ||
*{{harvtxt|Protter|Morrey|1970|p=715}} | *{{harvtxt|Protter|Morrey|1970|p=715}} | ||
*{{harvtxt|Swokowski et al.|1994|pp=1036,1038–1039}}</ref> Further, a point where the gradient is the zero vector is known as a [[stationary point]]. The gradient thus plays a fundamental role in [[optimization theory]], where it is used to minimize a function by [[gradient descent]]. In coordinate-free terms, the gradient of a function <math>f(\mathbf{r})</math> may be defined by: | *{{harvtxt|Swokowski et al.|1994|pp=1036,1038–1039}}</ref> Further, a point where the gradient is the [[zero vector]] is known as a [[stationary point]]. The gradient thus plays a fundamental role in [[optimization theory]], [[machine learning]], and [[artificial intelligence]], where it is used to minimize a function by [[gradient descent]]. In coordinate-free terms, the gradient of a function <math>f(\mathbf{r})</math> may be defined by: | ||
<math display="block">df=\nabla f \cdot d\mathbf{r}</math> | <math display="block">df=\nabla f \cdot d\mathbf{r}</math> | ||
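The role of the gradient in gradient descent can be sketched as follows. This is a minimal illustration, not a reference implementation: the quadratic test function, the fixed step size `step` and the tolerance `tol` are assumptions chosen for the example. Each iteration moves against the gradient, and the loop stops once the gradient norm is close to zero, that is, near a stationary point.

```python
import math

def grad_f(x, y):
    # Gradient of the illustrative f(x, y) = (x - 1)**2 + 2*(y + 3)**2,
    # whose only stationary point (its minimum) is (1, -3).
    return (2.0 * (x - 1.0), 4.0 * (y + 3.0))

def gradient_descent(grad, start, step=0.1, tol=1e-10, max_iter=10_000):
    """Step against the gradient until its norm drops below tol."""
    x, y = start
    for _ in range(max_iter):
        gx, gy = grad(x, y)
        if math.hypot(gx, gy) < tol:  # numerically a stationary point
            break
        x, y = x - step * gx, y - step * gy
    return x, y

x_min, y_min = gradient_descent(grad_f, start=(5.0, 5.0))
print(round(x_min, 6), round(y_min, 6))  # 1.0 -3.0
```

A fixed step works here because the test function is a simple quadratic; practical optimizers usually choose the step adaptively, for example with a line search.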
| Line 32: | Line 33: | ||
\vdots \\ | \vdots \\ | ||
\frac{\partial f}{\partial x_n}(p) | \frac{\partial f}{\partial x_n}(p) | ||
\end{bmatrix} | \end{bmatrix}</math>. | ||
Note that the above definition for gradient is defined for the function <math>f</math> only if <math>f</math> is differentiable at <math>p</math>. There can be functions for which partial derivatives exist in every direction but fail to be differentiable. Furthermore, this definition as the vector of partial derivatives is only valid when the basis of the coordinate system is [[Orthonormal basis|orthonormal]]. For any other basis, the [[metric tensor]] at that point needs to be taken into account. | Note that the above definition for gradient is defined for the function <math>f</math> only if <math>f</math> is differentiable at <math>p</math>. There can be functions for which partial derivatives exist in every direction but fail to be differentiable. Furthermore, this definition as the vector of partial derivatives is only valid when the basis of the coordinate system is [[Orthonormal basis|orthonormal]]. For any other basis, the [[metric tensor]] at that point needs to be taken into account. | ||
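The remark about non-orthonormal bases can be checked numerically. The sketch below assumes a hypothetical basis e1 = (1, 0), e2 = (1, 1) of R² and an illustrative function f(x, y) = x² + 3xy. It converts the partial derivatives with respect to the basis coordinates into gradient components using the inverse metric tensor g^ij, and contrasts this with naively using the partial derivatives themselves as components.

```python
# Hypothetical non-orthonormal basis of R^2: e1 = (1, 0), e2 = (1, 1).
E1 = (1.0, 0.0)
E2 = (1.0, 1.0)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def grad_cartesian(x, y):
    # Gradient of the illustrative f(x, y) = x**2 + 3*x*y in the orthonormal
    # Cartesian basis, where the vector of partial derivatives is correct.
    return (2.0 * x + 3.0 * y, 3.0 * x)

def partials_in_basis(x, y):
    # Partials w.r.t. coordinates (u, v) with (x, y) = u*E1 + v*E2.
    # By the chain rule, df/du = grad . E1 and df/dv = grad . E2.
    g = grad_cartesian(x, y)
    return (dot(g, E1), dot(g, E2))

def combine(c):
    # Vector c[0]*E1 + c[1]*E2, returned in Cartesian components.
    return (c[0] * E1[0] + c[1] * E2[0], c[0] * E1[1] + c[1] * E2[1])

def gradient_via_metric(x, y):
    # Metric tensor g_ij = e_i . e_j and its 2x2 inverse g^ij.
    g11, g12, g22 = dot(E1, E1), dot(E1, E2), dot(E2, E2)
    det = g11 * g22 - g12 * g12
    inv = ((g22 / det, -g12 / det), (-g12 / det, g11 / det))
    d = partials_in_basis(x, y)
    # Raise the index: components c^i = g^ij * df/du^j.
    c = (inv[0][0] * d[0] + inv[0][1] * d[1], inv[1][0] * d[0] + inv[1][1] * d[1])
    return combine(c)

def naive_gradient(x, y):
    # Wrong in this basis: uses the partials directly as components along E1, E2.
    return combine(partials_in_basis(x, y))

print(grad_cartesian(1.0, 2.0))       # (8.0, 3.0)
print(gradient_via_metric(1.0, 2.0))  # (8.0, 3.0)
print(naive_gradient(1.0, 2.0))       # (19.0, 11.0)
```

Only the metric-corrected version reproduces the Cartesian gradient, which is the point made about orthonormal bases.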
| Line 65: | Line 66: | ||
The gradient (or gradient vector field) of a scalar function {{math|''f''(''x''<sub>1</sub>, ''x''<sub>2</sub>, ''x''<sub>3</sub>, …, ''x<sub>n</sub>'')}} is denoted {{math|∇''f''}} or {{math|{{vec|∇}}''f''}} where {{math|∇}} ([[nabla symbol|nabla]]) denotes the vector [[differential operator]], [[del]]. The notation {{math|grad ''f''}} is also commonly used to represent the gradient. The gradient of {{math|''f''}} is defined as the unique vector field whose dot product with any [[Euclidean vector|vector]] {{math|'''v'''}} at each point {{math|''x''}} is the directional derivative of {{math|''f''}} along {{math|'''v'''}}. That is, | The gradient (or gradient vector field) of a scalar function {{math|''f''(''x''<sub>1</sub>, ''x''<sub>2</sub>, ''x''<sub>3</sub>, …, ''x<sub>n</sub>'')}} is denoted {{math|∇''f''}} or {{math|{{vec|∇}}''f''}} where {{math|∇}} ([[nabla symbol|nabla]]) denotes the vector [[differential operator]], [[del]]. The notation {{math|grad ''f''}} is also commonly used to represent the gradient. The gradient of {{math|''f''}} is defined as the unique vector field whose dot product with any [[Euclidean vector|vector]] {{math|'''v'''}} at each point {{math|''x''}} is the directional derivative of {{math|''f''}} along {{math|'''v'''}}. That is, | ||
<math display="block">\big(\nabla f(x)\big)\cdot \mathbf{v} = D_{\mathbf v}f(x)</math> | <math display="block">\big(\nabla f(x)\big)\cdot \hat{\mathbf{v}} = D_{\mathbf v}f(x)</math> | ||
where the right-hand side is the [[directional derivative]] and there are many ways to represent it. Formally, the derivative is ''dual'' to the gradient; see [[#Derivative|relationship with derivative]]. | where the right-hand side is the [[directional derivative]] and there are many ways to represent it. Formally, the derivative is ''dual'' to the gradient; see [[#Derivative|relationship with derivative]]. | ||
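The defining identity between the gradient and the directional derivative can be checked numerically. The sketch below assumes the limit definition D_v f(x) = lim_{h→0} (f(x + hv) − f(x))/h, estimated with a central difference, and an illustrative function f(x, y) = x²y + sin y that is not taken from the article. The vector v = (3, 4) is deliberately not a unit vector; under this definition the identity holds for any v.

```python
import math

def f(x, y):
    # Illustrative function (an assumption, not from the article).
    return x * x * y + math.sin(y)

def grad_f(x, y):
    # Its gradient, differentiated by hand: (2xy, x^2 + cos y).
    return (2.0 * x * y, x * x + math.cos(y))

def directional_derivative(func, p, v, h=1e-6):
    # Central-difference estimate of D_v f(p) = lim_{h->0} (f(p + h v) - f(p)) / h.
    (px, py), (vx, vy) = p, v
    return (func(px + h * vx, py + h * vy) - func(px - h * vx, py - h * vy)) / (2.0 * h)

p, v = (1.5, -0.5), (3.0, 4.0)  # v is deliberately not a unit vector
g = grad_f(*p)
print(g[0] * v[0] + g[1] * v[1])        # grad f(p) . v
print(directional_derivative(f, p, v))  # approximately the same value
```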
| Line 78: | Line 79: | ||
<math display="block">\nabla f = \frac{\partial f}{\partial x} \mathbf{i} + \frac{\partial f}{\partial y} \mathbf{j} + \frac{\partial f}{\partial z} \mathbf{k},</math> | <math display="block">\nabla f = \frac{\partial f}{\partial x} \mathbf{i} + \frac{\partial f}{\partial y} \mathbf{j} + \frac{\partial f}{\partial z} \mathbf{k},</math> | ||
where {{math|'''i'''}}, {{math|'''j'''}}, {{math|'''k'''}} are the [[standard basis|standard]] unit vectors in the directions of the {{math|''x''}}, {{math|''y''}} and {{math|''z''}} coordinates, respectively. For example, the gradient of the function | where {{math|'''i'''}}, {{math|'''j'''}}, {{math|'''k'''}} are the [[standard basis|standard]] unit vectors in the directions of the {{math|''x''}}, {{math|''y''}} and {{math|''z''}} coordinates, respectively. | ||
For example, the gradient of the function | |||
<math display="block">f(x,y,z)= 2x+3y^2-\sin(z)</math> | <math display="block">f(x,y,z)= 2x+3y^2-\sin(z)</math> | ||
is | is | ||
| Line 96: | Line 99: | ||
{{main|Del in cylindrical and spherical coordinates}} | {{main|Del in cylindrical and spherical coordinates}} | ||
In [[cylindrical coordinate system | In [[cylindrical coordinate system|cylindrical coordinates]], the gradient is given by:<ref name="Schey-1992" /> | ||
<math display="block">\nabla f(\rho, \varphi, z) = \frac{\partial f}{\partial \rho}\mathbf{e}_\rho + \frac{1}{\rho}\frac{\partial f}{\partial \varphi}\mathbf{e}_\varphi + \frac{\partial f}{\partial z}\mathbf{e}_z,</math> | <math display="block">\nabla f(\rho, \varphi, z) = \frac{\partial f}{\partial \rho}\mathbf{e}_\rho + \frac{1}{\rho}\frac{\partial f}{\partial \varphi}\mathbf{e}_\varphi + \frac{\partial f}{\partial z}\mathbf{e}_z,</math> | ||
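The cylindrical formula can be compared with the Cartesian gradient at a sample point. The scalar field f = x²y + zx, the sample point and the finite-difference step `h` are assumptions for illustration. The sketch evaluates ∂f/∂ρ, (1/ρ)∂f/∂φ and ∂f/∂z numerically, then converts the result to Cartesian components using e_ρ = (cos φ, sin φ, 0), e_φ = (−sin φ, cos φ, 0) and e_z = (0, 0, 1).

```python
import math

def f_cart(x, y, z):
    # Illustrative scalar field (an assumption, not from the article).
    return x * x * y + z * x

def grad_cart(x, y, z):
    # Analytic Cartesian gradient of f_cart.
    return (2.0 * x * y + z, x * x, x)

def f_cyl(rho, phi, z):
    # The same field written in cylindrical coordinates (rho, phi, z).
    return f_cart(rho * math.cos(phi), rho * math.sin(phi), z)

def partial(func, args, i, h=1e-6):
    # Central-difference estimate of the partial derivative w.r.t. argument i.
    lo, hi = list(args), list(args)
    lo[i] -= h
    hi[i] += h
    return (func(*hi) - func(*lo)) / (2.0 * h)

def grad_cyl_as_cartesian(rho, phi, z):
    # Cylindrical formula: df/drho e_rho + (1/rho) df/dphi e_phi + df/dz e_z,
    # then expressed in Cartesian components.
    args = (rho, phi, z)
    a_rho = partial(f_cyl, args, 0)
    a_phi = partial(f_cyl, args, 1) / rho
    a_z = partial(f_cyl, args, 2)
    c, s = math.cos(phi), math.sin(phi)
    # e_rho = (c, s, 0), e_phi = (-s, c, 0), e_z = (0, 0, 1)
    return (a_rho * c - a_phi * s, a_rho * s + a_phi * c, a_z)

rho, phi, z = 2.0, 0.7, -1.0
print(grad_cyl_as_cartesian(rho, phi, z))
print(grad_cart(rho * math.cos(phi), rho * math.sin(phi), z))
```

The two printed vectors agree up to finite-difference error; the factor 1/ρ on the angular term is what makes them match.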
| Line 166: | Line 169: | ||
for <math>x</math> close to <math>x_0</math>, where <math>(\nabla f)_{x_0}</math> is the gradient of <math>f</math> computed at <math>x_0</math>, and the dot denotes the dot product on <math>\R^n</math>. This equation is equivalent to the first two terms in the [[Taylor series#Taylor series in several variables|multivariable Taylor series]] expansion of <math>f</math> at <math>x_0</math>. | for <math>x</math> close to <math>x_0</math>, where <math>(\nabla f)_{x_0}</math> is the gradient of <math>f</math> computed at <math>x_0</math>, and the dot denotes the dot product on <math>\R^n</math>. This equation is equivalent to the first two terms in the [[Taylor series#Taylor series in several variables|multivariable Taylor series]] expansion of <math>f</math> at <math>x_0</math>. | ||
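The linear approximation can be illustrated with the article's example function f(x, y, z) = 2x + 3y² − sin(z), whose gradient is (2, 6y, −cos z). In the sketch below the base point `x0` and the offsets are arbitrary choices; because only the first-order Taylor terms are kept, the approximation error should shrink roughly quadratically in the offset.

```python
import math

def f(x, y, z):
    # The article's example function: f(x, y, z) = 2x + 3y^2 - sin(z).
    return 2.0 * x + 3.0 * y ** 2 - math.sin(z)

def grad_f(x, y, z):
    # Its gradient, differentiating term by term: (2, 6y, -cos z).
    return (2.0, 6.0 * y, -math.cos(z))

def linear_approx(x0, x):
    # f(x0) + grad f(x0) . (x - x0)
    g = grad_f(*x0)
    return f(*x0) + sum(gi * (xi - x0i) for gi, xi, x0i in zip(g, x, x0))

x0 = (1.0, 2.0, 0.5)
for eps in (1e-1, 1e-2, 1e-3):
    x = tuple(c + eps for c in x0)
    err = abs(f(*x) - linear_approx(x0, x))
    print(eps, err)  # error shrinks roughly like eps**2
```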
===Relationship with {{vanchor|Fréchet derivative}}=== | ===Relationship with {{vanchor|Fréchet derivative}}=== | ||
| Line 220: | Line 224: | ||
Dividing by <math>h</math>, and taking the limit yields a term which is bounded from above by the [[Cauchy–Schwarz inequality]]<ref>{{cite book |author1=T. Arens | title=Mathematik |edition=5th |publisher=Springer Spektrum Berlin |year=2022 | doi=10.1007/978-3-662-64389-1 |isbn=978-3-662-64388-4 |url = https://doi.org/10.1007/978-3-662-64389-1}}</ref> | Dividing by <math>h</math>, and taking the limit yields a term which is bounded from above by the [[Cauchy–Schwarz inequality]]<ref>{{cite book |author1=T. Arens | title=Mathematik |edition=5th |publisher=Springer Spektrum Berlin |year=2022 | doi=10.1007/978-3-662-64389-1 |isbn=978-3-662-64388-4 |url = https://doi.org/10.1007/978-3-662-64389-1}}</ref> | ||
<math display="block">|\nabla_v f (x)| = |\nabla f \cdot v| \le |\nabla f| |v| = |\nabla f|.</math> | <math display="block">|\nabla_v f (x)| = |\nabla f \cdot v| \le ||\nabla f|| ||v|| = ||\nabla f||.</math> | ||
Choosing <math>v^* = \nabla f/|\nabla f|</math> maximizes the directional derivative, and equals the upper bound | Choosing <math>v^* = \nabla f/||\nabla f||</math> maximizes the directional derivative, and equals the upper bound | ||
<math display="block">|\nabla_{v^*} f (x)| = |\nabla f|^2/|\nabla f| = |\nabla f|.</math> | <math display="block">|\nabla_{v^*} f (x)| = ||\nabla f||^2/||\nabla f|| = ||\nabla f||.</math> | ||
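The steepest-ascent argument can be checked by brute force: sample unit vectors u around the circle, evaluate the directional derivative ∇f · u, and compare the best sample with the norm of ∇f and with the direction ∇f divided by its norm. The gradient formula (for an illustrative f(x, y) = x²y + y³), the point and the sampling resolution `n` are assumptions of this sketch.

```python
import math

def grad_f(x, y):
    # Gradient of the illustrative f(x, y) = x**2 * y + y**3 (an assumption).
    return (2.0 * x * y, x * x + 3.0 * y * y)

p = (1.0, 2.0)
g = grad_f(*p)
norm_g = math.hypot(*g)  # Euclidean norm of grad f(p)

# Sample unit vectors u = (cos t, sin t) and keep the largest grad f . u.
n = 3600
best_value, best_dir = -math.inf, None
for k in range(n):
    t = 2.0 * math.pi * k / n
    u = (math.cos(t), math.sin(t))
    d = g[0] * u[0] + g[1] * u[1]  # directional derivative along u
    if d > best_value:
        best_value, best_dir = d, u

print(norm_g, best_value)                        # nearly equal; bounded by Cauchy-Schwarz
print((g[0] / norm_g, g[1] / norm_g), best_dir)  # nearly the same unit vector
```

No sampled direction exceeds the norm of the gradient, and the best one lies next to the normalized gradient, matching the upper bound and its maximizer.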
==Generalizations== | ==Generalizations== | ||
| Line 308: | Line 312: | ||
* {{cite book | * {{cite book | ||
|first1 = B. A.|last1 = Dubrovin | |first1 = B. A.|last1 = Dubrovin | ||
|first2 = A. T.|last2 = Fomenko | |first2 = A. T.|last2 = Fomenko |author-link2=Anatoly Fomenko | ||
|first3 = S. P.|last3 = Novikov | |first3 = S. P.|last3 = Novikov |author-link3=Sergei Novikov (mathematician) | ||
|title = Modern Geometry—Methods and Applications: Part I: The Geometry of Surfaces, Transformation Groups, and Fields | |title = Modern Geometry—Methods and Applications: Part I: The Geometry of Surfaces, Transformation Groups, and Fields | ||
|series = [[Graduate Texts in Mathematics]] | |series = [[Graduate Texts in Mathematics]] | ||