
The least squares method for linear regression

Author: Inventors quantify - small dreams, Created: 2016-12-18 11:36:26, Updated: 2016-12-18 11:41:31



  • One, Introduction

    During this time I have been studying the logistic regression algorithm in Chapter 5, and it has been quite a struggle. Tracing the algorithm back to its origins, I went from logistic regression to linear regression, and from there to the least squares method, finally landing on Chapter 9, Section 10 of the Higher Mathematics textbook, which explains where the mathematical principles behind the least squares method come from. The least squares method is a way of establishing empirical formulas in optimization problems; understanding how it works is helpful for understanding the logistic regression algorithm and the support vector machine learning algorithm.

  • Two, Background knowledge

    The historical background behind the emergence of the least squares method is quite interesting.

    In 1801, the Italian astronomer Giuseppe Piazzi discovered the first asteroid, Ceres. After tracking it for 40 days, Piazzi lost its position when Ceres passed behind the Sun. Scientists around the world then tried to relocate Ceres using Piazzi's observations, but searches based on most people's calculations came up empty. Gauss, however, used his least squares technique to compute an orbit for Ceres, and the asteroid was found again close to the predicted position.

    Gauss's method of least squares was published in 1809 in his book Theory of the Motion of the Heavenly Bodies. The French scientist Legendre had independently discovered the least squares method in 1806, but his work was little known at the time, and a dispute later arose over who first established the principle of least squares.

    In 1829, Gauss provided a proof that, under certain conditions, the least squares estimate is better than other methods; see the Gauss–Markov theorem.

  • Three, Applying the knowledge

    The core idea of the least squares method is to minimize the sum of the squared deviations of all the data points.

    Let's say we have collected longitude and latitude data for a number of warships:

    [Figure: the collected (x, y) data]

    Based on this data, we used Python to draw a scatter plot:

    [Figure: scatter plot of the data]

    The code for drawing the scatter plot is as follows:

    # -*- coding: utf-8 -*-
    import numpy as np
    import os
    import matplotlib.pyplot as plt

    def drawScatterDiagram(fileName):
        # change the working directory to where the data file is stored
        os.chdir("d:/workspace_ml")
        xcord = []
        ycord = []
        fr = open(fileName)
        for line in fr.readlines():
            lineArr = line.strip().split()
            # columns 1 and 2 of each line hold the x and y values
            xcord.append(float(lineArr[1]))
            ycord.append(float(lineArr[2]))
        fr.close()
        plt.scatter(xcord, ycord, s=30, c='red', marker='s')
        plt.show()
    

    Suppose the relationship between the two quantities is a straight line, y = a·x + b. If we take the first two data points, (152, 15.5) and (238, 32.4), we get two equations: 152a + b = 15.5 and 238a + b = 32.4. Solving these two equations gives a = 0.197 and b = -14.48, so we can draw a fitted line like this:

    [Figure: the line y = 0.197x - 14.48 drawn over the scatter plot]
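
    As a quick check, that little 2x2 system can also be solved numerically. A minimal sketch (the variable names are my own, not from the original post):

    import numpy as np

    # The two equations 152*a + b = 15.5 and 238*a + b = 32.4 as a 2x2 system.
    A = np.array([[152.0, 1.0],
                  [238.0, 1.0]])
    rhs = np.array([15.5, 32.4])
    a, b = np.linalg.solve(A, rhs)
    print(a, b)  # roughly 0.197 and -14.4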

    OK, so here's a new question: are these values of a and b the best ones? In more professional terms: are a and b the optimal parameters of the model?

    The answer is: make the sum of the squared deviations of all the data points as small as possible. We'll come back to why later; first let's see how to use this criterion to compute the best a and b. Let the sum of the squared deviations of all the data points be M:

    M = \sum_{i=1}^{n} \left[ y_i - (a x_i + b) \right]^2

    What we need to do now is find the values of a and b that make M as small as possible.

    So this is a function of two variables, with (a, b) as the independent variables and M as the dependent variable.

    Recall that in higher mathematics, the extrema of a function of one variable are found with the derivative. For a function of two variables we still use derivatives, only here they go by a new name: partial derivatives. Taking the partial derivatives of M with respect to a and b and setting them to zero, we get a system of equations:

    \frac{\partial M}{\partial a} = -2 \sum_{i=1}^{n} x_i \left[ y_i - (a x_i + b) \right] = 0
    \frac{\partial M}{\partial b} = -2 \sum_{i=1}^{n} \left[ y_i - (a x_i + b) \right] = 0

    In both equations, x and y are known quantities (the collected data), so a and b are the only unknowns; a small sketch of solving the system directly from the data is shown below.
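
    The following sketch (my own addition, not part of the original post) solves this system with numpy; it assumes the same xcord/ycord lists that the plotting code builds from the data file:

    import numpy as np

    def fit_line_least_squares(xcord, ycord):
        # Solve dM/da = 0 and dM/db = 0 for the line y = a*x + b.
        x = np.asarray(xcord, dtype=float)
        y = np.asarray(ycord, dtype=float)
        n = len(x)
        # The two equations rearranged into matrix form:
        #   a*sum(x^2) + b*sum(x) = sum(x*y)
        #   a*sum(x)   + b*n      = sum(y)
        A = np.array([[np.sum(x * x), np.sum(x)],
                      [np.sum(x), float(n)]])
        rhs = np.array([np.sum(x * y), np.sum(y)])
        a, b = np.linalg.solve(A, rhs)
        return a, b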

    From these it's easy to obtain a and b. Since I'm using the data from Wikipedia, I'll simply draw the figure using the published answer:

    [Figure: the least squares line drawn over the scatter plot]

    # -*- coding: utf-8 -*-
    import numpy as np
    import os
    import matplotlib.pyplot as plt

    def drawScatterDiagram(fileName):
        # change the working directory to where the data file is stored
        os.chdir("d:/workspace_ml")
        xcord = []
        ycord = []
        fr = open(fileName)
        for line in fr.readlines():
            lineArr = line.strip().split()
            xcord.append(float(lineArr[1]))
            ycord.append(float(lineArr[2]))
        fr.close()
        plt.scatter(xcord, ycord, s=30, c='red', marker='s')
        # a=0.1965; b=-14.486 would be the line through the first two points only
        # least squares coefficients, taken from the Wikipedia answer
        a = 0.1612
        b = -8.6394
        x = np.arange(90.0, 250.0, 0.1)
        y = a * x + b
        plt.plot(x, y)
        plt.show()
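
    Note that the script above only defines drawScatterDiagram; to actually produce the figure you would call it with the path of your data file, for example (the file name below is hypothetical):

    drawScatterDiagram("data.txt")  # hypothetical file name; point this at your own data file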
    
  • Four, Exploring the principle

    In data fitting, why do we optimize the model parameters by minimizing the sum of the squared differences between the predicted and the actual values, rather than the sum of the absolute differences?

    This question has already been answered well; see this link: http://blog.sciencenet.cn/blog-430956-621997.html

    I personally find that explanation very interesting, especially the assumption that every point's deviation from f(x) is noise.

    The larger a point's deviation, the smaller the probability of that deviation occurring. So what exactly is the relationship between the size of the deviation x and the probability f(x) that it occurs?

    [Figures: the relationship between the size of the deviation and its probability of occurrence]
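
    A hedged sketch of the standard argument (my reconstruction, not necessarily the exact derivation in the figures): if each observation deviates from the line by independent Gaussian noise, then maximizing the likelihood of the observed data is the same as minimizing the sum of squared deviations.

    % Assume y_i = a x_i + b + \varepsilon_i, with \varepsilon_i \sim N(0, \sigma^2) independent.
    L(a, b) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}
              \exp\!\left( -\frac{\bigl[ y_i - (a x_i + b) \bigr]^2}{2\sigma^2} \right)
    \ln L(a, b) = \text{const} - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \bigl[ y_i - (a x_i + b) \bigr]^2
    % Maximizing L over (a, b) is therefore exactly minimizing M = \sum_i [ y_i - (a x_i + b) ]^2.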

  • Five, Extension

    Everything above deals with the two-dimensional case, i.e. there is only one independent variable. In the real world, however, the final outcome is usually the result of multiple factors superimposed, i.e. there are multiple independent variables.

    For a general n-variable linear function, the fit can be obtained with the matrix inverse from linear algebra (a small sketch follows below); since I haven't found a suitable worked example for the time being, I'll leave the details for later.
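
    As a sketch of what that looks like (my own illustration, not from the original post): stack the observations into a matrix X, one row per sample and one column per independent variable plus a column of ones for the intercept, and a vector y of observed values; the least squares coefficients satisfy the normal equations (XᵀX)β = Xᵀy.

    import numpy as np

    def fit_multilinear_least_squares(X, y):
        # Least squares fit of y ~ X @ beta for several independent variables.
        # X: (n_samples, n_features) matrix, including a column of ones for the intercept.
        # y: (n_samples,) vector of observed values.
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Solve the normal equations (X^T X) beta = X^T y; solving the system is
        # preferred over explicitly inverting X^T X.
        return np.linalg.solve(X.T @ X, X.T @ y)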

    Of course, nature is more often about polynomial fitting than simple linear fitting, which is a more advanced topic; a small polynomial example is sketched below.
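
    Polynomial fitting is still a linear least squares problem, because the model is linear in its coefficients. A minimal sketch with numpy (the sample points below are made up purely for illustration):

    import numpy as np

    # Toy sample points, invented for illustration only.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.1, 3.9, 9.2, 15.8, 25.1])

    # Least squares fit of a degree-2 polynomial y ~ c2*x^2 + c1*x + c0.
    coeffs = np.polyfit(x, y, deg=2)
    print(coeffs)  # coefficients, highest degree first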

  • References

    • Higher Mathematics (6th edition), Higher Education Press
    • A reference text published by Peking University Press
    • Video: the least squares method
    • Wikipedia: the least squares method
    • ScienceNet: why least squares? (http://blog.sciencenet.cn/blog-430956-621997.html)

Original work. Reproduction is permitted, but when reproducing please be sure to indicate the original source of the article with a hyperlink, along with the author information and this statement; otherwise legal liability will be pursued. http://sbp810050504.blog.51cto.com/2799422/1269572

