When you hear people talk about machine learning, do you have only a vague idea of what they mean? Are you tired of just nodding along when colleagues bring it up? Let's change that!
This guide is for anyone who is curious about machine learning but doesn't know where to start. I imagine many of you have read the Wikipedia entry on machine learning, gotten frustrated, and wished someone would just give you a high-level explanation. That's what this article is.
The goal of this article is to be accessible, which means there are a lot of generalizations. But who cares?
The idea behind machine learning is that you don't need to write any custom code specific to your problem; instead, generic algorithms can pull interesting answers out of your data for you. Rather than writing code, you feed data into a generic algorithm and it builds its own logic on top of that data.
For example, one kind of generic algorithm is a classification algorithm, which sorts data into different groups. The same classification algorithm used to recognize handwritten numbers can also be used to sort email into spam and not-spam without changing a line of code. The algorithm stays the same, but the training data fed into it changes, so it comes up with different classification logic.
A machine learning algorithm like this is a black box that can be reused to solve many different classification problems.
"Machine learning" is an umbrella term that covers a large number of these kinds of generic algorithms.
You can think of machine learning algorithms as being divided into two broad categories: supervised learning and unsupervised learning.
Let's say you are a real estate agent. Your business is growing, so you hire a bunch of interns to help you. But there's a problem: you can glance at a house and have a pretty good idea of what it's worth, while your interns have no experience and no idea how to price one.
To help your interns (and maybe free yourself up for a vacation), you decide to write a little piece of software that can estimate the value of a house in your area based on factors like its size, its neighborhood, and what similar houses have sold for.
So you write down every house sale in your city over the last three months. For each house you record a bunch of details - number of bedrooms, size, neighborhood, and so on. But most importantly, you write down the final sale price:
This is our training data set.
We're going to use this training data to write a program to estimate the value of other houses in the area:
This is called supervised learning. You already know the final sale price of each house; in other words, you know the answer to the problem and can work backwards from it to figure out the logic.
To build your software, you feed the training data for each house into your machine learning algorithm. The algorithm tries to figure out what kind of math needs to be done to make the price numbers work out.
It's like an arithmetic quiz where all the operation symbols in the equations have been erased:
Oh no! A sneaky student erased all the arithmetic symbols from the teacher's answer key!
Looking at these problems, can you figure out what kind of math problems were on the test? You know you are supposed to "do something" with the numbers on the left to get the answer on the right.
In supervised learning, you let the computer work out those relationships between the numbers for you. Once you know what math was required to solve this specific kind of problem, you can answer any other problem of the same type!
Let's go back to our real estate agent example. What if you didn't know the sale price of each house? Even if all you know is the size, the location, and so on of each house, you can still do some pretty cool stuff. This is called unsupervised learning.
Even if you're not trying to predict unknown data (such as prices), you can use machine learning to do something interesting.
It's kind of like someone gives you a piece of paper with a bunch of numbers on it and says, "Hey, I don't know what these numbers mean, maybe you can figure out a pattern or classify them, or whatever -- good luck".
What can you do with this data? For starters, you could have an algorithm automatically identify different market segments in your data. Perhaps you'd find that homebuyers near the university prefer small houses with lots of bedrooms, while buyers in the suburbs prefer large three-bedroom houses. Knowing this can directly help your marketing.
Another cool thing you could do is automatically identify outlier houses whose prices are way out of line with everything else. Maybe those outliers turn out to be giant mansions, and you can focus your best salespeople on those areas because their commissions are bigger.
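If you're curious what that kind of market segmentation might look like in code, here is a minimal sketch using the k-means clustering algorithm from scikit-learn. The features, the numbers, and the choice of two segments are all invented purely for illustration (in a real project you would also scale the features first):

import numpy as np
from sklearn.cluster import KMeans

# each row: [number_of_bedrooms, sqft, distance_to_university_in_miles]
houses = np.array([
    [3, 2000, 5.0],
    [2,  800, 0.5],
    [1,  550, 0.4],
    [4, 2400, 6.2],
    [2,  700, 0.3],
    [3, 1900, 5.5],
])

# ask for two market segments and see which segment each house falls into
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(houses)
print(segments)  # e.g. [0 1 1 0 1 0] -- big suburban homes vs. small homes near campus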
In the rest of this article, we will focus on supervised learning, but that's not because unsupervised learning is any less useful or interesting. In fact, unsupervised learning is becoming increasingly important as the algorithms improve, because it can be used without having to label the data with the correct answer.
Note: there are lots of other types of machine learning algorithms, but this is a pretty good place to start.
That's cool, but does being able to estimate house prices really count as "learning"?
As a human being, your brain can take in almost any situation and learn how to deal with it without any explicit instructions. If you sell houses for a long time, you will instinctively develop a feel for the right price for a house, the best way to market it, the kind of client who would be interested in it, and so on. The goal of strong AI research is to be able to replicate this ability with computers.
However, current machine learning algorithms are not that good yet. They only work when focused on a very specific, limited problem. Perhaps a better definition for "learning" in this case is "figuring out an equation to solve a specific problem based on some example data".
Unfortunately, "machine figuring out an equation to solve a specific problem based on some example data" isn't a great name. So we ended up with "machine learning" instead.
Of course, if you're reading this 50 years from now and we've figured out the algorithm for strong AI, this whole article will look like a quaint antique.
Let's write the code!
So, how would you write the program to estimate the value of a house in the earlier example?
If you don't know anything about machine learning, chances are you'll try to write down some basic rules for assessing house prices, like:
def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
    price = 0

    # In my area, the average house costs $200 per sqft
    price_per_sqft = 200

    if neighborhood == "hipsterton":
        # but some areas cost a bit more
        price_per_sqft = 400
    elif neighborhood == "skid row":
        # and some areas cost less
        price_per_sqft = 100

    # start with a base price estimate based on how big the place is
    price = price_per_sqft * sqft

    # now adjust our estimate based on the number of bedrooms
    if num_of_bedrooms == 0:
        # Studio apartments are cheap
        price = price - 20000
    else:
        # places with more bedrooms are usually more valuable
        price = price + (num_of_bedrooms * 1000)

    return price
If you tinker with this for hours and hours, you might end up with something that sort of works, but your program will never be perfect, and it will be hard to maintain as prices change.
Wouldn't it be better if the computer could figure out a way to implement the functions above? As long as the price figures returned are correct, who cares what the functions actually do?
def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
    price = <computer, plz do some math for me>
    return price
One way to think about this problem is to think of the price as a delicious pot of soup, with the ingredients being the number of bedrooms, the square footage, and the neighborhood. If you could just figure out how much each ingredient affects the final price, maybe there's an exact ratio in which to stir them together to make the final price.
That would reduce your original function (with all of its crazy if/else statements) to something like this:
def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
    price = 0

    # a little pinch of this
    price += num_of_bedrooms * 0.841231951398213

    # and a big pinch of that
    price += sqft * 1231.1231231

    # maybe a handful of this
    price += neighborhood * 2.3242341421

    # and finally, just a little extra salt for good measure
    price += 201.23432095

    return price
Notice the magic numbers: 0.841231951398213, 1231.1231231, 2.3242341421, and 201.23432095. These are our weights. If we could just figure out the perfect weights for every house, our function could predict house prices!
One way to find the best weights is as follows:
Step one:
First, set each weight to 1.0:
def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
    price = 0

    # a little pinch of this
    price += num_of_bedrooms * 1.0

    # and a big pinch of that
    price += sqft * 1.0

    # maybe a handful of this
    price += neighborhood * 1.0

    # and finally, just a little extra salt for good measure
    price += 1.0

    return price
Step two:
Run every house you know about through your function and see how far off each estimate is from the actual sale price:
Use your program to predict house prices.
For example, if the first house in your data set actually sold for $250,000 but your function guessed $178,000, you're off by $72,000 for that one house.
Now take the square of that difference for every house in your data set. Suppose there are 500 home sales in your data set and the squared differences add up to $86,123,373. That number represents how wrong your function currently is.
Now divide that total by 500 to get the average squared error per house. Call this average error the cost of your function.
If you can get this cost down to zero by adjusting the weights, your function is perfect. It would mean your program nails the price of every house based on the input data.
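To make steps one and two concrete, here is a rough sketch of that cost calculation in plain Python. The houses list, its tuple layout, and the idea of treating the neighborhood as a number are all hypothetical, just to show the shape of the computation:

def compute_cost(houses, weights):
    # `houses` is assumed to be a list of
    # (num_of_bedrooms, sqft, neighborhood, actual_price) tuples,
    # with the neighborhood already encoded as a number.
    # `weights` is a list of four numbers used exactly like in the function above.
    total_squared_error = 0.0
    for num_of_bedrooms, sqft, neighborhood, actual_price in houses:
        estimate = (num_of_bedrooms * weights[0]
                    + sqft * weights[1]
                    + neighborhood * weights[2]
                    + weights[3])
        total_squared_error += (estimate - actual_price) ** 2
    # the average squared error is the "cost" of this set of weights
    return total_squared_error / len(houses)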
Step three:
Repeat step 2 over and over with every possible combination of weights. Whichever combination brings the cost closest to zero is the one you use. As soon as you find that combination, you're done!
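You would never literally try every combination, of course, but a crude "keep whichever weights give the lowest cost" search might look like the sketch below. It is a pure random search, only meant to make step three concrete, and it reuses the hypothetical compute_cost function above (the weight range and number of tries are arbitrary):

import random

def naive_weight_search(houses, num_tries=100000):
    # try lots of random weight combinations and keep the cheapest one --
    # only an illustration; real algorithms are far smarter than this
    best_weights, best_cost = None, float("inf")
    for _ in range(num_tries):
        weights = [random.uniform(-10, 10) for _ in range(4)]
        cost = compute_cost(houses, weights)
        if cost < best_cost:
            best_weights, best_cost = weights, cost
    return best_weights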
Mind-blowing time
That's pretty simple, right? But think about what you just did. You took some data, fed it through three generic, very simple steps, and ended up with a function that can estimate the price of any house in your area. But here are some facts that might blow your mind:
1. Research over the past 40 years in many fields (like linguistics and translation) has shown that these generic "stir the numbers around" learning algorithms (a term I just made up) out-perform approaches where real people try to write explicit rules themselves.
2. The function you ended up with is totally dumb. It doesn't even know what "square footage" or "bedrooms" are. All it knows is how to stir the numbers around to get the right answer.
3. Very likely you'll have no idea why a particular set of weights works. So you've written a function you don't really understand, yet you can prove that it works.
4. Imagine that instead of taking parameters like "sqft" and "num_of_bedrooms", your function took an array of numbers. Say each number represented one pixel of an image captured by a camera mounted on top of your car, and the predicted output was called "degrees to turn the steering wheel" instead of "price". You'd now have a program that can steer your car by itself!
It's crazy, right?
What's that whole "try every possible number" bit in step 3 about?
Well, of course you can't try every possible combination of weights to find the one that works best. That would literally take forever, since there is an endless supply of numbers to try. To avoid that, mathematicians have figured out lots of clever ways to quickly find good weight values without having to try very many. Here is one way. First, write a simple equation that represents step 2 above:
This is your cost function.
Next, let's rewrite the same equation in standard machine learning notation (you can ignore the notation for now):
θ (theta) represents your current weights. J(θ) is the cost of those weights.
In other words, this equation represents how far off our estimating function is for the weights we currently have.
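Written out explicitly, this is just the mean-squared-error form implied by step 2, where m is the number of houses, x⁽ⁱ⁾ holds the details of house i, and y⁽ⁱ⁾ is its actual sale price:

J(θ) = (1/m) · Σᵢ₌₁..ₘ ( θ · x⁽ⁱ⁾ − y⁽ⁱ⁾ )²

Many textbooks also throw in an extra factor of 1/2 purely to make the derivative tidier; it doesn't change where the lowest point is.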
If we graph this cost equation for all the possible weights we could assign to number_of_bedrooms and sqft, we get a graph that looks something like this:
The graph of the cost function looks like a bowl. The vertical axis represents the cost.
The lowest blue point is the point where the cost is lowest, which means our program is the least wrong. The highest points are where it is most wrong. So if we can find the weights that put us at the lowest point on this graph, we have our answer!
Therefore, we just need to adjust our weights so that we are "walking downhill" on this graph toward the lowest point. If we keep making small adjustments to the weights that move us toward the lowest point, we will eventually get there without having to try very many combinations.
If you remember a little calculus, you might remember that if you take the derivative of a function, it tells you the slope of that function at any point. In other words, given any point on the graph, it tells us which way is downhill. We can use that knowledge to walk downhill.
So, if we calculate the partial derivative of the cost function with respect to each weight and then subtract that value from each weight, we move one step closer to the bottom of the bowl. Keep doing that, and eventually we reach the bottom and have the best possible values for our weights.
This way of finding the best weights for your function is called batch gradient descent. The summary above skips a lot of detail (the real thing is not quite this simple); if you want to dig deeper, this introduction is a good read: http://hbfs.wordpress.com/2012/04/24/introduction-to-gradient-descent/
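To make that loop concrete, here is a minimal from-scratch sketch of batch gradient descent for our house-price function. The learning rate and iteration count are arbitrary placeholder choices, and houses is the same hypothetical list of (num_of_bedrooms, sqft, neighborhood, actual_price) tuples used earlier:

def gradient_descent(houses, learning_rate=0.00000001, iterations=100000):
    # step one: start with every weight set to 1.0
    weights = [1.0, 1.0, 1.0, 1.0]
    m = len(houses)
    for _ in range(iterations):
        gradient = [0.0, 0.0, 0.0, 0.0]
        for num_of_bedrooms, sqft, neighborhood, actual_price in houses:
            # the constant 1.0 pairs with the "extra salt" weight
            features = [num_of_bedrooms, sqft, neighborhood, 1.0]
            estimate = sum(f * w for f, w in zip(features, weights))
            error = estimate - actual_price
            # partial derivative of the average squared error for each weight
            for j in range(4):
                gradient[j] += (2.0 / m) * error * features[j]
        # take a small step downhill
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights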
When you use a machine learning library to solve a real problem, all of this is done for you. But it's still useful to understand what is going on under the hood.
What else did I conveniently skip over?
The three-step algorithm I described above is called multivariate linear regression. You are estimating the equation for a line that fits through all of your house price data points. Then you use that equation to guess the sale price of a house you have never seen before, based on where that house would fall on your line.
However, while the approach I showed you might work in simple cases, it won't work in all cases. One reason is that house prices aren't always simple enough to follow a continuous line.
Luckily, there are lots of ways to handle that. There are plenty of other machine learning algorithms that can deal with non-linear data (like neural networks or support vector machines with kernels). There are also ways to use linear regression more cleverly that allow more complicated lines to be fit. In every case, the same basic idea of finding the best weights still applies.
Also, I glossed over the idea of overfitting. It's easy to come up with a set of weights that works perfectly for predicting the prices of the houses in your original data set but never actually works for any new house that wasn't in it. There are ways to deal with this (like regularization and using a cross-validation data set). Learning how to deal with this issue is a key part of applying machine learning successfully.
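As a rough idea of what those two fixes look like in practice, here is a sketch that uses scikit-learn's Ridge regression (a regularized flavor of linear regression) together with cross-validation scoring. The data is randomly generated filler, only there to show the calls:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X = np.random.rand(500, 3)         # 500 houses, 3 made-up features each
y = np.random.rand(500) * 500000   # made-up sale prices

model = Ridge(alpha=1.0)                     # the penalty term discourages extreme weights
scores = cross_val_score(model, X, y, cv=5)  # score on held-out folds, not the training data
print(scores.mean())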
In other words, while the basic concept is pretty simple, it takes some skill and experience to apply machine learning and get useful results. But it's a skill that any developer can learn.
Once you start seeing how easily machine learning techniques can be applied to problems that seem really hard (like handwriting recognition), you start to feel like you could use machine learning to solve any problem and get an answer as long as you have enough data. Just feed in the data and watch the computer magically figure out the equation that fits it!
But it's important to remember that machine learning only applies to problems that can actually be solved with the data you have.
For example, if you build a model that predicts house prices based on the type of potted plants in each house, it will never work. There is simply no relationship between the potted plants in a house and its sale price. So no matter how hard it tries, the computer can never deduce a relationship between the two.
You can only model relationships that actually exist.
I think the biggest problem in machine learning today is that it mostly lives in the world of academia and commercial research groups. There isn't much accessible material for people who would like a broad understanding without becoming experts. But that is getting better every day.
Professor Andrew Ng's free machine learning course on Coursera is very good. I highly recommend it. Anyone with a degree in computer science and a little bit of math should understand it.
Additionally, you can download and install scikit-learn, a Python framework that lets you play with tons of machine learning algorithms and provides black-box versions of all the standard ones.
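For example, fitting the house-price model from this article takes only a few lines with scikit-learn. The numbers below are invented purely for illustration:

from sklearn.linear_model import LinearRegression

# [num_of_bedrooms, sqft, neighborhood_code]
X = [[3, 2000, 0], [2, 800, 1], [2, 850, 1], [1, 550, 1], [4, 2000, 0]]
y = [250000, 300000, 320000, 199000, 499000]  # sale prices

model = LinearRegression()
model.fit(X, y)
print(model.coef_, model.intercept_)    # the learned weights and the "extra salt"
print(model.predict([[3, 2100, 0]]))    # estimate a house it has never seen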