Recently, I saw that there is no relevant information about Python crawlers in the community and library, based on the spirit of all-round development of QUANT, I learned some concepts and knowledge related to crawlers simply. After some understanding, I found that the “pit” of “crawler technology” is quite large. This article is just for the preliminary study of “crawler technology”. Do the simplest practice on the FMZ Quant Trading platform about the crawler technology.
For those who like to subscribe new shares, they always hope to get the information of currency on the exchange at the first time. It is obviously unrealistic for people to monitor the exchange website all the time. Then you need to use the crawler script to monitor the exchange announcement page and detect new announcements so that you can be notified and reminded at the first time.
A very simple program is used to get started (a really powerful crawler script is far more complex, so take your time first). The program logic is very simple. It allows the program to access the announcement page of the exchange constantly, parse the obtained HTML content, and detect whether the specific label content is updated.
You can use some useful crawler frameworks. However, considering that the requirements are very simple, you can write them directly.
The following python libraries need to be used:
```Bs4```, which can be simply understood as a library used to parse HTML code on web pages.
Code:
from bs4 import BeautifulSoup import requests
urlBinanceAnnouncement = “https://www.binancezh.io/en/support/announcement/c-48?navId=48” # Binance announcement page address
def openUrl(url): headers = {‘User-Agent’: ‘Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36’} r = requests.get(url, headers=headers) # Use the requests library to access the url, i.e. the address of the Binance announcement page if r.status_code == 200: r.encoding = ‘utf-8’ # Log(“success! {}”.format(url)) return r.text # Return page content text if access is successful else: Log(“failed {}”.format(url))
def main(): preNews_href = “” lastNews = “” Log(“watching…”, urlBinanceAnnouncement, “#FF0000”) while True: ret = openUrl(urlBinanceAnnouncement) if ret: soup = BeautifulSoup(ret, ‘html.parser’) # Parse web text into objects lastNewshref = soup.find(‘a’, class=‘css-1ej4hfo’)[“href”] # Find a specific tag, get href lastNews = soup.find(‘a’, class_=‘css-1ej4hfo’).get_text() # Get the content in this tag if preNews_href == “”: preNews_href = lastNews_href if preNews_href != lastNews_href: # A new announcement is generated when a label change is detected Log(“New Cryptocurrency Listing update!”) # Print the prompt message preNews_href = lastNews_href LogStatus(_D(), “\n”, “preNews_href:”, preNews_href, “\n”, “news:”, lastNews) Sleep(1000 * 10) “`
It can be extended, for example, when a new announcement is detected. Analyze the new currency in the announcement, and place an order automatically to subscribe new shares.