Kezhi's Blog

Scrapy Spider Related

Posted on 2020-10-24 | Updated on 2024-04-20 | In Tips | Word count: 3.6k | Reading time ≈ 3 min

This is an automatically translated post by LLM. The original post is in Chinese. If you find any translation errors, please leave a comment to help me improve the translation. Thanks!

Request Spoofing

While crawling Tencent weather data, the spider failed to retrieve any weather data. After some experimentation, the cause turned out to be the request headers. The default request headers that the spider sends when crawling web pages are:

Many websites block requests that carry these default headers, so no page data comes back. The request headers therefore need to be spoofed so that the website treats the spider as a browser. The steps are as follows:
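As a rough illustration (not the steps from the full post, which this excerpt omits), one common way to spoof headers in Scrapy is to override the `USER_AGENT` and `DEFAULT_REQUEST_HEADERS` settings in the project's `settings.py`. The browser string and header values below are illustrative assumptions; any realistic browser values would do.

```python
# settings.py of a Scrapy project (sketch; all values are illustrative)

# By default Scrapy identifies itself with a User-Agent of the form
# "Scrapy/VERSION (+https://scrapy.org)", which many sites reject.
# Replacing it with a browser-like string is the core of the spoofing.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

# Extra headers sent with every request unless a request overrides them.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
}
```

Headers can also be set per request by passing a `headers` dictionary to `scrapy.Request`, which is handy when only one site needs special treatment.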

Read more »

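Crawling All Anime Data from Bilibili and Analyzing It

Posted on 2020-10-15 | Updated on 2023-11-10 | In Project | Word count: 11k | Reading time ≈ 10 min

Introduction

Bilibili (hereafter "B站") holds the rights to a large number of anime series, 3161 in total as of writing. For each series you can find playback statistics such as play count, follower count, and danmaku (bullet comment) count. In addition, each series carries tags (such as "manga adaptation", "hot-blooded", "comedy"). This project aims to analyze the relationship between the playback statistics and the tags of each series. It also serves as a data analysis course project and uses Apriori frequent itemset mining for the analysis.

GitHub address: https://github.com/KezhiAdore/BilibiliAnimeData_Analysis

Gitee address: https://gitee.com/KezhiAdore/BilibiliAnimeData_Analysis

Read more »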

Crawling All Anime Data from Bilibili and Conducting Data Analysis

Posted on 2020-10-15 | Updated on 2024-04-20 | In Project | Word count: 16k | Reading time ≈ 15 min

Introduction

Bilibili (hereafter "B站") holds the rights to a large number of anime series, 3161 in total as of writing. For each series you can find playback statistics such as play count, follower count, and danmaku (bullet comment) count. In addition, each series carries tags (such as "manga adaptation", "hot-blooded", "comedy"). This project aims to analyze the relationship between the playback statistics and the tags of each series. It is also a data analysis course project and uses Apriori frequent itemset mining for the analysis.
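To make the idea of Apriori frequent itemset mining concrete, here is a minimal pure-Python sketch. It is not the project's code: the `apriori` function, the `min_support` threshold, and the four tag sets are all invented for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical tag sets for four anime (invented, not real Bilibili data).
anime_tags = [
    {"manga adaptation", "hot-blooded", "comedy"},
    {"manga adaptation", "hot-blooded"},
    {"comedy", "daily life"},
    {"manga adaptation", "comedy"},
]
result = apriori(anime_tags, min_support=0.5)
for itemset, sup in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```

In a real analysis the playback statistics would first be discretized (for example into "high play count" and "low play count") and added to each transaction next to the tags, so that frequent itemsets can link tags to playback levels.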

GitHub address: https://github.com/KezhiAdore/BilibiliAnimeData_Analysis

Gitee address: https://gitee.com/KezhiAdore/BilibiliAnimeData_Analysis

Read more »

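Multi-Robot Area Coverage Improved with the Wasp Swarm Algorithm

Posted on 2020-06-23 | Updated on 2023-11-10 | In Research | Word count: 4.5k | Reading time ≈ 4 min

GitHub address: https://github.com/KezhiAdore/MultiRobots_CoverMap

Gitee address: https://gitee.com/KezhiAdore/MultiRobots_CoverMap

I. Problem Statement

1. Background

Robots are increasingly used in industry, national defense, and science and technology. They can work in complex, unstructured environments that are tedious and dangerous. The use of robots greatly improves work efficiency, changes the way people live, brings enormous economic and social benefits, and strongly promotes the development of related disciplines and technical fields.

Area coverage is the process in which robots carrying sensors with a limited detection range, such as lasers or sonar, explore and visit an entire area and complete the corresponding tasks. Area coverage is a fundamental task of autonomous mobile robots.

Read more »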

Multi-Robot Area Coverage Improved with the Wasp Swarm Algorithm

Posted on 2020-06-23 | Updated on 2024-04-20 | In Research | Word count: 12k | Reading time ≈ 11 min

GitHub link: https://github.com/KezhiAdore/MultiRobots_CoverMap

Gitee link: https://gitee.com/KezhiAdore/MultiRobots_CoverMap

I. Problem Statement

1. Background

Robots are increasingly used in industry, national defense, and science and technology. They can work in complex, unstructured environments that are tedious and dangerous. The use of robots greatly improves work efficiency, changes the way people live, brings enormous economic and social benefits, and promotes the development of related disciplines and technologies.

Area coverage is the process in which robots carrying sensors with a limited detection range, such as lasers or sonar, explore and visit an entire area and complete the corresponding tasks. Area coverage is a fundamental task of autonomous mobile robots.

Read more »

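Digital Image Processing (6): Frequency Domain Filters

Posted on 2020-05-08 | Updated on 2023-11-10 | In Study | Word count: 3.6k | Reading time ≈ 3 min

Introduction to Frequency Domain Filters

Frequency domain filters are the counterpart of spatial domain filters. A spatial domain filter convolves a spatial kernel with the spatial image, whereas frequency domain filtering applies the discrete Fourier transform to the image, multiplies the result by the filter in the frequency domain, and then applies the inverse Fourier transform to obtain the filtered image.

The basic steps for filtering an image with a frequency domain filter:

Apply the DFT to the image to obtain the frequency domain image of the original image.
Multiply the frequency domain image by the frequency domain filter to obtain a new frequency domain image.
Apply the IDFT to the new frequency domain image to bring it back to the spatial domain as the new image.

Read more »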

Digital Image Processing (6): Frequency Domain Filters

Posted on 2020-05-08 | Updated on 2024-04-20 | In Study | Word count: 7.2k | Reading time ≈ 7 min

Introduction to Frequency Domain Filters

Frequency domain filters are the counterpart of spatial domain filters. A spatial domain filter convolves a spatial kernel with the spatial image. A frequency domain filter instead transforms the image with the discrete Fourier transform (DFT), multiplies the resulting frequency domain image by the filter, and then applies the inverse discrete Fourier transform (IDFT) to obtain the filtered image.

The basic steps of using frequency domain filters to filter images are as follows:

Perform DFT on the image to obtain the frequency domain image of the original image.
Multiply the frequency domain image by the frequency domain filter to obtain a new frequency domain image.
Perform IDFT on the new frequency domain image to obtain a new image in the spatial domain.
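The three steps above can be sketched with NumPy's FFT routines. This is a minimal illustration rather than the post's own implementation: the ideal low-pass mask, the `cutoff` radius, and the random test image are assumptions chosen only to keep the example self-contained.

```python
import numpy as np


def ideal_lowpass_mask(shape, cutoff):
    """Mask that is 1 within `cutoff` of the spectrum centre and 0 elsewhere."""
    rows, cols = shape
    y, x = np.ogrid[:rows, :cols]
    dist = np.sqrt((y - rows // 2) ** 2 + (x - cols // 2) ** 2)
    return (dist <= cutoff).astype(float)


def frequency_filter(image, mask):
    # Step 1: DFT of the image, shifted so the zero frequency sits at the centre.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    # Step 2: multiply the frequency domain image by the filter.
    filtered = spectrum * mask
    # Step 3: undo the shift and apply the IDFT to return to the spatial domain.
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))


# Random 64x64 "image" used purely as test input.
image = np.random.default_rng(0).random((64, 64))
smoothed = frequency_filter(image, ideal_lowpass_mask(image.shape, cutoff=10))
print(image.var(), smoothed.var())  # the low-passed image varies less
```

Because the mask keeps the zero-frequency component, the mean brightness of the image is preserved while the high-frequency detail is removed.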

Read more »

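BP Neural Network

Posted on 2020-05-06 | Updated on 2023-11-10 | In Study | Word count: 2.2k | Reading time ≈ 2 min

Overview of BP Neural Network

The BP (backpropagation) neural network is the most basic and most widely used type of neural network. It consists of three layers of nodes: the input layer, the hidden layer, and the output layer. It is relatively simple to implement and consists of two main parts: forward propagation of outputs and backward propagation of errors.

The universal approximation theorem of neural networks: a linear combination of one-dimensional step functions can approximate any one-dimensional continuous function, and a sigmoid function can approximate a step function, so a linear combination of one-dimensional sigmoid functions can approximate any continuous function (Robert Hecht-Nielsen, 1989). This provides a theoretical basis for the application of neural networks.

Read more »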

BP Neural Network

Posted on 2020-05-06 | Updated on 2024-04-20 | In Study | Word count: 5.4k | Reading time ≈ 5 min

Overview of BP Neural Network

The BP (backpropagation) neural network is the most basic and most widely used type of neural network. It consists of three layers of nodes: the input layer, the hidden layer, and the output layer. It is relatively simple to implement and consists of two main parts: forward propagation of outputs and backward propagation of errors.

The universal approximation theorem of neural networks (Robert Hecht-Nielsen, 1989) states that a linear combination of one-dimensional step functions can approximate any one-dimensional continuous function, and that a sigmoid function can approximate a step function. Therefore, a linear combination of one-dimensional sigmoid functions can approximate any continuous function. This provides a theoretical basis for the application of neural networks.

The strength of neural networks is that many complex function mappings that are hard to derive directly can be obtained by combining many one-dimensional step functions. The main difficulty in building a neural network is working out how to combine these one-dimensional functions.

Understanding BP Neural Network

The BP neural network can be seen as a multi-input multi-output function. If we ignore its internal structure, it can be represented as a black box model:

This BP neural network has $m$ inputs and $n$ outputs, with a hidden layer between the input and output layers. How many nodes should the hidden layer have? The number is generally chosen with the following empirical formula:

$$h = \sqrt{m + n} + a \tag{1}$$

where $h$ is the number of nodes in the hidden layer, $m$ is the number of nodes in the input layer, $n$ is the number of nodes in the output layer, and $a$ is an adjustment constant.

Based on the number of input and output nodes, we can construct a simple BP neural network model. Its internal structure is as follows (taking $m = 3$, $n = 3$, and $h = 3$ as an example):

Internal structure of a three-layer neural network

With such a three-layer neural network, a mapping from three inputs to three outputs can be built by combining one-dimensional functions. How is this mapping established? That is the question of how to train the BP neural network. The training process consists of two main parts: forward propagation of results and backward propagation of residuals.

Forward Propagation

For any node in the BP neural network, its input is the weighted sum of the outputs of the nodes in the previous layer. Taking the hidden layer as an example, let the output of input layer node $i$ be $x_{i}$, the input of hidden layer node $j$ be $\mathrm{net}_{j}$, the weight connecting input layer node $i$ to hidden layer node $j$ be $w_{ij}$, and the constant term (bias) be $b_{j}$. Then the input of the hidden layer node is:

$$\mathrm{net}_{j} = \sum_{i = 1}^{m} w_{ij} x_{i} + b_{j} \tag{2}$$

So that the activation function is differentiable everywhere, the BP neural network uses the sigmoid function as its activation function. The output of the node is:

$$f(\mathrm{net}_{j}) = \frac{1}{1 + e^{-\mathrm{net}_{j}}} \tag{3}$$

Advantages of using sigmoid function:

Compared to the step function, it is differentiable everywhere in its domain.

Let $y = \mathrm{sigmoid}(x)$, then $y' = y (1 - y)$. The derivative of the sigmoid function can therefore be expressed in terms of the function itself: once the value of the sigmoid function has been calculated, its derivative follows almost for free. This is convenient when using gradient descent in backpropagation.
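The derivative identity can be checked numerically. The following is a small sketch using only the standard library; the helper names and the sample points are chosen here for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    # Uses the identity y' = y (1 - y), reusing the already computed y.
    y = sigmoid(x)
    return y * (1.0 - y)


def numerical_derivative(f, x, eps=1e-6):
    # Central finite difference, used only as an independent check.
    return (f(x + eps) - f(x - eps)) / (2 * eps)


for x in (-2.0, 0.0, 3.0):
    print(x, sigmoid_derivative(x), numerical_derivative(sigmoid, x))
```

The two columns agree to many decimal places, which confirms that the derivative needs no extra exponential once the sigmoid value is known.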

Main disadvantages of the sigmoid function:

Vanishing gradient: When the output of the sigmoid function approaches 0 or 1, the curve flattens and its gradient tends to 0. Neurons that use the sigmoid activation function and have outputs close to 0 or 1 are called saturated neurons. The weights of saturated neurons are barely updated, and the weights of neurons connected to them are updated only slowly. This is called the vanishing gradient problem. If a large neural network contains many sigmoid neurons in a saturated state, backpropagation makes almost no progress.

Not zero-centered: The output of the sigmoid is not zero-centered.

High computational cost: The exp() function has a higher computational cost compared to other nonlinear activation functions.
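The saturation behind the vanishing gradient can be seen from a few values. This is a small sketch with inputs chosen for illustration; `max_chain_gradient` is a hypothetical helper that only bounds the product of activation derivatives across layers and ignores the weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    y = sigmoid(x)
    return y * (1.0 - y)


# The gradient peaks at 0.25 for x = 0 and collapses as the neuron saturates.
grads = {x: sigmoid_grad(x) for x in (0.0, 5.0, 10.0)}
for x, g in grads.items():
    print(f"x = {x:5.1f}  gradient = {g:.6f}")


def max_chain_gradient(num_layers):
    # Each sigmoid layer contributes a factor of at most 0.25 to the chain rule.
    return 0.25 ** num_layers


print(max_chain_gradient(10))
```

Even in the best case the activation derivatives alone shrink the signal by a factor of four per layer, which is why deep stacks of sigmoid layers train slowly.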

Each neuron performs this calculation independently, so for a given set of inputs the neural network computes layer by layer to obtain the corresponding outputs. This is the process of forward propagation.
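A forward pass through the three-layer example network ($m = 3$, $h = 3$, $n = 3$) might look like the sketch below. It assumes NumPy, random initial weights, a sigmoid on the output layer as well as the hidden layer, and a weight matrix layout where `W1[j, i]` holds $w_{ij}$; none of these details come from the post itself.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def forward(x, W1, b1, W2, b2):
    # Hidden layer: weighted sum of the inputs plus bias, then sigmoid.
    hidden = sigmoid(W1 @ x + b1)
    # Output layer: the same calculation applied to the hidden outputs.
    output = sigmoid(W2 @ hidden + b2)
    return hidden, output


m, h, n = 3, 3, 3
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(h, m)), rng.normal(size=h)  # input -> hidden
W2, b2 = rng.normal(size=(n, h)), rng.normal(size=n)  # hidden -> output

x = np.array([0.5, -1.0, 2.0])  # arbitrary example input
hidden, y = forward(x, W1, b1, W2, b2)
print(hidden, y)
```

Writing the layer as a matrix product is only a compact form of the per-node weighted sum: row `j` of `W1` collects the weights feeding hidden node `j`.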

Backward Propagation

At the start, all the weights in the network are initialized randomly. For the model to approach the desired result by learning from training data, the weights must be adjusted repeatedly. The core idea of backward propagation is gradient descent from nonlinear programming, where the objective is to minimize the loss function. The general process is as follows:

Set the loss function. Writing the results of the output layer as $d_{j}$ and the values they are compared with as $y_{j}$, the loss function is the squared error:

$$E(w, b) = \frac{1}{2} \sum_{j = 0}^{n - 1} (d_{j} - y_{j})^{2} \tag{4}$$

Update the $w$ and $b$ from the hidden layer to the output layer using the loss function. For the weight $w_{ij}$ from hidden layer node $i$ to output layer node $j$, the update is as follows (where $\eta$ is the learning rate):

$$\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}} \tag{5}$$

Similarly, the update for the bias $b_{j}$ of output layer node $j$ is:

$$\Delta b_{j} = -\eta \frac{\partial E}{\partial b_{j}} \tag{6}$$

This is basically the idea. The process of calculating partial derivatives is quite complex, so I won't go into detail here. Just remember the idea of using gradient descent to minimize the loss function.
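For readers who want to see the idea run, here is a minimal sketch of gradient descent on the squared-error loss for a 3-3-3 network. It is one possible realization, not the post's derivation: it assumes NumPy, sigmoid activations in both layers, a single made-up training sample, and a learning rate of 0.5. The partial derivatives come from the chain rule together with $y' = y (1 - y)$.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_step(x, target, W1, b1, W2, b2, eta=0.5):
    """One forward and backward pass; updates the parameters in place."""
    # Forward propagation.
    hidden = sigmoid(W1 @ x + b1)
    out = sigmoid(W2 @ hidden + b2)
    loss = 0.5 * np.sum((target - out) ** 2)

    # Backward propagation: dE/dnet for each layer via the chain rule.
    delta_out = (out - target) * out * (1 - out)
    delta_hidden = (W2.T @ delta_out) * hidden * (1 - hidden)

    # Gradient descent: parameter <- parameter - eta * dE/dparameter.
    W2 -= eta * np.outer(delta_out, hidden)
    b2 -= eta * delta_out
    W1 -= eta * np.outer(delta_hidden, x)
    b1 -= eta * delta_hidden
    return loss


rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 3)), np.zeros(3)

x = np.array([0.5, -1.0, 2.0])       # made-up input
target = np.array([0.1, 0.9, 0.5])   # made-up desired output

losses = [train_step(x, target, W1, b1, W2, b2) for _ in range(500)]
print(losses[0], losses[-1])
```

The loss printed at the end is smaller than the one at the start, which is all gradient descent promises here; a real training loop would iterate over many samples instead of one.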