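{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# NumPy - Multidimensional Data Arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines.\n", "* The routines support fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.\n", "* As the foundation of numerical computing in Python, NumPy is widely used in data processing, signal processing, machine learning and other fields." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![cover image](images/numpy.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `numpy` package (module) is used in almost all numerical computation using Python. It provides high-performance vector, matrix and higher-dimensional data structures for Python. Its core is implemented in C (its linear algebra routines rely on BLAS/LAPACK libraries, traditionally written in Fortran), so when calculations are vectorized (formulated with vectors and matrices), performance is very good.\n", "\n", "To use the `numpy` module, you first need to import it, as in the examples below:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# what this line does is explained in the Matplotlib section\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# not recommended: this imports every numpy name into the global namespace,\n", "# where names such as sum can shadow Python built-ins\n", "from numpy import *" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# recommended\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**We recommend the second import style:** `import numpy as np`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. 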
Creating `numpy` arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are a number of ways to initialize new numpy arrays, for example from\n", "\n", "* a Python list or tuple\n", "* functions dedicated to generating numpy arrays, such as `arange`, `linspace`, etc.\n", "* data read from files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1 From lists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, to create new vector and matrix arrays from Python lists we can use the `numpy.array` function.\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2, 3, 4]\n" ] }, { "data": { "text/plain": [ "array([1, 2, 3, 4])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "a = [1, 2, 3, 4]\n", "print(a)\n", "\n", "# a vector: the argument to the array function is a Python list\n", "v = np.array(a)\n", "\n", "v" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2]\n", " [3 4]\n", " [5 6]]\n", "\n", "(3, 2)\n" ] } ], "source": [ "# a matrix: the argument to the array function is a nested Python list\n", "M = np.array([[1, 2], [3, 4], [5, 6]])\n", "\n", "print(M)\n", "print()\n", "print(M.shape)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[[ 1 2]\n", " [ 3 4]\n", " [ 5 6]]\n", "\n", " [[ 3 4]\n", " [ 5 6]\n", " [ 7 8]]\n", "\n", " [[ 5 6]\n", " [ 7 8]\n", " [ 9 10]]\n", "\n", " [[ 7 8]\n", " [ 9 10]\n", " [11 12]]]\n", "\n", "(4, 3, 2)\n" ] } ], "source": [ "# a 3-dimensional array: a list of 4 lists, each holding 3 pairs\n", "M = np.array([[[1,2], [3,4], [5,6]],\n", " [[3,4], [5,6], [7,8]],\n", " [[5,6], [7,8], [9,10]],\n", " [[7,8], [9,10], [11,12]]])\n", "print(M)\n", "print()\n", "print(M.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`v` and `M` are both of the type `ndarray` that the `numpy` module provides." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(numpy.ndarray, numpy.ndarray)" ] }, "execution_count": 7, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "type(v), type(M)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The only difference between `v` and `M` is their shape. We can get information about the shape of an array from the `ndarray.shape` attribute." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4,)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v.shape" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4, 3, 2)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `ndarray.size` attribute gives the number of elements in the array:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "24" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M.size" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Equivalently, we can use the functions `numpy.shape` and `numpy.size`:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4, 3, 2)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.shape(M)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "24" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.size(M)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So far the `numpy.ndarray` looks a lot like a Python list (or nested list). Why not simply use Python lists for computations instead of creating a new array type?\n", "\n", "There are several reasons:\n", "\n", "* Python lists are very general. They can contain any kind of object. They are dynamically typed. They do not support mathematical functions such as matrix and dot multiplication. Implementing such functions for Python lists would not be very efficient because of the dynamic typing.\n", "* NumPy arrays are **statically typed** and **homogeneous**. The type of the elements is determined when the array is created.\n", "* NumPy arrays are memory efficient: the elements are stored as raw values in a single data buffer rather than as individual Python objects.\n", "* Because of the static typing, fast implementations of mathematical functions such as multiplication and addition of `numpy` arrays can be written in a compiled language (C and Fortran).\n", "\n", "Using the `dtype` (data type) attribute of an `ndarray`, we can see what type of data an array holds:\n" ] }, { "cell_type": "code", 
"execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we try to assign a value that cannot be converted to the array's data type, we get an error:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "invalid literal for int() with base 10: 'hello'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mM\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"hello\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mValueError\u001b[0m: invalid literal for int() with base 10: 'hello'" ] } ], "source": [ "M[0,0,0] = \"hello\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want, we can explicitly define the data type of an array when we create it, using the `dtype` keyword argument:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1.+0.j, 2.+0.j],\n", " [3.+0.j, 4.+0.j]])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M = np.array([[1, 2], [3, 4]], dtype=complex)\n", "\n", "M" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Common data types that can be used with `dtype` are: `int`, `float`, `complex`, `bool`, `object`, etc.\n", "\n", "We can also explicitly specify the bit size of the data type, for example: `int64`, `int16`, `float128` (not available on every platform), `complex128`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 Using array-generating functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"For larger arrays it is impractical to initialize the data manually using explicit Python lists. Instead, we can use one of the many functions in `numpy` that generate arrays of different forms. Some of the more common ones are:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### arange" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2 3 4 5 6 7 8 9]\n", "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n" ] } ], "source": [ "# create a range\n", "\n", "x = np.arange(0, 10, 1) # arguments: start, stop, step (stop is excluded)\n", "y = range(0, 10, 1) # Python's built-in range, for comparison\n", "print(x)\n", "print(list(y))" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-1.00000000e+00, -9.00000000e-01, -8.00000000e-01, -7.00000000e-01,\n", " -6.00000000e-01, -5.00000000e-01, -4.00000000e-01, -3.00000000e-01,\n", " -2.00000000e-01, -1.00000000e-01, -2.22044605e-16, 1.00000000e-01,\n", " 2.00000000e-01, 3.00000000e-01, 4.00000000e-01, 5.00000000e-01,\n", " 6.00000000e-01, 7.00000000e-01, 8.00000000e-01, 9.00000000e-01])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# with a non-integer step, floating-point rounding shows up (-2.22e-16 instead of 0);\n", "# np.linspace is usually the safer choice for such ranges\n", "x = np.arange(-1, 1, 0.1)\n", "\n", "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### linspace and logspace" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0. , 2.5, 5. , 7.5, 10. 
])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# with linspace, both end points are included\n", "np.linspace(0, 10, 5)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1.00000000e+00, 3.03773178e+00, 9.22781435e+00, 2.80316249e+01,\n", " 8.51525577e+01, 2.58670631e+02, 7.85771994e+02, 2.38696456e+03,\n", " 7.25095809e+03, 2.20264658e+04])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 10 points evenly spaced on a log scale, from e**0 to e**10\n", "np.logspace(0, 10, 10, base=np.e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### mgrid" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "y, x = np.mgrid[0:5, 0:5] # similar to meshgrid in MATLAB" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 1, 2, 3, 4],\n", " [0, 1, 2, 3, 4],\n", " [0, 1, 2, 3, 4],\n", " [0, 1, 2, 3, 4],\n", " [0, 1, 2, 3, 4]])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 0, 0, 0, 0],\n", " [1, 1, 1, 1, 1],\n", " [2, 2, 2, 2, 2],\n", " [3, 3, 3, 3, 3],\n", " [4, 4, 4, 4, 4]])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### random data" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "from numpy import random" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[0.57397454, 0.12434228],\n", " [0.74835474, 0.01034541],\n", " [0.91383579, 0.02807574],\n", " [0.14217509, 0.64698341]],\n", "\n", " [[0.65606545, 0.84787378],\n", " [0.31064031, 0.70205451],\n", " [0.30486756, 0.34702889],\n", " [0.47537986, 
0.91154076]],\n", "\n", " [[0.32192343, 0.77700745],\n", " [0.80485914, 0.85919158],\n", " [0.29751565, 0.27228179],\n", " [0.57796668, 0.18255467]],\n", "\n", " [[0.50020698, 0.58134695],\n", " [0.14200095, 0.97556272],\n", " [0.32948647, 0.35170435],\n", " [0.27768833, 0.75059373]],\n", "\n", " [[0.23972627, 0.08461662],\n", " [0.1929383 , 0.80565903],\n", " [0.2627892 , 0.73361884],\n", " [0.18415944, 0.44976198]]])" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# uniform random numbers in the interval [0, 1)\n", "random.rand(5,4,2)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-1.74300737, 1.94689131, 0.18922227, -0.20440928],\n", " [ 1.31664152, -0.01176745, -0.43956951, 0.53571291],\n", " [ 0.02140654, -0.09635041, -1.84205831, 0.64951045],\n", " [ 0.35682903, 0.96657395, -0.50099255, -0.80044681]])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# standard normal (Gaussian) random numbers\n", "random.randn(4,4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### diag" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 0, 0],\n", " [0, 2, 0],\n", " [0, 0, 3]])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a diagonal matrix\n", "np.diag([1,2,3])" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 0, 0, 0],\n", " [1, 0, 0, 0],\n", " [0, 2, 0, 0],\n", " [0, 0, 3, 0]])" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a diagonal offset from the main diagonal (k=-1: one below it)\n", "np.diag([1,2,3], k=-1) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### zeros and ones" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0., 0., 0.],\n", " [0., 0., 0.],\n", " [0., 0., 0.]])" ] }, 
"execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.zeros((3,3))" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 1., 1.],\n", " [1., 1., 1.],\n", " [1., 1., 1.]])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.ones((3,3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. File I/O" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1 Comma-separated values (CSV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A very common file format for data files is comma-separated values (CSV), or related formats such as TSV (tab-separated values). To read data from such files into NumPy arrays we can use the `numpy.genfromtxt` function. For example (the file below is whitespace-separated, which is what `genfromtxt` assumes when no `delimiter` is given):" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1800 1 1 -6.1 -6.1 -6.1 1\r\n", "1800 1 2 -15.4 -15.4 -15.4 1\r\n", "1800 1 3 -15.0 -15.0 -15.0 1\r\n", "1800 1 4 -19.3 -19.3 -19.3 1\r\n", "1800 1 5 -16.8 -16.8 -16.8 1\r\n", "1800 1 6 -11.4 -11.4 -11.4 1\r\n", "1800 1 7 -7.6 -7.6 -7.6 1\r\n", "1800 1 8 -7.1 -7.1 -7.1 1\r\n", "1800 1 9 -10.1 -10.1 -10.1 1\r\n", "1800 1 10 -9.5 -9.5 -9.5 1\r\n" ] } ], "source": [ "!head stockholm_td_adj.dat" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "data = np.genfromtxt('stockholm_td_adj.dat')" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(77431, 7)" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.shape" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "\n", "fig, ax = plt.subplots(figsize=(14,4))\n", "ax.plot(data[:,0]+data[:,1]/12.0+data[:,2]/365, data[:,5])\n", "ax.axis('tight')\n", "ax.set_title('temperatures in Stockholm')\n", "ax.set_xlabel('year')\n", "ax.set_ylabel('temperature (C)');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using `numpy.savetxt` we can store a NumPy array in a text file. By default the values are separated by spaces, as the output below shows; pass `delimiter=','` to get true comma-separated values:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.34743109, 0.34666094, 0.67796236],\n", " [0.37775535, 0.7452935 , 0.44639271],\n", " [0.7097024 , 0.54721637, 0.96400871]])" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M = np.random.rand(3,3)\n", "\n", "M" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "np.savetxt(\"random-matrix.csv\", M)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.474310879390657414e-01 3.466609365910759966e-01 6.779623624489031775e-01\r\n", "3.777553531256817587e-01 7.452935047749419395e-01 4.463927097637667707e-01\r\n", "7.097023968559375007e-01 5.472163711854115542e-01 9.640087120207403437e-01\r\n" ] } ], "source": [ "!cat random-matrix.csv" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.34743 0.34666 0.67796\r\n", "0.37776 0.74529 0.44639\r\n", "0.70970 0.54722 0.96401\r\n" ] } ], "source": [ "np.savetxt(\"random-matrix.csv\", M, fmt='%.5f') # fmt specifies the number format\n", "\n", "!cat random-matrix.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 NumPy's native file format" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is useful when storing and reading back NumPy arrays. Use the functions `numpy.save` and `numpy.load`:" ] }, { "cell_type": 
"code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "random-matrix.npy: NumPy array, version 1.0, header length 118\r\n" ] } ], "source": [ "np.save(\"random-matrix.npy\", M)\n", "\n", "!file random-matrix.npy" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.34743109, 0.34666094, 0.67796236],\n", " [0.37775535, 0.7452935 , 0.44639271],\n", " [0.7097024 , 0.54721637, 0.96400871]])" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.load(\"random-matrix.npy\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. More properties of NumPy arrays" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "int64\n", "8\n" ] } ], "source": [ "M = np.array([[1, 2], [3, 4], [5, 6]])\n", "\n", "print(M.dtype)\n", "print(M.itemsize) # bytes per element\n" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "48" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M.nbytes # total number of bytes (size * itemsize)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M.ndim # number of dimensions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. 
Manipulating arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.1 Indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can index elements in an array using square brackets and indices:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v = np.array([1, 2, 3, 4, 5])\n", "\n", "# v is a vector with only one dimension, so it takes one index\n", "v[0]" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4\n", "4\n", "[3 4]\n" ] } ], "source": [ "# M is a matrix, i.e. a 2-dimensional array, so it takes two indices\n", "print(M[1,1])\n", "print(M[1][1]) # same value, but indexes twice: first row 1, then element 1\n", "print(M[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we omit an index of a multidimensional array, it returns the whole row (or, in general, an N-1 dimensional array):" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4],\n", " [5, 6]])" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3, 4])" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same thing can be achieved using `:` instead of an index:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3, 4])" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M[1,:] # row 1" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 4, 6])" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M[:,1] # column 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can assign new values to elements in an array using indexing:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], 
"source": [ "M[0,0] = 1" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4],\n", " [5, 6]])" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "# this also works for whole rows and columns\n", "M[1,:] = 0\n", "M[:,1] = -1" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, -1],\n", " [ 0, -1],\n", " [ 5, -1]])" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.2 Index slicing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Index slicing is the technical name for the syntax `M[lower:upper:step]`, which extracts part of an array:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4, 5])" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = np.array([1,2,3,4,5])\n", "A" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 3])" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[1:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Array slices are *mutable*: a slice is a view of the original data, not a copy, so if a slice is assigned a new value, the original array from which it was extracted is modified:" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, -2, -3, 4, 5])" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[1:3] = [-2,-3] # a Python list is converted automatically\n", "A[1:3] = np.array([-2, -3]) # equivalent, with an explicit array\n", "\n", "A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can omit any of the three parameters in `M[lower:upper:step]`:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, -2, 
-3, 4, 5])" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[::] # lower, upper and step all take their default values" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, -2, -3, 4, 5])" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[:]" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, -3, 5])" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[::2] # step is 2; lower and upper default to the start and end of the array" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, -2, -3])" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[:3] # the first three elements" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([4, 5])" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[3:] # the elements from index 3 onwards" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Negative indices count from the end of the array (positive indices count from the beginning):" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "A = np.array([1,2,3,4,5])" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[-1] # the last element in the array" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3, 4, 5])" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[-3:] # the last three elements" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Index slicing works exactly the same way for multidimensional arrays:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 
3, 4],\n", " [10, 11, 12, 13, 14],\n", " [20, 21, 22, 23, 24],\n", " [30, 31, 32, 33, 34],\n", " [40, 41, 42, 43, 44]])" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = np.array([[n+m*10 for n in range(5)] for m in range(5)])\n", "\n", "A" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[11, 12, 13],\n", " [21, 22, 23],\n", " [31, 32, 33]])" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a block from the original array\n", "A[1:4, 1:4]" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 2, 4],\n", " [20, 22, 24],\n", " [40, 42, 44]])" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# strides: every second row and every second column\n", "A[::2, ::2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.3 Fancy indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fancy indexing is the name for when an array or a list is used in place of an index (unlike slicing, it returns a copy of the selected data):" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[10 11 12 13 14]\n", " [30 31 32 33 34]\n", " [20 21 22 23 24]]\n", "[[ 0 1 2 3 4]\n", " [10 11 12 13 14]\n", " [20 21 22 23 24]\n", " [30 31 32 33 34]\n", " [40 41 42 43 44]]\n" ] } ], "source": [ "A = np.array([[n+m*10 for n in range(5)] for m in range(5)])\n", "\n", "row_indices = [1, 3, 2]\n", "print(A[row_indices])\n", "print(A)" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([11, 31, 24])" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "col_indices = [1, 1, -1] # index -1 means the last element\n", "# row and column indices are paired: (1, 1), (3, 1), (2, -1)\n", "A[row_indices, col_indices]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use index masks: if the index mask is a NumPy array of data type `bool`, an element is selected (True) or not (False) depending on the value of the index mask at the position of each element:" ] }, { "cell_type": "code", 
"execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4])" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B = np.array([n for n in range(5)])\n", "B" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 2])" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "row_mask = np.array([True, False, True, False, False])\n", "B[row_mask]" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 2])" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# same thing\n", "row_mask = np.array([1,0,1,0,0], dtype=bool)\n", "B[row_mask]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This feature is very useful for conditionally selecting elements from an array, for example using comparison operators:" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,\n", " 6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.arange(0, 10, 0.5)\n", "x" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([False, False, False, False, False, False, False, False, False,\n", " False, False, True, True, True, True, False, False, False,\n", " False, False])" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mask = (5 < x) & (x < 7.5) # & is the element-wise logical AND\n", "\n", "mask" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([5.5, 6. , 6.5, 7. 
])" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[mask]" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3.5, 4. , 4.5, 5. , 5.5])" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[(3\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mA\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0mv1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mValueError\u001b[0m: operands could not be broadcast together with shapes (2,3) (2,) " ] } ], "source": [ "A*v1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7.4 Matrix algebra" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two ways to multiply matrices. The first is the `dot` function, which applies matrix-matrix, matrix-vector, or inner vector multiplication to its two arguments:" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[2.59833251, 1.8189686 , 1.32946437, 2.15441681, 1.55219543],\n", " [1.4561364 , 1.26875236, 0.97855704, 1.35013248, 1.05524471],\n", " [2.38061437, 1.70445667, 1.16297305, 2.27888345, 1.66499116],\n", " [1.08602725, 0.76015292, 0.46415646, 1.38753125, 1.00011024],\n", " [1.82122991, 1.34175794, 0.92375387, 1.74770416, 1.27559765]])" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = np.random.rand(5, 5)\n", "v1 = np.random.rand(5, 1)\n", "\n", "np.dot(A, A)" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[2.0139906 ],\n", " [1.41657535],\n", " [2.09784627],\n", " [1.2752073 ],\n", " [1.6253844 ]])" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.dot(A, v1)\n" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.dot(v1.T, v1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, we can cast the array objects to the type `matrix`. With `matrix` objects, the `*` operator performs matrix multiplication instead of element-wise multiplication (`+` and `-` are element-wise for both types, which coincides with matrix addition and subtraction). Note that the NumPy documentation no longer recommends the `matrix` class; with regular arrays, the `@` operator performs matrix multiplication." ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [], "source": [ "M = np.matrix(A)\n", "v = np.matrix(v1).T # v1 is a (5, 1) column, so v is a (1, 5) row vector" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "matrix([[0.45282687, 0.64874757, 0.70028245, 0.91412865, 0.36429705]])" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "matrix([[2.59833251, 1.8189686 , 1.32946437, 2.15441681, 1.55219543],\n", " [1.4561364 , 1.26875236, 0.97855704, 1.35013248, 1.05524471],\n", " [2.38061437, 1.70445667, 1.16297305, 2.27888345, 1.66499116],\n", " [1.08602725, 0.76015292, 0.46415646, 1.38753125, 1.00011024],\n", " [1.82122991, 1.34175794, 0.92375387, 1.74770416, 1.27559765]])" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M * M" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "matrix([[2.0139906 ],\n", " [1.41657535],\n", " [2.09784627],\n", " [1.2752073 ],\n", " [1.6253844 ]])" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M * v.T" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "matrix([[2.08466462]])" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# inner product: (1, 5) row times (5, 1) column\n", "v * v.T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we try to add, subtract or multiply objects with incompatible shapes, we get an error:" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [], "source": [ "v = np.matrix([1,2,3,4,5,6]).T" ] }, { "cell_type": "code", 
"execution_count": 99, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((5, 5), (6, 1))" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.shape(M), np.shape(v)" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "shapes (5,5) and (6,1) not aligned: 5 (dim 1) != 6 (dim 0)", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mM\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mv\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m~/anaconda3/lib/python3.8/site-packages/numpy/matrixlib/defmatrix.py\u001b[0m in \u001b[0;36m__mul__\u001b[0;34m(self, other)\u001b[0m\n\u001b[1;32m 218\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mN\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mndarray\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtuple\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 219\u001b[0m \u001b[0;31m# This promotes 1-D vectors to row vectors\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 220\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mN\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0masmatrix\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 221\u001b[0m \u001b[0;32mif\u001b[0m 
\u001b[0misscalar\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mhasattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'__rmul__'\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 222\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mN\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m<__array_function__ internals>\u001b[0m in \u001b[0;36mdot\u001b[0;34m(*args, **kwargs)\u001b[0m\n", "\u001b[0;31mValueError\u001b[0m: shapes (5,5) and (6,1) not aligned: 5 (dim 1) != 6 (dim 0)" ] } ], "source": [ "M * v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7.5 Matrix computations and data processing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Inverse" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-2. , 1. 
],\n", " [ 1.5, -0.5]])" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C = np.array([[1, 2], [3, 4]])\n", "np.linalg.inv(C) # for an np.matrix, C.I gives the same result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Determinant" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-2.0000000000000004" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linalg.det(C)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Data statistics\n", "It is often useful to store datasets in NumPy arrays. NumPy provides a number of functions for computing statistics of the data in an array.\n", "\n", "For example, let us compute some properties of the Stockholm temperature dataset used earlier." ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(77431, 7)" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "data = np.genfromtxt('stockholm_td_adj.dat')\n", "\n", "# reminder: the temperature dataset is stored in the variable data\n", "np.shape(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### mean" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(77431, 7)\n" ] }, { "data": { "text/plain": [ "6.197109684751585" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# the daily mean temperature is in column index 3 (the fourth column)\n", "print(data.shape)\n", "np.mean(data[:,3])" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.4931528475182218" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = np.random.rand(4, 3)\n", "np.mean(A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Over the past 200 years, the daily mean temperature in Stockholm has averaged about 6.2 °C." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Standard deviation and variance" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "data": { "text/plain": [ 
"(8.282271621340573, 68.59602320966341)" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.std(data[:,3]), np.var(data[:,3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Minimum and maximum" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-25.8" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# lowest daily mean temperature\n", "data[:,3].min()" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "28.3" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# highest daily mean temperature\n", "data[:,3].max()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### sum, prod, and trace" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = np.arange(0, 10)\n", "d" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "45" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sum of all elements\n", "np.sum(d)" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3628800" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# product of all elements: 1*2*...*10 = 10! = 3628800\n", "np.prod(d+1)" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45])" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# cumulative sum\n", "np.cumsum(d)" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, 2, 6, 24, 120, 720, 
5040,\n", " 40320, 362880, 3628800])" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# cumulative product\n", "np.cumprod(d+1)" ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.4446600641166332" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sum of the diagonal elements, same as np.diag(A).sum()\n", "np.trace(A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7.6 Computations on subsets of arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can compute with subsets of the data in an array by using indexing, fancy indexing, and the other methods of extracting data from an array described above.\n", "\n", "For example, let us go back to the temperature dataset:" ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1800 1 1 -6.1 -6.1 -6.1 1\r\n", "1800 1 2 -15.4 -15.4 -15.4 1\r\n", "1800 1 3 -15.0 -15.0 -15.0 1\r\n" ] } ], "source": [ "!head -n 3 stockholm_td_adj.dat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The format of the dataset is: year, month, day, daily mean temperature, low, high, location.\n", "\n", "If we are interested in the average temperature of a particular month, say February, we can create an index mask and use it to select only the data for that month:" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.])" ] }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.unique(data[:,1]) # the month column takes values from 1 to 12" ] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[False False False ... 
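False False False]\n" ] } ], "source": [ "mask_feb = data[:,1] == 2\n", "print(mask_feb)" ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-3.212109570736596\n", "5.090390768766271\n" ] } ], "source": [ "# the temperature data is in column index 3\n", "print(np.mean(data[mask_feb,3]))\n", "print(np.std(data[mask_feb,3]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With these tools we have very powerful data processing capabilities. For example, computing the average temperature of each month of the year takes only a few lines of code:" ] }, { "cell_type": "code", "execution_count": 119, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEGCAYAAABiq/5QAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAARgUlEQVR4nO3df7RlZV3H8fdHJgMRRGTEHzheIJKQEG0W/qAMNQpFIVu2EpVISSz8mS5ztFqgfximmLpyqSgIEkJGqOgAgiiwyvwBiAj+CMUBEWJAC1ELA779cfbgdZx753Du2efMuc/7tdZZ9+znnLuf714Mn3nm2Xs/O1WFJKkd95l2AZKkyTL4JakxBr8kNcbgl6TGGPyS1JgV0y5gGDvttFPNzc1NuwxJmimXXXbZrVW1cuP2mQj+ubk5Lr300mmXIUkzJcl1m2p3qkeSGmPwS1JjDH5JaozBL0mNMfglqTEGvyQ1xuCXpMYY/JLUmJm4gUuaBXNr1o59n+uOO3js+5Qc8UtSYwx+SWqMwS9JjTH4JakxBr8kNcbgl6TGGPyS1BiDX5IaY/BLUmMMfklqjMEvSY1xrR5pxox7TSDXA2qPI35JaozBL0mNMfglqTEGvyQ1xuCXpMYY/JLUmN6CP8lJSdYnuWpe27FJvpfkiu71jL76lyRtWp8j/pOBgzbR/vdVtW/3OqfH/iVJm9Bb8FfVJcAP+tq/JGk005jjf1mSK7upoAdOoX9Jatqkg/89wO7AvsBNwPELfTHJUUkuTXLpLbfcMqn6JGnZm2jwV9XNVXVXVd0NvB/Yb5HvnlBVq6tq9cqVKydXpCQtcxMN/iQPnbf5bOCqhb4rSepHb6tzJjkdOADYKckNwDHAAUn2BQpYB7ykr/6lDca9miW4oqVmW2/BX1WHbaL5xL76kyQNxzt3JakxBr8kNcbgl6TGGPyS1BiDX5IaY/BLUmMMfklqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGmPwS1JjDH5JaozBL0mNMfglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxK0b5pSSfrKpnjrsYSVuOuTVrx7q/dccdPNb9aXSjjvhfPNYqJEkTM9SIP8l9gT2BAr5ZVTf1WpUkqTebDf4kBwPvBb4NBNg1yUuq6ty+i5Mkjd8wI/7jgadU1bcAkuwOrAUMfkmaQcPM8a/fEPqda4H1PdUjSerZMCP+q5OcA3yEwRz/HwJfSvIHAFV1Vo/1SZLGbJjg3xq4GfjtbvsWYEfgWQz+IjD4JWmGbDb4q+qFkyhEkjQZw1zVsyvwcmBu/ver6pD+ypIk9WWYqZ6PAScCnwDu7rccSVLfhgn+/62qd/VeiSRLpoYJ/ncmOQY4H7hjQ2NVXd5bVZKk3gwT/L8OHA48lZ9N9VS3LUmaMcME/7OB3arqp30XI0nq3zB37n4F2OHe7jjJSUnWJ7lqXtuOSS5Ick3384H3dr+SpKUZJvh3Br6R5FNJzt7wGuL3TgYO2qhtDXBhVe0BXNhtS5ImaJipnmNG2XFVXZJkbqPmQ4EDuvenABcBrxtl/5Kk0Qxz5+7FSR4J7FFVn05yP2CrEfvbecNa/lV1U5IHL/TFJEcBRwGsWrVqxO4kSRvb7FRPkhcDZwLv65oezuCmrl5V1QlVtbqqVq9cubLv7iSpGcPM8b8U2B/4IUBVXQMsOFLfjJuTPBSg++nyzpI0YcME/x3zL+VMsoLBdfyjOBs4ont/BPDxEfcjSRrRMCd3L07yBmCbJAcCRzNYt2dRSU5ncCJ3pyQ3MDhJfBzwkSRHAtczWNtfjZpbs3bs+1x33MFj36e03AwT/GuAI4GvAi8Bzqmq92/ul6rqsAU+etrw5UmSxm2Y4H95Vb0TuCfsk7yya5MkzZhh5viP2ETbn4y5DknShCw44k9yGPA8YNeN7tTdDvh+34VJkvqx2FTP54CbgJ2A4+e13w5c2WdRkqT+LBj8VXUdcB3wxMmVI0nq2zBz/JKkZcTgl6TGGPyS1JiRgj/JsWOuQ5I0IaOO+C8baxWSpIkZKfirarNr9UiStkybXbIhybs20XwbcGlVubqmJM2YYUb8WwP7Atd0r32AHYEjk7yjx9okST0YZpG2XwGeWlV3AiR5D3A+cCCDFTslSTNkmBH/w4Ft521vCzysqu4C7uilKklSb4YZ8f8dcEWSi4AATwbenGRb4NM91iZJ6sFmg7+qTkxyDrAfg+B/Q1Xd2H382j6LkySN3zBX9ZwNnA6cXVU/7r8kSVKfhpnjPx74LeBrSf45yXOSbN1zXZKkngwz1XMxgweubwU8FXgxcBKwfc+1SZJ6MMzJXZJsAzwL+CPgccApfRYlSerPMHP8/wQ8HjgPeDdwUVXd3XdhkqR+DDPi/yDwvO66fUnSjBtmjv+8JHsn2YvB8g0b2j/Ua2WSpF4MM9VzDHAAsBdwDvB04F8Bg1+SZtAwUz3PAR4DfLmqXphkZ+AD/ZYlqQVza9aOfZ/rjjt47Ptcboa5jv9/upO5dybZHlgP7NZvWZKkvgwz4r80yQ7A+xk8eetHwBd7rUqS1JthTu4e3b19b5LzgO2r6sp+y5Ik9WWoG7g2qKp1PdUhSZqQUR+2LkmaUQa/JDVms8Gf5G1JHj2JYiRJ/RtmxP8N4IQkX0jyZ0ke0HdRkqT+bDb4q+oDVbU/8MfAHHBlkg8neUrfxUmSxm+oOf5uLf49u9etwFeAVyc5o8faJEk9GGatnrcDhwAXAm+uqg03b70lyTf7LE6SNH7DXMd/FfDXVfWTTXy235jrkST1bMHgT/K47u0VwJ5Jfu7zqrq8qm7rsTZJUg8WG/Efv8hnxeD5uyNJsg64HbgLuLOqVo+6L0nSvbNg8FdV31ftPKWqbu25D0nSRoZ92PqTGFzKec/3fQKXJM2mYa7qORXYncFc/4bn7hZLewJXAecnKeB9VXXCJvo9CjgKYNWqVUvoSpI03zAj/tXAXlVVY+x3/6q6McmDgQuSfKOqLpn/he4vgxMAVq9ePc6+Jalpw9zAdRXwkHF2WlU3dj/XAx/Fy0IlaWIWu5zzEwymZLYDvpbki8AdGz6vqkNG6TDJtsB9qur27v3vAm8aZV+SpHtvsamet/XU587AR7v7AlYAH66q83rqS5K0kcUu57wYIMlbqup18z9L8hbg4lE6rKprgceM8ruSpKUbZo7/wE20PX3chUiSJmOxOf4/B44Gdksy/+Hq2wGf67swSVI/Fpvj/zBwLvC3wJp57bdX1Q96rUqS1JvF5vhvA24DDuvW49+5+/79k9y/qq6fUI2SpDEa5s7dlwHHAjcDd3fNBezTX1mSpL4Mc+fuq4BHVdX3+y5GW465NWvHur9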
pIoYJ/ncmOQY4H7hjQ2NVXd5bVZKk3gwT/L8OHA48lZ9N9VS3LUmaMcME/7OB3arqp30XI0nq3zB37n4F2OHe7jjJSUnWJ7lqXtuOSS5Ick3384H3dr+SpKUZJvh3Br6R5FNJzt7wGuL3TgYO2qhtDXBhVe0BXNhtS5ImaJipnmNG2XFVXZJkbqPmQ4EDuvenABcBrxtl/5Kk0Qxz5+7FSR4J7FFVn05yP2CrEfvbecNa/lV1U5IHL/TFJEcBRwGsWrVqxO4kSRvb7FRPkhcDZwLv65oezuCmrl5V1QlVtbqqVq9cubLv7iSpGcPM8b8U2B/4IUBVXQMsOFLfjJuTPBSg++nyzpI0YcME/x3zL+VMsoLBdfyjOBs4ont/BPDxEfcjSRrRMCd3L07yBmCbJAcCRzNYt2dRSU5ncCJ3pyQ3MDhJfBzwkSRHAtczWNtfjZpbs3bs+1x33MFj36e03AwT/GuAI4GvAi8Bzqmq92/ul6rqsAU+etrw5UmSxm2Y4H95Vb0TuCfsk7yya5MkzZhh5viP2ETbn4y5DknShCw44k9yGPA8YNeN7tTdDvh+34VJkvqx2FTP54CbgJ2A4+e13w5c2WdRkqT+LBj8VXUdcB3wxMmVI0nq2zBz/JKkZcTgl6TGGPyS1JiRgj/JsWOuQ5I0IaOO+C8baxWSpIkZKfirarNr9UiStkybXbIhybs20XwbcGlVubqmJM2YYUb8WwP7Atd0r32AHYEjk7yjx9okST0YZpG2XwGeWlV3AiR5D3A+cCCDFTslSTNkmBH/w4Ft521vCzysqu4C7uilKklSb4YZ8f8dcEWSi4AATwbenGRb4NM91iZJ6sFmg7+qTkxyDrAfg+B/Q1Xd2H382j6LkySN3zBX9ZwNnA6cXVU/7r8kSVKfhpnjPx74LeBrSf45yXOSbN1zXZKkngwz1XMxgweubwU8FXgxcBKwfc+1SZJ6MMzJXZJsAzwL+CPgccApfRYlSerPMHP8/wQ8HjgPeDdwUVXd3XdhkqR+DDPi/yDwvO66fUnSjBtmjv+8JHsn2YvB8g0b2j/Ua2WSpF4MM9VzDHAAsBdwDvB04F8Bg1+SZtAwUz3PAR4DfLmqXphkZ+AD/ZYlqQVza9aOfZ/rjjt47Ptcboa5jv9/upO5dybZHlgP7NZvWZKkvgwz4r80yQ7A+xk8eetHwBd7rUqS1JthTu4e3b19b5LzgO2r6sp+y5Ik9WWoG7g2qKp1PdUhSZqQUR+2LkmaUQa/JDVms8Gf5G1JHj2JYiRJ/RtmxP8N4IQkX0jyZ0ke0HdRkqT+bDb4q+oDVbU/8MfAHHBlkg8neUrfxUmSxm+oOf5uLf49u9etwFeAVyc5o8faJEk9GGatnrcDhwAXAm+uqg03b70lyTf7LE6SNH7DXMd/FfDXVfWTTXy235jrkST1bMHgT/K47u0VwJ5Jfu7zqrq8qm7rsTZJUg8WG/Efv8hnxeD5uyNJsg64HbgLuLOqVo+6L0nSvbNg8FdV31ftPKWqbu25D0nSRoZ92PqTGFzKec/3fQKXJM2mYa7qORXYncFc/4bn7hZLewJXAecnKeB9VXXCJvo9CjgKYNWqVUvoSpI03zAj/tXAXlVVY+x3/6q6McmDgQuSfKOqLpn/he4vgxMAVq9ePc6+Jalpw9zAdRXwkHF2WlU3dj/XAx/Fy0IlaWIWu5zzEwymZLYDvpbki8AdGz6vqkNG6TDJtsB9qur27v3vAm8aZV+SpHtvsamet/XU587AR7v7AlYAH66q83rqS5K0kcUu57wYIMlbqup18z9L8hbg4lE6rKprgceM8ruSpKUbZo7/wE20PX3chUiSJmOxOf4/B44Gdksy/+Hq2wGf67swSVI/Fpvj/zBwLvC3wJp57bdX1Q96rUqS1JvF5vhvA24DDuvW49+5+/79k9y/qq6fUI2SpDEa5s7dlwHHAjcDd3fNBezTX1mSpL4Mc+fuq4BHVdX3+y5GW465NWvHur9
1xx081v1JGt0wV/V8l8GUjyRpGRhmxH8tcFGStfz8nbtv760qSVJvhgn+67vXfbuXJGmGbTb4q+qNAEm2G2zWj3qvSpLUm83O8SfZO8mXGazSeXWSy5I8uv/SJEl9GObk7gnAq6vqkVX1SOA1wPv7LUuS1Jdhgn/bqvrsho2qugjYtreKJEm9GuqqniR/A5zabb8A+E5/JUmS+jTMiP9FwErgLAZPy1oJvLDPoiRJ/Rnmqp7/Al4xgVokSROw2LLMZy/2i6M+elGSNF2LjfifyGC5htOBLwCZSEWSpF4tFvwPYfD0rcOA5wFrgdOr6upJFCZJ6seCJ3er6q6qOq+qjgCeAHyLwZo9L59YdZKksVv05G6SXwYOZjDqnwPexeDqHknSjFrs5O4pwN4MHr/4xqq6amJVSZJ6s9iI/3Dgx8CvAq9I7jm3GwaLtW3fc22SpB4s9szdYW7ukqQt3rifKAez/VQ5w12SGmPwS1JjDH5JaozBL0mNMfglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxBr8kNcbgl6TGTCX4kxyU5JtJvpVkzTRqkKRWTTz4k2wFvBt4OrAXcFiSvSZdhyS1ahoj/v2Ab1XVtVX1U+AM4NAp1CFJTUpVTbbD5DnAQVX1p9324cDjq+plG33vKOAogFWrVv3GddddN1J/k3rk2qz2M8uPj5O2NFvaIx6TXFZVqzdun8aIP5to+4W/farqhKpaXVWrV65cOYGyJKkN0wj+G4BHzNveBbhxCnVIUpOmEfxfAvZIsmuS+wLPBc6eQh2S1KQVk+6wqu5M8jLgU8BWwElVdfWk65CkVk08+AGq6hzgnGn0LUmt885dSWqMwS9JjZnKVI9G53X3kpbKEb8kNcbgl6TGGPyS1Bjn+CVpTGblHJwjfklqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGmPwS1JjDH5JaozBL0mNMfglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSY5b9g1hm5cEIkjQpjvglqTHLfsQ/Kf7LQtKscMQvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGmPwS1JjDH5JaozBL0mNSVVNu4bNSnILcN206xiTnYBbp13EGC2n41lOxwIez5ZsUsfyyKpauXHjTAT/cpLk0qpaPe06xmU5Hc9yOhbweLZk0z4Wp3okqTEGvyQ1xuCfvBOmXcCYLafjWU7HAh7Plmyqx+IcvyQ1xhG/JDXG4Jekxhj8E5LkEUk+m+TrSa5O8spp17RUSbZK8uUkn5x2LUuVZIckZyb5Rvff6InTrmlUSf6i+zN2VZLTk2w97ZrujSQnJVmf5Kp5bTsmuSDJNd3PB06zxntjgeN5a/dn7cokH02ywyRrMvgn507gNVX1a8ATgJcm2WvKNS3VK4GvT7uIMXkncF5V7Qk8hhk9riQPB14BrK6qvYGtgOdOt6p77WTgoI3a1gAXVtUewIXd9qw4mV88nguAvatqH+A/gNdPsiCDf0Kq6qaqurx7fzuDYHn4dKsaXZJdgIOBD0y7lqVKsj3wZOBEgKr6aVX993SrWpIVwDZJVgD3A26ccj33SlVdAvxgo+ZDgVO696cAvz/RopZgU8dTVedX1Z3d5ueBXSZZk8E/BUnmgMcCX5huJUvyDuAvgbunXcgY7AbcAnywm7r6QJJtp13UKKrqe8DbgOuBm4Dbqur86VY1FjtX1U0wGEQBD55yPeP0IuDcSXZo8E9YkvsD/wK8qqp+OO16RpHkmcD6qrps2rWMyQrgccB7quqxwI+ZramEe3Rz34cCuwIPA7ZN8oLpVqWFJPkrBtPAp02yX4N/gpL8EoPQP62qzpp2PUuwP3BIknXAGcBTk/zjdEtakhuAG6pqw7/AzmTwF8Es+h3gO1V1S1X9H3AW8KQp1zQONyd5KED3c/2U61myJEcAzwSeXxO+ocrgn5AkYTCH/PWqevu061mKqnp9Ve1
SVXMMThx+pqpmdlRZVf8JfDfJo7qmpwFfm2JJS3E98IQk9+v+zD2NGT1RvZGzgSO690cAH59iLUuW5CDgdcAhVfWTSfdv8E/O/sDhDEbHV3SvZ0y7KN3j5cBpSa4E9gXePOV6RtL9q+VM4HLgqwz+H5+ppQ6SnA78O/CoJDckORI4DjgwyTXAgd32TFjgeP4B2A64oMuC9060JpdskKS2OOKXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS8BSSrJqfO2VyS5ZdSVR7vVPo+et33AcljFVMuDwS8N/BjYO8k23faBwPeWsL8dgKM3+y1pCgx+6WfOZbDiKMBhwOkbPujWg/9Yt37655Ps07Uf2623flGSa5O8ovuV44Ddu5tz3tq13X/emv+ndXfWShNn8Es/cwbw3O7BJfvw86unvhH4crd++huAD837bE/g94D9gGO6NZnWAN+uqn2r6rXd9x4LvArYi8GKoPv3eTDSQgx+qVNVVwJzDEb752z08W8Cp3bf+wzwoCQP6D5bW1V3VNWtDBYP23mBLr5YVTdU1d3AFV1f0sStmHYB0hbmbAbr2R8APGhe+6amZTasd3LHvLa7WPj/q2G/J/XKEb/0804C3lRVX92o/RLg+TC4Qge4dTPPU7idwSJc0hbHEYc0T1XdwOD5uxs7lsETuq4EfsLPlgheaD/fT/Jv3QO2zwXWjrtWaVSuzilJjXGqR5IaY/BLUmMMfklqjMEvSY0x+CWpMQa/JDXG4Jekxvw/tYNNp2EnXcsAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "\n", "months = np.unique(data[:,1])\n", "monthly_mean = [np.mean(data[data[:,1] == month, 3]) for month in months]\n", "\n", "fig, ax = plt.subplots()\n", "ax.bar(months, monthly_mean)\n", "ax.set_xlabel(\"Month\")\n", "ax.set_ylabel(\"Monthly avg. temp.\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7.7 Computations with higher-dimensional data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When functions such as `min` and `max` are applied to a multidimensional array, it is sometimes useful to apply the computation to the whole array, and sometimes only row by row or column by column. The `axis` argument specifies how the function should behave:" ] }, { "cell_type": "code", "execution_count": 120, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.85882078, 0.0838741 , 0.4529751 ],\n", " [0.32355282, 0.23641565, 0.37693805],\n", " [0.06769945, 0.30438005, 0.9780961 ],\n", " [0.46162058, 0.42681981, 0.71106984]])" ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "m = np.random.rand(4,3)\n", "m" ] }, { "cell_type": "code", "execution_count": 121, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.978096099540799" ] }, "execution_count": 121, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# global max\n", "m.max()" ] }, { "cell_type": "code", "execution_count": 122, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.85882078, 0.42681981, 0.9780961 ])" ] }, "execution_count": 122, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# max in each column\n", "m.max(axis=0)" ] }, { "cell_type": "code", "execution_count": 123, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.85882078, 0.37693805, 0.9780961 , 0.71106984])" ] }, "execution_count": 123, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# max in each row\n", "m.max(axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
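"The same `axis` argument works for other reductions such as `sum` and `mean`. The sketch below uses a deterministic `np.arange` array instead of random data, so the results are easy to check by hand. `axis=0` reduces over the rows and gives one value per column, and `axis=1` gives one value per row. `keepdims=True` keeps the reduced axis with length 1, so the result still broadcasts against the original array (the name `m2` is only for this illustration):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "m2 = np.arange(12).reshape(4, 3)  # rows: [0 1 2], [3 4 5], [6 7 8], [9 10 11]\n", "\n", "print(m2.sum(axis=0))  # column sums: [18 22 26]\n", "print(m2.sum(axis=1))  # row sums: [ 3 12 21 30]\n", "\n", "# subtract each row's mean; keepdims=True gives shape (4, 1), which broadcasts over the columns\n", "row_mean = m2.mean(axis=1, keepdims=True)\n", "print(row_mean.shape)  # (4, 1)\n", "print(m2 - row_mean)   # every row becomes [-1.  0.  1.]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 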
"Many other functions and methods of the `array` and `matrix` classes accept the same (optional) `axis` keyword argument." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. Reshaping, resizing, and stacking arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The shape of a NumPy array can usually be changed without copying the underlying data, which makes this a fast operation even for large arrays." ] }, { "cell_type": "code", "execution_count": 124, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0.58458652 0.95489874 0.76873658]\n", " [0.79144906 0.35559767 0.96031963]\n", " [0.55942317 0.78723157 0.3650356 ]\n", " [0.04685468 0.43444695 0.33839966]]\n" ] } ], "source": [ "import numpy as np\n", "\n", "A = np.random.rand(4, 3)\n", "print(A)" ] }, { "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4 3\n" ] } ], "source": [ "n, m = A.shape\n", "print(n, m)" ] }, { "cell_type": "code", "execution_count": 126, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.58458652, 0.95489874, 0.76873658, 0.79144906, 0.35559767,\n", " 0.96031963, 0.55942317, 0.78723157, 0.3650356 , 0.04685468,\n", " 0.43444695, 0.33839966]])" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B = A.reshape((1,n*m))\n", "B" ] }, { "cell_type": "code", "execution_count": 127, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0.58458652]\n", " [0.95489874]\n", " [0.76873658]\n", " [0.79144906]\n", " [0.35559767]\n", " [0.96031963]\n", " [0.55942317]\n", " [0.78723157]\n", " [0.3650356 ]\n", " [0.04685468]\n", " [0.43444695]\n", " [0.33839966]]\n", "(12, 1)\n" ] } ], "source": [ "B2 = A.reshape((n*m, 1))\n", "print(B2)\n", "print(B2.shape)" ] }, { "cell_type": "code", "execution_count": 128, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[5. , 5. , 5. , 5. , 5. 
,\n", " 0.96031963, 0.55942317, 0.78723157, 0.3650356 , 0.04685468,\n", " 0.43444695, 0.33839966]])" ] }, "execution_count": 128, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B[0,0:5] = 5 # modify the array\n", "\n", "B" ] }, { "cell_type": "code", "execution_count": 129, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[5. , 5. , 5. ],\n", " [5. , 5. , 0.96031963],\n", " [0.55942317, 0.78723157, 0.3650356 ],\n", " [0.04685468, 0.43444695, 0.33839966]])" ] }, "execution_count": 129, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A # the original array A has changed too: B is only a different view of the same data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use the function `flatten` to turn a higher-dimensional array into a vector. Unlike `reshape`, `flatten` always returns a copy of the data." ] }, { "cell_type": "code", "execution_count": 130, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([5. , 5. , 5. , 5. , 5. 
,\n", " 0.96031963, 0.55942317, 0.78723157, 0.3650356 , 0.04685468,\n", " 0.43444695, 0.33839966])" ] }, "execution_count": 130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B = A.flatten()\n", "\n", "B" ] }, { "cell_type": "code", "execution_count": 131, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(12,)\n" ] } ], "source": [ "print(B.shape)" ] }, { "cell_type": "code", "execution_count": 132, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.88616566 0.11474399 0.49426839 0.86496944 0.44553257 0.01731081\n", " 0.26391484 0.81714822 0.9077824 0.45350327 0.34418481 0.30680307\n", " 0.22397584 0.96490185 0.25766897 0.1628303 0.35022665 0.87266285\n", " 0.14436895 0.2987234 0.04567582 0.62524215 0.03006832 0.15222984\n", " 0.86554462 0.30036796 0.66637188 0.51245662 0.46296801 0.53384373\n", " 0.90012971 0.00319531 0.48428543 0.24703543 0.53384405 0.48024175\n", " 0.17175873 0.1834814 0.43739033 0.64565657 0.49266811 0.72123815\n", " 0.57728476 0.76663343 0.68360823 0.34881945 0.64329004 0.79011718\n", " 0.7055079 0.32594224 0.48795517 0.43684614 0.32047664 0.63067622\n", " 0.24496431 0.25019593 0.57181523 0.38889906 0.53574819 0.02653888]\n" ] } ], "source": [ "T = np.random.rand(3, 4, 5)\n", "T2 = T.flatten()\n", "print(T2)" ] }, { "cell_type": "code", "execution_count": 133, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([10. , 10. , 10. , 10. , 10. ,\n", " 0.96031963, 0.55942317, 0.78723157, 0.3650356 , 0.04685468,\n", " 0.43444695, 0.33839966])" ] }, "execution_count": 133, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B[0:5] = 10\n", "\n", "B" ] }, { "cell_type": "code", "execution_count": 134, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[5. , 5. , 5. ],\n", " [5. , 5. 
, 0.96031963],\n", " [0.55942317, 0.78723157, 0.3650356 ],\n", " [0.04685468, 0.43444695, 0.33839966]])" ] }, "execution_count": 134, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A # A is unchanged now: B holds a copy of A's values and does not share its data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9. Adding and removing dimensions: newaxis, squeeze" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Matrix multiplication only works when the corresponding dimensions of the two matrices match. With `newaxis` we can insert new dimensions into an array, for example to turn a vector into a column or row matrix:" ] }, { "cell_type": "code", "execution_count": 135, "metadata": {}, "outputs": [], "source": [ "v = np.array([1,2,3])" ] }, { "cell_type": "code", "execution_count": 136, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3,)\n", "[1 2 3]\n" ] } ], "source": [ "print(np.shape(v))\n", "print(v)" ] }, { "cell_type": "code", "execution_count": 137, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3, 1)\n", "[[1]\n", " [2]\n", " [3]]\n" ] } ], "source": [ "v2 = v.reshape(3, 1)\n", "print(v2.shape)\n", "print(v2)" ] }, { "cell_type": "code", "execution_count": 138, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3,)\n", "(3, 1)\n" ] } ], "source": [ "# make a column matrix from the vector v\n", "v2 = v[:, np.newaxis]\n", "print(v.shape)\n", "print(v2.shape)\n" ] }, { "cell_type": "code", "execution_count": 139, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(3, 1)" ] }, "execution_count": 139, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# column matrix\n", "v[:,np.newaxis].shape" ] }, { "cell_type": "code", "execution_count": 140, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1, 3)" ] }, "execution_count": 140, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# row matrix\n", "v[np.newaxis,:].shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same can be achieved with `np.expand_dims`:" ] }, { "cell_type": "code", "execution_count": 141, "metadata": {}, "outputs": [ { "name": 
"stdout", "output_type": "stream", "text": [ "(3, 1)\n", "[[1]\n", " [2]\n", " [3]]\n" ] } ], "source": [ "v = np.array([1,2,3])\n", "v3 = np.expand_dims(v, 1)\n", "print(v3.shape)\n", "print(v3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In some cases we need to remove a dimension of length 1. This can be done with `np.squeeze`:" ] }, { "cell_type": "code", "execution_count": 142, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1, 2, 3)\n", "[[[1 2 3]\n", " [2 3 4]]]\n" ] } ], "source": [ "arr = np.array([[[1, 2, 3], [2, 3, 4]]])\n", "print(arr.shape)\n", "print(arr)" ] }, { "cell_type": "code", "execution_count": 143, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2, 3)\n", "[[1 2 3]\n", " [2 3 4]]\n" ] } ], "source": [ "# the first dimension has length 1 and is not needed\n", "arr2 = np.squeeze(arr, 0)\n", "print(arr2.shape)\n", "print(arr2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: a dimension can only be removed if the array has length 1 along it; otherwise `np.squeeze` raises an error. If no axis is given, all dimensions of length 1 are removed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 10. 
Stacking and repeating arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the functions `repeat`, `tile`, `vstack`, `hstack`, and `concatenate`, we can build larger vectors and matrices from smaller ones." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 10.1 tile and repeat" ] }, { "cell_type": "code", "execution_count": 144, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2]\n", " [3 4]]\n" ] } ], "source": [ "a = np.array([[1, 2], [3, 4]])\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 145, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])" ] }, "execution_count": 145, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# repeat each element three times (without an axis, the array is flattened first)\n", "np.repeat(a, 3)" ] }, { "cell_type": "code", "execution_count": 146, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 1, 2, 1, 2],\n", " [3, 4, 3, 4, 3, 4]])" ] }, "execution_count": 146, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# tile the matrix 3 times \n", "np.tile(a, 3)" ] }, { "cell_type": "code", "execution_count": 147, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 1, 2, 1, 2],\n", " [3, 4, 3, 4, 3, 4]])" ] }, "execution_count": 147, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# better: give the number of repetitions along each axis explicitly\n", "np.tile(a, (1, 3))" ] }, { "cell_type": "code", "execution_count": 148, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4],\n", " [1, 2],\n", " [3, 4],\n", " [1, 2],\n", " [3, 4]])" ] }, "execution_count": 148, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.tile(a, (3, 1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 10.2 concatenate" ] }, { "cell_type": "code", "execution_count": 149, "metadata": {}, "outputs": [], "source": [ "b = np.array([[5, 6]])" ] }, { "cell_type": "code", "execution_count": 150, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 
4],\n", " [5, 6]])" ] }, "execution_count": 150, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.concatenate((a, b), axis=0)" ] }, { "cell_type": "code", "execution_count": 151, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 5],\n", " [3, 4, 6]])" ] }, "execution_count": 151, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.concatenate((a, b.T), axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 10.3 hstack and vstack" ] }, { "cell_type": "code", "execution_count": 152, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4],\n", " [5, 6]])" ] }, "execution_count": 152, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.vstack((a,b))" ] }, { "cell_type": "code", "execution_count": 153, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 5],\n", " [3, 4, 6]])" ] }, "execution_count": 153, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.hstack((a,b.T))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 11. 
Copies and \"deep copies\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To achieve high performance, assignments in Python usually do not copy the underlying objects. For example, objects passed between functions are passed by reference, which avoids a large amount of unnecessary memory copying." ] }, { "cell_type": "code", "execution_count": 154, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4]])" ] }, "execution_count": 154, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = np.array([[1, 2], [3, 4]])\n", "\n", "A" ] }, { "cell_type": "code", "execution_count": 155, "metadata": {}, "outputs": [], "source": [ "# now B refers to the same array data as A\n", "B = A " ] }, { "cell_type": "code", "execution_count": 156, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[10, 2],\n", " [ 3, 4]])" ] }, "execution_count": 156, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# changing B affects A\n", "B[0,0] = 10\n", "\n", "B" ] }, { "cell_type": "code", "execution_count": 157, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[10, 2],\n", " [ 3, 4]])" ] }, "execution_count": 157, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To avoid this behavior and make `B` a new, completely independent object copied from `A`, we need a so-called \"deep copy\" made with the function `copy`:" ] }, { "cell_type": "code", "execution_count": 158, "metadata": {}, "outputs": [], "source": [ "B = np.copy(A)" ] }, { "cell_type": "code", "execution_count": 159, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-5, 2],\n", " [ 3, 4]])" ] }, "execution_count": 159, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# now if we modify B, A is not affected\n", "B[0,0] = -5\n", "\n", "B" ] }, { "cell_type": "code", "execution_count": 160, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[10, 2],\n", " [ 3, 4]])" ] }, "execution_count": 160, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 12. 
Iterating over array elements" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generally, we want to avoid iterating over the elements of arrays whenever we can. The reason is that in an interpreted language like Python (or MATLAB), iteration is very slow compared to vectorized operations.\n", "\n", "However, sometimes iteration is unavoidable. In such cases, the Python `for` loop is the most convenient way to iterate over an array:" ] }, { "cell_type": "code", "execution_count": 161, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n", "2\n", "3\n", "4\n" ] } ], "source": [ "v = np.array([1,2,3,4])\n", "\n", "for element in v:\n", "    print(element)" ] }, { "cell_type": "code", "execution_count": 162, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "row [1 2]\n", "1\n", "2\n", "row [3 4]\n", "3\n", "4\n" ] } ], "source": [ "M = np.array([[1,2], [3,4]])\n", "\n", "for row in M:\n", "    print(\"row\", row)\n", "\n", "    for element in row:\n", "        print(element)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we need to iterate over each element of an array and modify it, the `enumerate` function conveniently gives us both the element and its index in the `for` loop:" ] }, { "cell_type": "code", "execution_count": 163, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "row_idx 0 row [1 2]\n", "col_idx 0 element 1\n", "col_idx 1 element 2\n", "row_idx 1 row [3 4]\n", "col_idx 0 element 3\n", "col_idx 1 element 4\n" ] } ], "source": [ "for row_idx, row in enumerate(M):\n", "    print(\"row_idx\", row_idx, \"row\", row)\n", "\n", "    for col_idx, element in enumerate(row):\n", "        print(\"col_idx\", col_idx, \"element\", element)\n", "\n", "        # update the matrix: square each element\n", "        M[row_idx, col_idx] = element ** 2" ] }, { "cell_type": "code", "execution_count": 164, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 4],\n", " [ 9, 16]])" ] }, "execution_count": 164, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# every element of the matrix has now been squared\n", "M" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 13. 
Vectorizing functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As mentioned several times already, to get good performance we should avoid looping over the elements of vectors and matrices and use vectorized algorithms instead. The first step in converting a scalar algorithm into a vectorized one is to make sure that the functions we write work with vector inputs." ] }, { "cell_type": "code", "execution_count": 165, "metadata": {}, "outputs": [], "source": [ "def Theta(x):\n", "    \"\"\"\n", "    Scalar implementation of the Heaviside step function.\n", "    \"\"\"\n", "    if x >= 0:\n", "        return 1\n", "    else:\n", "        return 0" ] }, { "cell_type": "code", "execution_count": 166, "metadata": { "scrolled": true }, "outputs": [ { "ename": "ValueError", "evalue": "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mTheta\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m\u001b[0m in \u001b[0;36mTheta\u001b[0;34m(x)\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mScalar implementation of the Heaviside step function.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \"\"\"\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0;32mif\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m>=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6\u001b[0m \u001b[0;32mreturn\u001b[0m 
\u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mValueError\u001b[0m: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()" ] } ], "source": [ "Theta(np.array([-3,-2,-1,0,1,2,3]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This does not work: the `Theta` function as written evaluates `if x >= 0`, which needs a single truth value, so it cannot handle vector input.\n", "\n", "To get a vectorized version we can use the NumPy function `vectorize`, which in many cases can vectorize a function automatically. Note that `np.vectorize` is mainly a convenience: internally it still loops over the elements, so it does not give the speed of truly vectorized code:" ] }, { "cell_type": "code", "execution_count": 167, "metadata": {}, "outputs": [], "source": [ "Theta_vec = np.vectorize(Theta)" ] }, { "cell_type": "code", "execution_count": 168, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 0, 0, 1, 1, 1, 1])" ] }, "execution_count": 168, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Theta_vec(np.array([-3,-2,-1,0,1,2,3]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also write the function so that it accepts vector input from the start (this takes more effort, but may perform much better):" ] }, { "cell_type": "code", "execution_count": 169, "metadata": {}, "outputs": [], "source": [ "def Theta(x):\n", "    \"\"\"\n", "    Vector-aware implementation of the Heaviside step function.\n", "    \"\"\"\n", "    return 1 * (x >= 0)" ] }, { "cell_type": "code", "execution_count": 170, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 0, 0, 1, 1, 1, 1])" ] }, "execution_count": 170, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Theta(np.array([-3,-2,-1,0,1,2,3]))" ] }, { "cell_type": "code", "execution_count": 171, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[False False False True True True True]\n" ] }, { "data": { "text/plain": [ "array([0, 0, 0, 1, 1, 1, 1])" ] }, "execution_count": 171, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([-3,-2,-1,0,1,2,3])\n", "b = a >= 0  # element-wise comparison gives a boolean array\n", "print(b)\n", "b * 1  # multiplying by 1 turns True/False into 1/0" ] }, { "cell_type": "code", "execution_count": 172, 
"metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0, 1)" ] }, "execution_count": 172, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# also works for scalars\n", "Theta(-1.2), Theta(2.6)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 14. Using arrays in conditions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When using arrays in conditions, for example in `if` statements and other Boolean expressions, we must reduce them to a single truth value with `any` or `all`: `any()` is `True` if at least one element is `True`, and `all()` is `True` only if every element is `True`." ] }, { "cell_type": "code", "execution_count": 173, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4]])" ] }, "execution_count": 173, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M = np.array([[1, 2], [3, 4]])\n", "M" ] }, { "cell_type": "code", "execution_count": 174, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 174, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(M > 2).any()" ] }, { "cell_type": "code", "execution_count": 175, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "at least one element in M is larger than 2\n" ] } ], "source": [ "if (M > 2).any():\n", "    print(\"at least one element in M is larger than 2\")\n", "else:\n", "    print(\"no element in M is larger than 2\")" ] }, { "cell_type": "code", "execution_count": 176, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "not all elements in M are larger than 5\n" ] } ], "source": [ "if (M > 5).all():\n", "    print(\"all elements in M are larger than 5\")\n", "else:\n", "    print(\"not all elements in M are larger than 5\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 15. 
Type casting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since NumPy arrays are *statically typed*, the type of an array does not change once it has been created. However, we can explicitly cast an array to another type with the `astype` method (see also the similar `asarray` function). By default this always creates a new array of the new type; the original array is left unchanged." ] }, { "cell_type": "code", "execution_count": 177, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 177, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M.dtype" ] }, { "cell_type": "code", "execution_count": 178, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 2.],\n", " [3., 4.]])" ] }, "execution_count": 178, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M2 = M.astype(float)  # new float64 array; M itself is unchanged\n", "\n", "M2" ] }, { "cell_type": "code", "execution_count": 179, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 179, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M2.dtype" ] }, { "cell_type": "code", "execution_count": 180, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ True, True],\n", " [ True, True]])" ] }, "execution_count": 180, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M3 = M.astype(bool)  # nonzero values become True, zeros become False\n", "\n", "M3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 16. Further reading" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* [NumPy quick tutorial (runoob)](https://www.runoob.com/numpy/numpy-tutorial.html)\n", "* [NumPy User Guide (Chinese edition)](https://www.numpy.org.cn/user/)\n", "* [NumPy Reference (Chinese edition)](https://www.numpy.org.cn/reference/)\n", "* [NumPy for MATLAB users](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 1 }