|
|
@@ -0,0 +1,536 @@ |
|
|
|
{ |
|
|
|
"nbformat": 4, |
|
|
|
"nbformat_minor": 0, |
|
|
|
"metadata": { |
|
|
|
"colab": { |
|
|
|
"name": "SHARE MLSpring2021 - HW2-2.ipynb", |
|
|
|
"provenance": [], |
|
|
|
"collapsed_sections": [] |
|
|
|
}, |
|
|
|
"kernelspec": { |
|
|
|
"name": "python3", |
|
|
|
"display_name": "Python 3" |
|
|
|
} |
|
|
|
}, |
|
|
|
"cells": [ |
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": { |
|
|
|
"id": "eNSV4QGHS1I1" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"# **Homework 2-2 Hessian Matrix**\r\n", |
|
|
|
"\r\n" |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": { |
|
|
|
"id": "z0eNH3RD73Ye" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"## Hessian Matrix\n", |
|
|
|
"Suppose we are training a neural network and want to find out whether the model is at a point that is **local minima like**, at a **saddle point**, or **none of the above**. We can decide by looking at the gradient norm together with the eigenvalues of the Hessian matrix.\n", |
"\n", |
"In practice, it is very hard to reach a point where the gradient is exactly zero, or where all eigenvalues of the Hessian matrix are greater than zero. In this homework, we therefore make the following two assumptions:\n", |
"1. Treat a gradient norm less than 1e-3 as **gradient equals zero**.\n", |
"2. If the minimum ratio is greater than 0.5 and the gradient norm is less than 1e-3, we assume that the model is at a “local minima like” point.\n", |
"\n", |
"> The minimum ratio is defined as the proportion of positive eigenvalues of the Hessian (the code below averages this proportion over the layers)." |
|
|
|
] |
|
|
|
}, |
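{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two assumptions above can be written as a small decision rule. The next cell is a minimal sketch and not part of the TA's code: `classify_critical_point` is a hypothetical helper, and treating a point whose gradient norm is below 1e-3 but whose minimum ratio is at most 0.5 as a `saddle point` is an assumption inferred from the three answer options."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Hypothetical helper (not from the original notebook) encoding the two assumptions above.\n",
"# Assumption: gradient norm < 1e-3 with minimum ratio <= 0.5 is treated as a saddle point.\n",
"def classify_critical_point(gradient_norm, minimum_ratio, grad_tol=1e-3, ratio_threshold=0.5):\n",
"    if gradient_norm >= grad_tol:\n",
"        return 'none of the above'  # the gradient is not treated as zero\n",
"    if minimum_ratio > ratio_threshold:\n",
"        return 'local minima like'  # more than half of the eigenvalues are positive\n",
"    return 'saddle point'  # at least half of the eigenvalues are non-positive\n",
"\n",
"print(classify_critical_point(5e-4, 0.75))  # local minima like\n",
"print(classify_critical_point(5e-4, 0.25))  # saddle point\n",
"print(classify_critical_point(0.07, 0.46))  # none of the above"
],
"execution_count": null,
"outputs": []
},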
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": { |
|
|
|
"id": "lezCgM8U8KJl" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"## IMPORTANT NOTICE\n", |
|
|
|
"In this homework, students with different student IDs will get different answers, because the last character of the ID selects which pretrained checkpoint is loaded. Make sure to fill in your `student_id` correctly in the following block; otherwise, your code may not run correctly and you will get the wrong answer." |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "code", |
|
|
|
"metadata": { |
|
|
|
"id": "Bbr6MTQ488GH" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"student_id = 'your_student_id' # fill with your student ID\n", |
|
|
|
"\n", |
|
|
|
"assert student_id != 'your_student_id', 'Please fill in your student_id before you start.'" |
|
|
|
], |
|
|
|
"execution_count": null, |
|
|
|
"outputs": [] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": { |
|
|
|
"id": "XHz08Ybg-dmB" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"## Calculate Hessian Matrix\n", |
|
|
|
"The Hessian computation has been implemented by the TA; you don't need to (and shouldn't) change the following code. All you need to do is run the following blocks and determine whether the model is at `local minima like`, `saddle point`, or `none of the above` according to the values of `gradient norm` and `minimum ratio`." |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": { |
|
|
|
"id": "zvDeNevCUvDW" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"### Install the Package to Compute the Hessian\n", |
|
|
|
"\n", |
|
|
|
"The autograd-lib library is used to compute the Hessian matrix. The full documentation is available at https://github.com/cybertronai/autograd-lib." |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "code", |
|
|
|
"metadata": { |
|
|
|
"id": "r135K45psHwF", |
|
|
|
"colab": { |
|
|
|
"base_uri": "https://localhost:8080/" |
|
|
|
}, |
|
|
|
"outputId": "8492ab08-8ad9-4525-db9c-35884eaa0641" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"!pip install autograd-lib" |
|
|
|
], |
|
|
|
"execution_count": null, |
|
|
|
"outputs": [ |
|
|
|
{ |
|
|
|
"output_type": "stream", |
|
|
|
"text": [ |
|
|
|
"Requirement already satisfied: autograd-lib in /usr/local/lib/python3.7/dist-packages (0.0.7)\n", |
|
|
|
"Requirement already satisfied: gin-config in /usr/local/lib/python3.7/dist-packages (from autograd-lib) (0.4.0)\n", |
|
|
|
"Requirement already satisfied: seaborn in /usr/local/lib/python3.7/dist-packages (from autograd-lib) (0.11.1)\n", |
|
|
|
"Requirement already satisfied: pytorch-lightning in /usr/local/lib/python3.7/dist-packages (from autograd-lib) (1.2.3)\n", |
|
|
|
"Requirement already satisfied: scipy>=1.0 in /usr/local/lib/python3.7/dist-packages (from seaborn->autograd-lib) (1.4.1)\n", |
|
|
|
"Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.7/dist-packages (from seaborn->autograd-lib) (1.19.5)\n", |
|
|
|
"Requirement already satisfied: pandas>=0.23 in /usr/local/lib/python3.7/dist-packages (from seaborn->autograd-lib) (1.1.5)\n", |
|
|
|
"Requirement already satisfied: matplotlib>=2.2 in /usr/local/lib/python3.7/dist-packages (from seaborn->autograd-lib) (3.2.2)\n", |
|
|
|
"Requirement already satisfied: torch>=1.4 in /usr/local/lib/python3.7/dist-packages (from pytorch-lightning->autograd-lib) (1.8.0+cu101)\n", |
|
|
|
"Requirement already satisfied: tensorboard>=2.2.0 in /usr/local/lib/python3.7/dist-packages (from pytorch-lightning->autograd-lib) (2.4.1)\n", |
|
|
|
"Requirement already satisfied: tqdm>=4.41.0 in /usr/local/lib/python3.7/dist-packages (from pytorch-lightning->autograd-lib) (4.41.1)\n", |
|
|
|
"Requirement already satisfied: future>=0.17.1 in /usr/local/lib/python3.7/dist-packages (from pytorch-lightning->autograd-lib) (0.18.2)\n", |
|
|
|
"Requirement already satisfied: fsspec[http]>=0.8.1 in /usr/local/lib/python3.7/dist-packages (from pytorch-lightning->autograd-lib) (0.8.7)\n", |
|
|
|
"Requirement already satisfied: PyYAML!=5.4.*,>=5.1 in /usr/local/lib/python3.7/dist-packages (from pytorch-lightning->autograd-lib) (5.3.1)\n", |
|
|
|
"Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.23->seaborn->autograd-lib) (2.8.1)\n", |
|
|
|
"Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.23->seaborn->autograd-lib) (2018.9)\n", |
|
|
|
"Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2.2->seaborn->autograd-lib) (0.10.0)\n", |
|
|
|
"Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2.2->seaborn->autograd-lib) (2.4.7)\n", |
|
|
|
"Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2.2->seaborn->autograd-lib) (1.3.1)\n", |
|
|
|
"Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch>=1.4->pytorch-lightning->autograd-lib) (3.7.4.3)\n", |
|
|
|
"Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (0.4.3)\n", |
|
|
|
"Requirement already satisfied: wheel>=0.26; python_version >= \"3\" in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (0.36.2)\n", |
|
|
|
"Requirement already satisfied: google-auth<2,>=1.6.3 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (1.27.1)\n", |
|
|
|
"Requirement already satisfied: six>=1.10.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (1.15.0)\n", |
|
|
|
"Requirement already satisfied: protobuf>=3.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (3.12.4)\n", |
|
|
|
"Requirement already satisfied: grpcio>=1.24.3 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (1.32.0)\n", |
|
|
|
"Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (1.0.1)\n", |
|
|
|
"Requirement already satisfied: absl-py>=0.4 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (0.10.0)\n", |
|
|
|
"Requirement already satisfied: requests<3,>=2.21.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (2.23.0)\n", |
|
|
|
"Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (3.3.4)\n", |
|
|
|
"Requirement already satisfied: setuptools>=41.0.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (54.0.0)\n", |
|
|
|
"Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (1.8.0)\n", |
|
|
|
"Requirement already satisfied: importlib-metadata; python_version < \"3.8\" in /usr/local/lib/python3.7/dist-packages (from fsspec[http]>=0.8.1->pytorch-lightning->autograd-lib) (3.7.0)\n", |
|
|
|
"Requirement already satisfied: aiohttp; extra == \"http\" in /usr/local/lib/python3.7/dist-packages (from fsspec[http]>=0.8.1->pytorch-lightning->autograd-lib) (3.7.4.post0)\n", |
|
|
|
"Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.7/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (1.3.0)\n", |
|
|
|
"Requirement already satisfied: rsa<5,>=3.1.4; python_version >= \"3.6\" in /usr/local/lib/python3.7/dist-packages (from google-auth<2,>=1.6.3->tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (4.7.2)\n", |
|
|
|
"Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from google-auth<2,>=1.6.3->tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (4.2.1)\n", |
|
|
|
"Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.7/dist-packages (from google-auth<2,>=1.6.3->tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (0.2.8)\n", |
|
|
|
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.21.0->tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (1.24.3)\n", |
|
|
|
"Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.21.0->tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (3.0.4)\n", |
|
|
|
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.21.0->tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (2.10)\n", |
|
|
|
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.21.0->tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (2020.12.5)\n", |
|
|
|
"Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < \"3.8\"->fsspec[http]>=0.8.1->pytorch-lightning->autograd-lib) (3.4.1)\n", |
|
|
|
"Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.7/dist-packages (from aiohttp; extra == \"http\"->fsspec[http]>=0.8.1->pytorch-lightning->autograd-lib) (5.1.0)\n", |
|
|
|
"Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp; extra == \"http\"->fsspec[http]>=0.8.1->pytorch-lightning->autograd-lib) (20.3.0)\n", |
|
|
|
"Requirement already satisfied: async-timeout<4.0,>=3.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp; extra == \"http\"->fsspec[http]>=0.8.1->pytorch-lightning->autograd-lib) (3.0.1)\n", |
|
|
|
"Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp; extra == \"http\"->fsspec[http]>=0.8.1->pytorch-lightning->autograd-lib) (1.6.3)\n", |
|
|
|
"Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (3.1.0)\n", |
|
|
|
"Requirement already satisfied: pyasn1>=0.1.3 in /usr/local/lib/python3.7/dist-packages (from rsa<5,>=3.1.4; python_version >= \"3.6\"->google-auth<2,>=1.6.3->tensorboard>=2.2.0->pytorch-lightning->autograd-lib) (0.4.8)\n" |
|
|
|
], |
|
|
|
"name": "stdout" |
|
|
|
} |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": { |
|
|
|
"id": "ZFGBCIFmVLS_" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"### Import Libraries\r\n" |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "code", |
|
|
|
"metadata": { |
|
|
|
"id": "_-vjBvH0uqA-" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"import numpy as np\r\n", |
|
|
|
"from math import pi\r\n", |
|
|
|
"from collections import defaultdict\r\n", |
|
|
|
"from autograd_lib import autograd_lib\r\n", |
|
|
|
"\r\n", |
|
|
|
"import torch\r\n", |
|
|
|
"import torch.nn as nn\r\n", |
|
|
|
"from torch.utils.data import DataLoader, Dataset\r\n", |
|
|
|
"\r\n", |
|
|
|
"import warnings\r\n", |
|
|
|
"warnings.filterwarnings(\"ignore\")" |
|
|
|
], |
|
|
|
"execution_count": null, |
|
|
|
"outputs": [] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": { |
|
|
|
"id": "ubbsl4dUVUj6" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"### Define NN Model\n", |
|
|
|
"The NN model here is used to fit a single-variable function:\n", |
|
|
|
"$$f(x) = \\frac{\\sin(5\\pi x)}{5\\pi x}.$$" |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "code", |
|
|
|
"metadata": { |
|
|
|
"id": "uvdOpR9lVaJQ" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"class MathRegressor(nn.Module):\r\n", |
|
|
|
"    def __init__(self, num_hidden=128):\r\n", |
"        super().__init__()\r\n", |
"        self.regressor = nn.Sequential(\r\n", |
"            nn.Linear(1, num_hidden),\r\n", |
"            nn.ReLU(),\r\n", |
"            nn.Linear(num_hidden, 1)\r\n", |
"        )\r\n", |
"\r\n", |
"    def forward(self, x):\r\n", |
"        x = self.regressor(x)\r\n", |
"        return x" |
|
|
|
], |
|
|
|
"execution_count": null, |
|
|
|
"outputs": [] |
|
|
|
}, |
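{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition about the target, note that $f(x)$ is undefined at $x = 0$, but its limit there is 1. The next cell is an illustration only, because the real training data comes from `data.pth`, which the TA prepared. It uses NumPy's normalized sinc, `np.sinc(t)` $= \\sin(\\pi t)/(\\pi t)$ (with value 1 at $t = 0$), so $f(x)$ equals `np.sinc(5 * x)`. The helper name `target_function` and the sampling range $[-1, 1]$ are hypothetical."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"import torch\n",
"\n",
"# Illustration only: f(x) = sin(5*pi*x) / (5*pi*x) via NumPy's normalized sinc,\n",
"# np.sinc(t) = sin(pi*t) / (pi*t), which returns the limit value 1 at t = 0.\n",
"def target_function(x):  # hypothetical helper name\n",
"    return np.sinc(5 * x)\n",
"\n",
"# Assumed sampling range [-1, 1]; shape (N, 1) matches the nn.Linear(1, num_hidden) input layer.\n",
"xs = np.linspace(-1, 1, 5)\n",
"demo_inputs = torch.tensor(xs, dtype=torch.float32).unsqueeze(1)\n",
"demo_targets = torch.tensor(target_function(xs), dtype=torch.float32).unsqueeze(1)\n",
"print(demo_inputs.shape, demo_targets.shape)  # torch.Size([5, 1]) torch.Size([5, 1])\n",
"print(target_function(0.0))  # 1.0"
],
"execution_count": null,
"outputs": []
},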
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": { |
|
|
|
"id": "3nO0POKbWU9o" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"### Get Pretrained Checkpoints\n", |
|
|
|
"The pretrained checkpoints were prepared by the TA. Each student will get a different checkpoint." |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "code", |
|
|
|
"metadata": { |
|
|
|
"id": "rUG_tQKLbIKB", |
|
|
|
"colab": { |
|
|
|
"base_uri": "https://localhost:8080/" |
|
|
|
}, |
|
|
|
"outputId": "46ab3391-f669-45cf-b3ae-a3a6ad901f4a" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"!gdown --id 1ym6G7KKNkbsqSnMmnxdQKHO1JBoF0LPR" |
|
|
|
], |
|
|
|
"execution_count": null, |
|
|
|
"outputs": [ |
|
|
|
{ |
|
|
|
"output_type": "stream", |
|
|
|
"text": [ |
|
|
|
"Downloading...\n", |
|
|
|
"From: https://drive.google.com/uc?id=1ym6G7KKNkbsqSnMmnxdQKHO1JBoF0LPR\n", |
|
|
|
"To: /content/data.pth\n", |
|
|
|
"\r 0% 0.00/34.5k [00:00<?, ?B/s]\r100% 34.5k/34.5k [00:00<00:00, 58.8MB/s]\n" |
|
|
|
], |
|
|
|
"name": "stdout" |
|
|
|
} |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": { |
|
|
|
"id": "kOFibHKCek_A" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"### Load Pretrained Checkpoints and Training Data" |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "code", |
|
|
|
"metadata": { |
|
|
|
"id": "zkLZoCR51D7P" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"# derive the checkpoint key (0-9) from the last character of student_id\n", |
"import re\n", |
"\n", |
"key = student_id[-1]\n", |
"if re.match('[0-9]', key) is not None:\n", |
"    key = int(key)\n", |
"else:\n", |
"    key = ord(key) % 10" |
|
|
|
], |
|
|
|
"execution_count": null, |
|
|
|
"outputs": [] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "code", |
|
|
|
"metadata": { |
|
|
|
"id": "OSU8vnXEbY6q" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"# load checkpoint and data corresponding to the key\r\n", |
|
|
|
"model = MathRegressor()\r\n", |
|
|
|
"autograd_lib.register(model)\r\n", |
|
|
|
"\r\n", |
|
|
|
"data = torch.load('data.pth')[key]\r\n", |
|
|
|
"model.load_state_dict(data['model'])\r\n", |
|
|
|
"train, target = data['data']" |
|
|
|
], |
|
|
|
"execution_count": null, |
|
|
|
"outputs": [] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": { |
|
|
|
"id": "EyBX5Gvvm_IW" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"### Function to compute gradient norm" |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "code", |
|
|
|
"metadata": { |
|
|
|
"id": "2i8qGj2dnYBN" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"# function to compute gradient norm\n", |
|
|
|
"def compute_gradient_norm(model, criterion, train, target):\n", |
|
|
|
"    model.train()\n", |
"    model.zero_grad()\n", |
"    output = model(train)\n", |
"    loss = criterion(output, target)\n", |
"    loss.backward()\n", |
"\n", |
"    # L2 norm of each Linear layer's weight gradient (bias gradients are not included)\n", |
"    grads = []\n", |
"    for p in model.regressor.children():\n", |
"        if isinstance(p, nn.Linear):\n", |
"            param_norm = p.weight.grad.norm(2).item()\n", |
"            grads.append(param_norm)\n", |
"\n", |
"    grad_mean = np.mean(grads)  # compute mean of gradient norms\n", |
"\n", |
"    return grad_mean" |
|
|
|
], |
|
|
|
"execution_count": null, |
|
|
|
"outputs": [] |
|
|
|
}, |
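{
"cell_type": "markdown",
"metadata": {},
"source": [
"`compute_gradient_norm` reports the mean of the per-layer L2 norms of the `nn.Linear` weight gradients (bias gradients are skipped), which is not the same as the L2 norm of the full gradient vector. The next cell is a comparison sketch only. It runs on a freshly initialized network with the same layer layout as `MathRegressor` and on random data, both of which are assumptions unrelated to your checkpoint. The 1e-3 threshold in this homework refers to the TA's definition."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import torch\n",
"import torch.nn as nn\n",
"\n",
"# Illustration only: fresh weights and random data, not the homework checkpoint.\n",
"torch.manual_seed(0)\n",
"demo_net = nn.Sequential(nn.Linear(1, 128), nn.ReLU(), nn.Linear(128, 1))  # same layout as MathRegressor.regressor\n",
"demo_x, demo_y = torch.rand(6, 1), torch.rand(6, 1)\n",
"\n",
"demo_net.zero_grad()\n",
"nn.MSELoss()(demo_net(demo_x), demo_y).backward()\n",
"\n",
"# Definition used in compute_gradient_norm: mean of the weight-gradient norms of the Linear layers.\n",
"layer_norms = [m.weight.grad.norm(2).item() for m in demo_net if isinstance(m, nn.Linear)]\n",
"# Alternative: one L2 norm over all parameter gradients (weights and biases) concatenated.\n",
"global_norm = torch.cat([p.grad.flatten() for p in demo_net.parameters()]).norm(2).item()\n",
"\n",
"print('mean of per-layer weight-gradient norms:', sum(layer_norms) / len(layer_norms))\n",
"print('global gradient norm:', global_norm)"
],
"execution_count": null,
"outputs": []
},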
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": { |
|
|
|
"id": "BSHRU6saoOnf" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"### Function to compute minimum ratio" |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "code", |
|
|
|
"metadata": { |
|
|
|
"id": "zizIq6Y_o_UK" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"# based on the example in the official documentation: https://github.com/cybertronai/autograd-lib\n", |
"\n", |
"# helper function to save activations\n", |
"def save_activations(layer, A, _):\n", |
"    '''\n", |
"    A is the input of the layer; a batch size of 6 is used here\n", |
"    layer 1: A has size of (6, 1)\n", |
"    layer 2: A has size of (6, 128)\n", |
"    '''\n", |
"    activations[layer] = A\n", |
"\n", |
"# helper function to compute the Hessian matrix\n", |
"def compute_hess(layer, _, B):\n", |
"    '''\n", |
"    B is the backprop value of the layer\n", |
"    layer 1: B has size of (6, 128)\n", |
"    layer 2: B has size of (6, 1)\n", |
"    '''\n", |
"    A = activations[layer]\n", |
"    BA = torch.einsum('nl,ni->nli', B, A)  # do batch-wise outer product\n", |
"\n", |
"    # full Hessian\n", |
"    hess[layer] += torch.einsum('nli,nkj->likj', BA, BA)  # do batch-wise outer product, then sum over the batch" |
|
|
|
], |
|
|
|
"execution_count": null, |
|
|
|
"outputs": [] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "code", |
|
|
|
"metadata": { |
|
|
|
"id": "l0r4R_-soT58" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"# function to compute the minimum ratio\n", |
|
|
|
"def compute_minimum_ratio(model, criterion, train, target):\n", |
|
|
|
"    model.zero_grad()\n", |
"    # compute Hessian matrix\n", |
"    # forward pass: save the input (activation) of each layer\n", |
"    with autograd_lib.module_hook(save_activations):\n", |
"        output = model(train)\n", |
"        loss = criterion(output, target)\n", |
"\n", |
"    # backward pass: compute the Hessian from the saved activations and the backprop values\n", |
"    with autograd_lib.module_hook(compute_hess):\n", |
"        autograd_lib.backward_hessian(output, loss='LeastSquares')\n", |
"\n", |
"    layer_hess = list(hess.values())\n", |
"    minimum_ratio = []\n", |
"\n", |
"    # compute eigenvalues of the Hessian matrix\n", |
"    for h in layer_hess:\n", |
"        size = h.shape[0] * h.shape[1]\n", |
"        h = h.reshape(size, size)\n", |
"        h_eig = torch.symeig(h).eigenvalues  # torch.symeig() returns the eigenvalues and eigenvectors of a real symmetric matrix\n", |
"        num_greater = torch.sum(h_eig > 0).item()\n", |
"        minimum_ratio.append(num_greater / len(h_eig))\n", |
"\n", |
"    ratio_mean = np.mean(minimum_ratio)  # compute mean of minimum ratio\n", |
"\n", |
"    return ratio_mean" |
|
|
|
], |
|
|
|
"execution_count": null, |
|
|
|
"outputs": [] |
|
|
|
}, |
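{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a worked example of the minimum ratio, the next cell uses a hand-made 3 x 3 symmetric matrix whose eigenvalues are -1, 1 and 3. This matrix is an assumption, not a real Hessian. Two of its three eigenvalues are positive, so the ratio is 2/3. The cell calls `torch.linalg.eigvalsh`, the `torch.linalg` routine for eigenvalues of a symmetric matrix, rather than the `torch.symeig` used above. Because the count uses a strict `> 0`, floating-point rounding can make an eigenvalue that is exactly zero come out as a tiny positive or negative number, so it may be counted either way."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import torch\n",
"\n",
"# Toy symmetric matrix (not a real Hessian); its eigenvalues are -1, 1 and 3.\n",
"toy_h = torch.tensor([[2.0, 1.0, 0.0],\n",
"                      [1.0, 2.0, 0.0],\n",
"                      [0.0, 0.0, -1.0]])\n",
"toy_eig = torch.linalg.eigvalsh(toy_h)  # eigenvalues in ascending order\n",
"toy_ratio = torch.sum(toy_eig > 0).item() / len(toy_eig)  # same formula as compute_minimum_ratio\n",
"print(toy_ratio)  # 0.6666666666666666"
],
"execution_count": null,
"outputs": []
},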
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": { |
|
|
|
"id": "ABZhFwVZY3x3" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"### Mathematical Derivation\n", |
|
|
|
"\n", |
|
|
|
"The Hessian is approximated with the Gauss–Newton method: https://en.wikipedia.org/wiki/Gauss–Newton_algorithm\n", |
|
|
|
"\n", |
|
|
|
"> **Notations** \\\\\n", |
|
|
|
"> $\\mathbf{A}$: the input of the layer. \\\\\n", |
|
|
|
"> $\\mathbf{B}$: the backprop value. \\\\\n", |
|
|
|
"> $\\mathbf{Z}$: the output of the layer. \\\\\n", |
|
|
|
"> $L$: the total loss; mean squared error is used here, so $L=e^2$. \\\\\n", |
|
|
|
"> $w$: the weight value.\n", |
|
|
|
"\n", |
|
|
|
"Assume that the input dimension of the layer is $n$, and the output dimension of the layer is $m$.\n", |
|
|
|
"\n", |
|
|
|
"The derivative of the loss is\n", |
|
|
|
"\n", |
|
|
|
"\\begin{align*}\n", |
|
|
|
" \\left(\\frac{\\partial L}{\\partial w}\\right)_{kl} &= \\mathbf{B}_k \\mathbf{A}_l, \\qquad k = 1, \\dots, m, \\quad l = 1, \\dots, n,\n", |
|
|
|
"\\end{align*}\n", |
|
|
|
"\n", |
|
|
|
"which can be written as an outer product (denoted by $\\times$ here)\n", |
|
|
|
"\n", |
|
|
|
"\\begin{align*}\n", |
|
|
|
" \\frac{\\partial L}{\\partial w} &= \\mathbf{B} \\times \\mathbf{A}.\n", |
|
|
|
"\\end{align*}\n", |
|
|
|
"\n", |
|
|
|
"The Hessian can be derived as\n", |
|
|
|
"\n", |
|
|
|
"\\begin{align*}\n", |
|
|
|
" \\mathbf{H}_{ij}&=\\frac{\\partial^2 L}{\\partial w_i \\partial w_j} \\\\\n", |
|
|
|
" &= \\frac{\\partial}{\\partial w_i}\\left(\\frac{\\partial L}{\\partial w_j}\\right) \\\\\n", |
|
|
|
" &= \\frac{\\partial}{\\partial w_i}\\left(2e\\frac{\\partial e}{\\partial w_j}\\right) \\\\\n", |
|
|
|
" &= 2\\frac{\\partial e}{\\partial w_i}\\frac{\\partial e}{\\partial w_j}+2e\\frac{\\partial^2 e}{\\partial w_j \\partial w_i}.\n", |
|
|
|
"\\end{align*}\n", |
|
|
|
"\n", |
|
|
|
"We neglect the second-order derivative term because it is relatively small ($e$ is small), which gives the Gauss–Newton approximation\n", |
|
|
|
"\n", |
|
|
|
"\\begin{align*}\n", |
|
|
|
" \\mathbf{H}_{ij}\n", |
|
|
|
" &\\propto \\frac{\\partial e}{\\partial w_i}\\frac{\\partial e}{\\partial w_j},\n", |
|
|
|
"\\end{align*}\n", |
|
|
|
"\n", |
|
|
|
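"and since $\\frac{\\partial L}{\\partial w_i} = 2e\\frac{\\partial e}{\\partial w_i}$, where the error $e$ is fixed at the current weights and does not depend on $i$ or $j$,\n", |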
|
|
|
"\n", |
|
|
|
"\\begin{align*}\n", |
|
|
|
" \\mathbf{H}_{ij}\n", |
|
|
|
" &\\propto \\frac{\\partial L}{\\partial w_i}\\frac{\\partial L}{\\partial w_j},\n", |
|
|
|
"\\end{align*}\n", |
|
|
|
"\n", |
|
|
|
"then the full Hessian becomes\n", |
|
|
|
"\n", |
|
|
|
"\\begin{align*}\n", |
|
|
|
" \\mathbf{H} &\\propto (\\mathbf{B}\\times\\mathbf{A})\\times(\\mathbf{B}\\times\\mathbf{A}).\n", |
|
|
|
"\\end{align*}\n" |
|
|
|
] |
|
|
|
}, |
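{
"cell_type": "markdown",
"metadata": {},
"source": [
"The final expression can be checked numerically. For one sample, the gradient of a layer's weights is the outer product $\\mathbf{B}\\times\\mathbf{A}$, an $m \\times n$ matrix. The two `einsum` calls in `compute_hess` form these per-sample outer products and then sum the outer product of each one with itself over the batch. This gives an $(m, n, m, n)$ tensor that `compute_minimum_ratio` reshapes to $mn \\times mn$. The next cell is a sketch with random `A` and `B` of assumed sizes (batch 6, $n = 3$, $m = 4$). It compares the result of those `einsum` calls with an explicit loop over flattened per-sample gradients."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import torch\n",
"\n",
"# Random stand-ins (float64) for the layer input A (batch, n) and backprop value B (batch, m).\n",
"torch.manual_seed(0)\n",
"batch, n, m = 6, 3, 4\n",
"A = torch.randn(batch, n, dtype=torch.float64)\n",
"B = torch.randn(batch, m, dtype=torch.float64)\n",
"\n",
"# Same contractions as compute_hess, then the reshape from compute_minimum_ratio.\n",
"BA = torch.einsum('nl,ni->nli', B, A)       # per-sample outer products, shape (batch, m, n)\n",
"H = torch.einsum('nli,nkj->likj', BA, BA)  # summed over the batch, shape (m, n, m, n)\n",
"H = H.reshape(m * n, m * n)\n",
"\n",
"# Explicit version: flatten each per-sample gradient and accumulate its outer product.\n",
"H_loop = torch.zeros(m * n, m * n, dtype=torch.float64)\n",
"for s in range(batch):\n",
"    g = torch.outer(B[s], A[s]).flatten()\n",
"    H_loop += torch.outer(g, g)\n",
"\n",
"print(torch.allclose(H, H_loop))  # True\n",
"print(torch.allclose(H, H.T))     # True: the approximated Hessian is symmetric"
],
"execution_count": null,
"outputs": []
},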
|
|
|
{ |
|
|
|
"cell_type": "code", |
|
|
|
"metadata": { |
|
|
|
"id": "1X-2uxwTcB9u" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"# the main function to compute gradient norm and minimum ratio\r\n", |
|
|
|
"def main(model, train, target):\r\n", |
|
|
|
"    criterion = nn.MSELoss()\r\n", |
"\r\n", |
"    gradient_norm = compute_gradient_norm(model, criterion, train, target)\r\n", |
"    minimum_ratio = compute_minimum_ratio(model, criterion, train, target)\r\n", |
"\r\n", |
"    print('gradient norm: {}, minimum ratio: {}'.format(gradient_norm, minimum_ratio))" |
|
|
|
], |
|
|
|
"execution_count": null, |
|
|
|
"outputs": [] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": { |
|
|
|
"id": "uwHyQHc9w8k1" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"After running the following block, you will get the values of `gradient norm` and `minimum ratio`. Determine whether the model is at `local minima like`, `saddle point`, or `none of the above`, and then submit your choice to NTU COOL." |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "code", |
|
|
|
"metadata": { |
|
|
|
"id": "877W_ShIzS7a", |
|
|
|
"colab": { |
|
|
|
"base_uri": "https://localhost:8080/" |
|
|
|
}, |
|
|
|
"outputId": "6c90fdd9-0bbd-405c-c781-457d265c1606" |
|
|
|
}, |
|
|
|
"source": [ |
|
|
|
"if __name__ == '__main__':\n", |
|
|
|
"    # fix random seed\n", |
"    torch.manual_seed(0)\n", |
"\n", |
"    # reset compute dictionaries\n", |
"    activations = defaultdict(int)\n", |
"    hess = defaultdict(float)\n", |
"\n", |
"    # compute Hessian\n", |
"    main(model, train, target)" |
|
|
|
], |
|
|
|
"execution_count": null, |
|
|
|
"outputs": [ |
|
|
|
{ |
|
|
|
"output_type": "stream", |
|
|
|
"text": [ |
|
|
|
"gradient norm: 0.07222428917884827, minimum ratio: 0.46484375\n" |
|
|
|
], |
|
|
|
"name": "stdout" |
|
|
|
} |
|
|
|
] |
|
|
|
} |
|
|
|
] |
|
|
|
} |