diff --git a/notes/image/20180106_092119.png b/notes/image/20180106_092119.png
index 9020f2a..40f84b6 100644
Binary files a/notes/image/20180106_092119.png and b/notes/image/20180106_092119.png differ
diff --git a/notes/image/20180107_234509.png b/notes/image/20180107_234509.png
new file mode 100644
index 0000000..7ca0f4d
Binary files /dev/null and b/notes/image/20180107_234509.png differ
diff --git a/notes/week1.md b/notes/week1.md
index 93d3bd6..5ecc0e6 100644
--- a/notes/week1.md
+++ b/notes/week1.md
@@ -71,7 +71,7 @@
 
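    In the house-price prediction example, we are given a set of house-area data and use it to predict the price of a house of any area. Similarly, given a photo-age dataset, we predict the age of the person in a given photo.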
 
-   ![](image\20180105_194712.png)
+   ![](image/20180105_194712.png)
 
 2. Classification
 
@@ -81,7 +81,7 @@
 
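    The video uses the example of cancerous tumors, where each diagnosis is classified as benign or malignant. Spam filtering is another example of a classification problem within supervised learning.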
 
-   ![](image\20180105_194839.png)
+   ![](image/20180105_194839.png)
 
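 The video mentions the **support vector machine** algorithm, which addresses the case where the number of features is so large (features being things like tumor size, color, and odor in the cancer example) that the computer would run out of memory. **Support vector machines let the computer handle an infinite number of features.**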
 
@@ -165,7 +165,7 @@ $h_\theta(x)=\theta_0+\theta_1x$ is one possible expression.
 >
 > $\left(x, y\right)$: an example in the training set
 >
-> $\left(x^\left(i\right), y^\left(i\right)\right)$: the $i$-th example in the training set
+> $\left(x^\left(i\right),y^\left(i\right)\right)$: the $i$-th example in the training set
 
 ![](image/20180105_224648.png)
 
@@ -173,9 +173,9 @@ $h_\theta(x)=\theta_0+\theta_1x$ is one possible expression.
 
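 To find the minimum, we introduce the cost function, which measures the modeling error. Since we need to compute a minimum, we model the sum with a quadratic function, i.e. the squared-error loss from statistics (least squares):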
 
-$$J(\theta_0, \theta_1) = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left ( \hat{y}_{i}- y_{i} \right)^2 = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left (h_\theta (x_{i}) - y_{i} \right)^2$$ 
+$$J(\theta_0,\theta_1)=\dfrac{1}{2m}\displaystyle\sum_{i=1}^m\left(\hat{y}_{i}-y_{i} \right)^2=\dfrac{1}{2m}\displaystyle\sum_{i=1}^m\left(h_\theta(x_{i})-y_{i}\right)^2$$ 
 
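-> Including the factor $\frac{1}{2}$ or not does not affect the result; it is there to simplify the computation in gradient descent, since the derivative of the square cancels the $\frac{1}{2}$.
+> Including the factor $\frac{1}{2}​$ or not does not affect the result; it is there to simplify the computation in gradient descent, since the derivative of the square cancels the $\frac{1}{2}​$.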
 
 At this point, our problem has become **finding the minimum of $J\left( \theta_0, \theta_1  \right)$**.
 
@@ -202,7 +202,7 @@ $$J(\theta_0, \theta_1) = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left ( \ha
 
 Given the dataset:
 
-![](image\20180106_091307.png)
+![](image/20180106_091307.png)
 
 When $\theta_0$ is not fixed at $0$, the cost function $J\left(\theta\right)$ can be plotted as a 3-D surface over $\theta_0, \theta_1$, where the height is the value of the cost function.
 
diff --git a/notes/week2.md b/notes/week2.md
index 0d2e23d..42692f4 100644
--- a/notes/week2.md
+++ b/notes/week2.md
@@ -1,11 +1,39 @@
 [TOC]
 
-# 4 Linear Regression with Multiple Variables
+# 4 Multivariate Linear Regression (Linear Regression with Multiple Variables)
 
-## 4.1 Multiple Features
+## 4.1 Multi-Feature (Multiple Features)
+
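+An object being measured generally has multiple features along different dimensions. In the earlier house-price prediction example, besides the area of the house, there may be other features such as its age and its number of floors: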
+
+![](image/20180107_234509.png)
+
+Since there is no longer just one feature, we introduce some new notation:
+
+> $n$: the total number of features
+>
+>  ${x}^{\left( i \right)}$: the $i$-th row of the feature matrix, i.e. the $i$-th training example.
+>
+>  ${x}_{j}^{\left( i \right)}$: the $j$-th feature in the $i$-th row of the feature matrix, i.e. the $j$-th feature of the $i$-th training example.
+
+Referring to the figure above, examples of this notation are ${x}^{(2)}=\begin{bmatrix} 1416\\\ 3\\\ 2\\\ 40 \end{bmatrix}, {x}^{(2)}_{1} = 1416$
+
+The multivariate hypothesis function $h$ is written as: $h_{\theta}\left( x \right)={\theta_{0}}+{\theta_{1}}{x_{1}}+{\theta_{2}}{x_{2}}+\cdots+{\theta_{n}}{x_{n}}$
+
+As in the single-feature case, we treat $\theta_0$ as a base value, for example the base price of a house.
+
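+The parameter vector has dimension $n+1$. After adding $x_{0}$ to the feature vector, its dimension also becomes $n+1$, so $h$ can be simplified using linear algebra: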
+
+$h_\theta\left(x\right)=\begin{bmatrix}\theta_0\; \theta_1\; ... \;\theta_n \end{bmatrix}\begin{bmatrix}x_0 \newline x_1 \newline \vdots \newline x_n\end{bmatrix}= \theta^T x$
+
+> $\theta^T$: the transpose of $\theta$
+>
+> $x_0$: for computational convenience, we assume $x_0^{(i)} = 1$
 
 ## 4.2 Gradient Descent for Multiple Variables
 
+
+
 ## 4.3 Gradient Descent in Practice I - Feature Scaling
 
 ## 4.4 Gradient Descent in Practice II - Learning Rate
@@ -19,6 +47,9 @@
 ## 4.8 Working on and Submitting Programming Assignments
 
 # 5 Octave Matlab Tutorial
+
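+When reviewing, just rewatch the videos at a faster playback speed; organizing notes for this section is deferred for now.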
+
 ## 5.1 Basic Operations
 
 ## 5.2 Moving Data Around
@@ -29,4 +60,6 @@
 
 ## 5.5 Control Statements_ for, while, if statement
 
-## 5.6 Vectorization
\ No newline at end of file
+## 5.6 Vectorization
+
+## 5.x Summary of Common Functions
\ No newline at end of file