第2章Python机器学习软件包

# Ipython

# ipython中的快捷键

快捷键	说明
`Ctrl+A`	移动光标到最前面
`Ctrl+E`	移动光标到最末尾
`Ctrl+U`	删除光标前的内容
`Ctrl+K`	删除光标后的内容
`Ctrl+L`	消除显示内容
`Crrl+C`	中断当前的脚本运行

# 查询命令空间中所有的函数和对象

In [2]: np.random.rand*?                                                        
np.random.rand
np.random.randint
np.random.randn
np.random.random
np.random.random_integers
np.random.random_sample

1
2
3
4
5
6
7

# 查询命令：`??`

  In [4]: np.random.rand?? 
  Docstring:
  rand(d0, d1, ..., dn)

  Random values in a given shape.

  Create an array of the given shape and populate it with
  random samples from a uniform distribution
  over ``[0, 1)``.

  Parameters
  ----------
  d0, d1, ..., dn : int, optional
      The dimensions of the returned array, should all be positive.
      If no argument is given a single Python float is returned.

  Returns
  -------
  out : ndarray, shape ``(d0, d1, ..., dn)``
      Random values.

  See Also
  --------
  random

  Notes
  -----
  This is a convenience function. If you want an interface that
  takes a shape-tuple as the first argument, refer to
  np.random.random_sample .

  Examples
  --------
  >>> np.random.rand(3,2)
  array([[ 0.14022471,  0.96360618],  #random
         [ 0.37601032,  0.25528411],  #random
         [ 0.49313049,  0.94909878]]) #random
  Type:      builtin_function_or_method

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38

# 魔术命令

查看变量%who``%whos

In [7]: whos                                                                    
Variable   Type       Data/Info
-------------------------------
a          ndarray    100x100: 10000 elems, type `float64`, 80000 bytes
msg        str        hello ipython
np         module     <module 'numpy' from '/ho<...>kages/numpy/__init__.py'>

In [8]: who                                                                     
a	 msg	 np

1
2
3
4
5
6
7
8
9

清空变量和库 %reset
命令行运行%run hello.py
记录log文件
- 开始记录log：%logstart
- 停止记录log：%logstop

直接运行shell命令：！

In [21]: !ifconfig |grep "inet "                                                
          inet 地址:192.168.6.135  广播:192.168.6.255  掩码:255.255.255.0
          inet 地址:127.0.0.1  掩码:255.0.0.0

1
2
3

重载函数：from imp import reload

# 原先的python函数,hello.py
def say_hello():
  print("ipython is a bad tool")
  
# 调用查看:
# import hello 
# hello.say_hello()

# 后修改为：
def say_hello():
  print("ipython is a bad tool")

# 在此调用查看：
# from imp import reload
# reload(hello)
# hello.say_hello()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16

# Jupyter-notebook教程

开启notebook：ipython notebook
快捷键
- 切换命令模式和编辑模式：Ctrl+M
  - 运行当前Cell，并停留在当前Cell：Ctrl+Enter
  - 运行当前Cell，并停留在下一Cell：Shift+Enter
  - 在命令模式下：删除当前Cell：DD
  - 转换为Markdown：M
  - 转换为代码模式：Y
  - 在当前cell上面插入代码块：A
  - 在当前cell下面插入代码块：B
- 编辑模式:Enter
  - 快速删除一行：ctrl+D

# Numpy 基础使用

Numpy默认将计算结果转换为行向量

Numpy提供基本的统计功能，求和、求平均值、求方差

一维数组

In [5]: a = np.random.randint(1,5,6)  

In [6]: a.sum()                                                                 
Out[6]: 21

In [7]: a.mean()                                                                
Out[7]: 3.5

In [8]: a.std()                                                                 
Out[8]: 0.7637626158259734

In [9]: a.min()                                                                 
Out[9]: 2

In [10]: a.max()                                                                
Out[10]: 4

In [11]: a.argmin()                                                             
Out[11]: 0

In [13]: a.argmax()                                                             
Out[13]: 1

In [14]: print(a)                                                               
[2 4 4 4 3 4]

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25

二维数组或高维数组，通过axis=0按行计算，axis=1按列计算

消除哪儿一维就选择对应的axis的序号

In [15]: b = np.random.randint(1,100,(6,4))                                     

In [16]: print(b)                                                               
[[55 92 10 75]
 [63 34 61  7]
 [25 29 36 83]
 [62 44  6 12]
 [ 5 37 25 13]
 [36 93 91 14]]

In [17]: b.sum()                                                                
Out[17]: 1008

In [18]: b.sum(axis=0).sum()                                                    
Out[18]: 1008

In [19]: b.sum(axis=1)                                                          
Out[19]: array([232, 165, 173, 124,  80, 234])

In [20]: b.std(axis=1)                                                          
Out[20]: 
array([30.65126425, 22.85142228, 23.28492001, 23.        , 12.12435565,
       34.39840113])

In [21]: b.min(axis=1)                                                          
Out[21]: array([10,  7, 25,  6,  5, 14])

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26

添加一个维度np.newaxis

In [1]: a = np.arange(4)                                                                                                        
In [3]: a = np.arange(4)                                                        

In [4]: a.shape                                                                 
Out[4]: (4,)

In [5]: b = a[:,np.newaxis]                                                     

In [6]: b.shape                                                                 
Out[6]: (4, 1)

In [7]: c = a[np.newaxis,:]                                                     

In [8]: c.shape                                                                 
Out[8]: (1, 4)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

判断两个矩阵是否相同：().all()

In [10]: a = np.array([1,2,3,4])                                                

In [11]: b = np.array([2,3,4,5])                                                

In [12]: a == b                                                                 
Out[12]: array([False, False, False, False])

In [13]: (a ==b ).all()                                                         
Out[13]: False

1
2
3
4
5
6
7
8
9

numpy的读写

保存为.txt
保存为numpy二进制文件.npy

a = np.arange(15).reshape(3,5)
print(a)
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
np.savetxt("a.txt",a)
np.loadtxt("a.txt")
np.save("a.npy",a)
np.load("a.npy")

1
2
3
4
5
6
7
8
9

查看对应的大小：

ls -lh
-rw-rw-r-- 1 jacky jacky 248 12月 18 17:04 a.npy
-rw-rw-r-- 1 jacky jacky 375 12月 18 17:04 a.txt

1
2
3

Numpy 多项式拟合

根的求解

import numpy as np 
p = np.poly1d([1,-4,3])

# 自变量取0的值
p(0)

# 多项式的根
p.roots
print(p.roots)

# 对应根值的y值是0
p(p.roots)
print(p(p.roots))

# 多项式的阶数
p.order
print(p.order)

# 多项式的系数
p.coeffs
print(p.coeffs)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

拟合

import numpy as np 
import matplotlib.pyplot as plt 

n_dots = 20 
n_order = 3 

x = np.linspace(0,1,n_dots)

# 真实的函数：y = sqrt(x) 再加上 0.2 的方差
y = np.sqrt(x) + 0.2 *np.random.rand()

# 用三次多项式拟合 y = sqrt(x) 函数
p = np.poly1d(np.polyfit(x,y,n_order))

# 输出三次拟合后的系数
print(p.coeffs)

# 绘制拟合权限
t = np.linspace(0,1,200)
plt.plot(x,y,'ro',t,p(t),'-')

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

蒙特卡洛方法：计算 $\pi$ 值

import numpy as np 
import matplotlib.pyplot as plt 

n_dots = 100000
x = np.random.rand(n_dots)
y = np.random.rand(n_dots)

distance = np.sqrt(x ** 2 + y ** 2)
in_circle = distance[distance < 1]

Pi = 4 * float(len(in_circle)) / n_dots
print(Pi)
plt.plot(x[distance<1],y[distance<1],'bo',x[distance>=1],y[distance>=1],'ko')