Machine Learning
Professor: Tiago Buarque

Student: Emanuel Ferreira

Scientific Paper Replication Project - 1st V.A.

Paper: Distribution preserving learning for unsupervised feature selection

Authors: Ting Xie, Pengfei Ren, Taiping Zhang, Yuan Yan Tang

1. Imports and Constants
In [1]: #libraries
from IPython.display import Image
import numpy as np
import pandas as pd
import math

In [2]: #constants and parameters

DELTA = 0.5      # threshold used to check the convergence of theta
epsilon = 0.1    # learning rate for the update rule
folder = "C:\\Users\\USUARIO\\Desktop\\AM\\"

2. Implementation of the DPFS Algorithm


In [203]: Image(folder+"imgs\\01.png")

Out[203]:

In [204]: def kernelGaussian(u):

    a = 1. / math.sqrt(2*math.pi)
    exp = -0.5 * (u ** 2)
    #b = math.pow(math.e, exp)
    b = np.power(math.e, exp)
    return a * b
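
A quick sanity check of the kernel (a minimal added example): at u = 0 it should return 1/sqrt(2*pi) ≈ 0.3989, and it is symmetric in u because it depends only on u**2.

    print(kernelGaussian(0))                       # 0.3989..., i.e. 1/sqrt(2*pi)
    print(kernelGaussian(1), kernelGaussian(-1))   # equal values, by symmetry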

In [205]: def distanciaEuclidiana(x1, x2):

    somatorio = 0
    for j in range(len(x1)):
        somatorio += (x1[j] - x2[j])**2
    return math.sqrt(somatorio)
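
A small check with toy points (added for illustration), confirming the function agrees with numpy's norm of the difference:

    p1 = np.array([1.0, 2.0, 3.0])
    p2 = np.array([4.0, 6.0, 3.0])
    print(distanciaEuclidiana(p1, p2))   # 5.0
    print(np.linalg.norm(p1 - p2))       # 5.0 as well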

In [206]: def kernelDensity(X, x, h):

    n = len(X)
    somatorio = 0
    for i in range(n):
        somatorio += kernelGaussian(distanciaEuclidiana(x, X[i]))
    return (1./(n*h)) * somatorio
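
A minimal illustration with made-up toy data, showing the density estimate at a query point (note that, as coded above, h enters only through the 1/(n*h) normalization factor):

    toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    print(kernelDensity(toy, np.array([0.0, 0.0]), 0.5))   # approx. 0.59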

In [207]: Image(folder+"imgs\\02.png")

Out[207]:

In [208]: def distanciaEuclidianaChapeu(x1, x2, theta):

    somatorio = 0
    for j in range(len(x1)):
        somatorio += (x1[j] - (x2[j] * (theta[j]**2)))**2
    return math.sqrt(somatorio)

In [209]: def kernelDensityChapeu(X, x, h, alfa):

    n = len(X)
    somatorio = 0
    for i in range(n):
        somatorio += kernelGaussian(distanciaEuclidianaChapeu(x, X[i], alfa))
    return (1./(n*h)) * somatorio
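
A small example with arbitrary values (added here), showing how theta reweights the second point inside the "hat" distance: with theta equal to ones it reduces to the plain Euclidean distance, and with theta equal to zeros the second point is ignored entirely.

    x1 = np.array([1.0, 2.0])
    x2 = np.array([3.0, 4.0])
    print(distanciaEuclidianaChapeu(x1, x2, np.ones(2)), distanciaEuclidiana(x1, x2))   # equal
    print(distanciaEuclidianaChapeu(x1, x2, np.zeros(2)))                               # the norm of x1 itself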

In [210]: Image(folder+"imgs\\07.png")

Out[210]:

In [211]: Image(folder+"imgs\\03.png")

Out[211]:

In [212]: def updateRule(x, y, theta, lambd):

    novo_theta = derivadasGradiente(x, y, theta, lambd)
    #novo_theta = None
    _theta = theta - epsilon * novo_theta

    return _theta

In [213]: def derivadasGradiente(x, y, theta, lambd):

    theta_calculado = []

    a = (2 * kernelGaussian(distanciaEuclidiana(x, y))) / math.sqrt(2 * math.pi)

    for j in range(len(x)):

        b = x[j]
        c = y[j]

        #derivative 1
        d1_1 = 2 * c
        d1_22 = -(math.pow(b, 2) + 2 * b * c * math.pow(theta[j], 2) - math.pow(c, 2) * math.pow(theta[j], 4))
        d1_2 = math.pow(math.e, d1_22)
        d1_3 = theta[j] * (b - (c * math.pow(theta[j], 2)))
        d1 = (d1_1 * d1_2 * d1_3) / math.pi

        #derivative 2
        d2_1 = 2 * a * c
        d2_22 = (b - c * (theta[j] ** 2)) ** 2
        d2_2 = math.pow(math.e, -(d2_22 / 2.))
        d2_3 = theta[j] * (b - c * (theta[j] ** 2))
        d2 = d2_1 * d2_2 * d2_3

        #derivative 3
        d3 = lambd * 2 * theta[j]

        theta_calculado.append(d1 - d2 + d3)

    return np.array(theta_calculado)
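
To exercise the update rule in isolation, a sketch with arbitrary toy values follows; it performs one gradient-descent step, theta - epsilon * gradient, on a single pair (x, y), with epsilon taken from the constants cell:

    x_toy = np.array([1.0, 2.0, 3.0])
    y_toy = np.array([1.5, 1.0, 2.5])
    theta_toy = np.array([0.5, 0.5, 0.5])
    print(derivadasGradiente(x_toy, y_toy, theta_toy, 1e-05))   # per-feature gradient
    print(updateRule(x_toy, y_toy, theta_toy, 1e-05))           # theta after one step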

In [214]: Image(folder+"imgs\\04.png")

Out[214]:

In [215]: def convergenceTheta(theta, _theta):

    diferencas = []
    for i in range(len(theta)):
        diferencas.append(abs(theta[i] - _theta[i]))
    for d in diferencas:
        if d > DELTA:
            return False
    return True

In [216]: def algorithm(dataset, band, lambd):

    theta = np.zeros(len(dataset[0]))
    _theta = np.ones(len(dataset[0]))   # new theta produced by the update rule

    kernel = []
    #for d in dataset:
    #    kernel.append(kernelDensity(dataset, d, band))
    for i in range(len(dataset)-1):
        j = i + 1
        kernel.append(kernelGaussian(distanciaEuclidiana(dataset[i], dataset[j])))

    while(not convergenceTheta(theta, _theta)):

        kernel_ = []
        #for d in dataset:
        #    kernel_.append(kernelDensityChapeu(dataset, d, band, theta))
        #for i in range(len(dataset)-1):
        #    j = i + 1
        #    kernel_.append(kernelGaussian(distanciaEuclidianaChapeu(dataset[i], dataset[j], theta)))

        #novos_thetas = []
        #optimizer here
        #for i in range(len(dataset)-1):
        #    j = i + 1
        #    novos_thetas.append(updateRule(dataset[i], dataset[j], theta[i], lambd)) #theta update rule
        for i in range(len(dataset)-1):
            j = i + 1
            novos_thetas = updateRule(dataset[i], dataset[j], theta, lambd)
            print(novos_thetas)

        _theta = theta
        theta = novos_thetas

    alfa = theta ** 2
    return alfa

In [217]: theta = np.zeros(13)


_theta = np.ones(13)
print(theta)
print(_theta)
convergenceTheta(theta, _theta)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

Out[217]: False

3. Implementation of the Classifier (KNN) to Test the Quality of DPFS
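
The classifier code is not included in this version of the notebook. A minimal sketch of the intended evaluation, written from scratch in the same style as the functions above (the helper names knnPredict and acuraciaComFeatures are illustrative, not from the original), keeps the top-ranked features by DPFS weight and classifies each test sample by majority vote over its k nearest neighbours:

    def knnPredict(X_train, y_train, x, k=5):
        # distances from x to every training sample, reusing distanciaEuclidiana
        dists = [distanciaEuclidiana(x, X_train[i]) for i in range(len(X_train))]
        vizinhos = np.argsort(dists)[:k]                  # indices of the k nearest neighbours
        rotulos = np.ravel(y_train)[vizinhos]
        valores, contagens = np.unique(rotulos, return_counts=True)
        return valores[np.argmax(contagens)]              # majority vote

    def acuraciaComFeatures(X_train, y_train, X_test, y_test, alfa, num_features, k=5):
        # keep only the num_features columns with the largest DPFS weights (alfa)
        selecionadas = np.argsort(alfa)[::-1][:num_features]
        y_test = np.ravel(y_test)
        acertos = 0
        for i in range(len(X_test)):
            pred = knnPredict(X_train[:, selecionadas], y_train, X_test[i, selecionadas], k)
            if pred == y_test[i]:
                acertos += 1
        return acertos / len(X_test)

Varying num_features with this helper would give accuracy-versus-number-of-features curves like the ones discussed in Section 4.1.1.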

4. Experiments with the UCI Datasets - wine, wdbc, waveform, twonorm

4.1 Results from the paper

4.1.1 Number of selected features x classifier accuracy

The following plots show the quality of the feature selection performed by DPFS and its competitors. The horizontal axis gives the number of selected features and the vertical axis gives the classifier accuracy when only the selected features of the dataset are used.

In [218]: Image(folder+"imgs\\05.png")

Out[218]:

4.1.2 Features x Weights

The next plots show the output produced by the DPFS algorithm. For each dataset, the features are on the horizontal axis and the corresponding weight is on the vertical axis. Features with larger weights tend to be more representative of the data distribution.

In [219]: Image(folder+"imgs\\06.png")

Out[219]:

4.2 Replication results

4.2.1 Wine dataset

In [220]: data = pd.read_csv(folder+"datasets\\wine.data", header=None)


data.head()

Out[220]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13

0 1 14.23 1.71 2.43 15.6 127 2.80 3.06 0.28 2.29 5.64 1.04 3.92 1065

1 1 13.20 1.78 2.14 11.2 100 2.65 2.76 0.26 1.28 4.38 1.05 3.40 1050

2 1 13.16 2.36 2.67 18.6 101 2.80 3.24 0.30 2.81 5.68 1.03 3.17 1185

3 1 14.37 1.95 2.50 16.8 113 3.85 3.49 0.24 2.18 7.80 0.86 3.45 1480

4 1 13.24 2.59 2.87 21.0 118 2.80 2.69 0.39 1.82 4.32 1.04 2.93 735

In [221]: x = data.iloc[:,1:].values   #13 attributes
y = data.iloc[:,0:1].values  #class label (3 classes in total)

In [222]: #bandwidth and lambda parameters

H = math.log(2,25)
L = 1e-05

In [223]: print("Bandwidth: "+str(H))


print("Lambda: "+str(L))

Bandwidth: 0.21533827903669653
Lambda: 1e-05

In [224]: algorithm(x, H, L)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
... (the same all-zero 13-dimensional vector is printed for every remaining sample pair)

Out[224]: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
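
A note that helps to read this output: algorithm initializes theta with np.zeros, and every term computed in derivadasGradiente (d1, d2 and d3) carries a factor of theta[j], so the gradient at the all-zero starting point is itself zero and the update rule never moves theta. A quick added check:

    grad_zero = derivadasGradiente(x[0], x[1], np.zeros(x.shape[1]), L)
    print(np.allclose(grad_zero, 0))   # True: the all-zero start is a fixed point of the update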

4.2.2 Wdbc dataset

In [76]: data = pd.read_csv(folder+"datasets\\wdbc.data", header=None)


data.head()

Out[76]:
0 1 2 3 4 5 6 7 8 9 ... 22 23

0 842302 M 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 ... 25.38 17.33 184.60

1 842517 M 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 ... 24.99 23.41 158.80

2 84300903 M 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 ... 23.57 25.53 152.50

3 84348301 M 11.42 20.38 77.58 386.1 0.14250 0.28390 0.2414 0.10520 ... 14.91 26.50 98.87

4 84358402 M 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 ... 22.54 16.67 152.20

5 rows × 32 columns

In [77]: x = data.iloc[:,2:].values
y = data.iloc[:,0:1].values

In [78]: algorithm(x, 0.2, 1e-05)

Out[78]: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

4.2.3 Waveform dataset

4.2.4 Twonorm dataset
