Machine Learning
Professor: Tiago Buarque

Student: Emanuel Ferreira

Scientific Paper Replication Project - 1st V.A.

Paper: Distribution preserving learning for unsupervised feature selection

Authors: Ting Xie, Pengfei Ren, Taiping Zhang, Yuan Yan Tang

1. Imports and Constants
In [1]: #libraries
from IPython.display import Image
import numpy as np
import pandas as pd
import math

In [2]: #constants and parameters

DELTA = 0.5      # threshold used to check the convergence of theta
epsilon = 0.1    # learning rate for the update rule
folder = "C:\\Users\\USUARIO\\Desktop\\AM\\"

2. Implementation of the DPFS Algorithm


In [203]: Image(folder+"imgs\\01.png")

Out[203]:

In [204]: def kernelGaussian(u):

    a = 1. / math.sqrt(2*math.pi)
    exp = -0.5 * (u ** 2)
    #b = math.pow(math.e, exp)
    b = np.power(math.e, exp)
    return a * b
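
A quick sanity check of the kernel (a minimal added example): at u = 0 it should return 1/sqrt(2*pi) ≈ 0.3989, and it is symmetric in u because it depends only on u**2.

    print(kernelGaussian(0))                       # 0.3989..., i.e. 1/sqrt(2*pi)
    print(kernelGaussian(1), kernelGaussian(-1))   # equal values, by symmetry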

In [205]: def distanciaEuclidiana(x1, x2):

    somatorio = 0
    for j in range(len(x1)):
        somatorio += (x1[j] - x2[j])**2
    return math.sqrt(somatorio)
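
A small check with toy points (added for illustration), confirming the function agrees with numpy's norm of the difference:

    p1 = np.array([1.0, 2.0, 3.0])
    p2 = np.array([4.0, 6.0, 3.0])
    print(distanciaEuclidiana(p1, p2))   # 5.0
    print(np.linalg.norm(p1 - p2))       # 5.0 as well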

In [206]: def kernelDensity(X, x, h):

    n = len(X)
    somatorio = 0
    for i in range(n):
        somatorio += kernelGaussian(distanciaEuclidiana(x, X[i]))
    return (1./(n*h)) * somatorio
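
A minimal illustration with made-up toy data, showing the density estimate at a query point (note that, as coded above, h enters only through the 1/(n*h) normalization factor):

    toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    print(kernelDensity(toy, np.array([0.0, 0.0]), 0.5))   # approx. 0.59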

In [207]: Image(folder+"imgs\\02.png")

Out[207]:

In [208]: def distanciaEuclidianaChapeu(x1, x2, theta):

    somatorio = 0
    for j in range(len(x1)):
        somatorio += (x1[j] - (x2[j] * (theta[j]**2)))**2
    return math.sqrt(somatorio)

In [209]: def kernelDensityChapeu(X, x, h, alfa):

    n = len(X)
    somatorio = 0
    for i in range(n):
        somatorio += kernelGaussian(distanciaEuclidianaChapeu(x, X[i], alfa))
    return (1./(n*h)) * somatorio
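
A small example with arbitrary values (added here), showing how theta reweights the second point inside the "hat" distance: with theta equal to ones it reduces to the plain Euclidean distance, and with theta equal to zeros the second point is ignored entirely.

    x1 = np.array([1.0, 2.0])
    x2 = np.array([3.0, 4.0])
    print(distanciaEuclidianaChapeu(x1, x2, np.ones(2)), distanciaEuclidiana(x1, x2))   # equal
    print(distanciaEuclidianaChapeu(x1, x2, np.zeros(2)))                               # the norm of x1 itself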

In [210]: Image(folder+"imgs\\07.png")

Out[210]:

In [211]: Image(folder+"imgs\\03.png")

Out[211]:

In [212]: def updateRule(x, y, theta, lambd):

    novo_theta = derivadasGradiente(x, y, theta, lambd)
    #novo_theta = None
    _theta = theta - epsilon * novo_theta

    return _theta

In [213]: def derivadasGradiente(x, y, theta, lambd):

    theta_calculado = []

    a = (2 * kernelGaussian(distanciaEuclidiana(x, y))) / math.sqrt(2 * math.pi)

    for j in range(len(x)):

        b = x[j]
        c = y[j]

        #derivative 1
        d1_1 = 2 * c
        d1_22 = -(math.pow(b, 2) + 2 * b * c * math.pow(theta[j], 2) - math.pow(c, 2) * math.pow(theta[j], 4))
        d1_2 = math.pow(math.e, d1_22)
        d1_3 = theta[j] * (b - (c * math.pow(theta[j], 2)))
        d1 = (d1_1 * d1_2 * d1_3) / math.pi

        #derivative 2
        d2_1 = 2 * a * c
        d2_22 = (b - c * (theta[j] ** 2)) ** 2
        d2_2 = math.pow(math.e, -(d2_22 / 2.))
        d2_3 = theta[j] * (b - c * (theta[j] ** 2))
        d2 = d2_1 * d2_2 * d2_3

        #derivative 3
        d3 = lambd * 2 * theta[j]

        theta_calculado.append(d1 - d2 + d3)

    return np.array(theta_calculado)
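
To exercise the update rule in isolation, a sketch with arbitrary toy values follows; it performs one gradient-descent step, theta - epsilon * gradient, on a single pair (x, y), with epsilon taken from the constants cell:

    x_toy = np.array([1.0, 2.0, 3.0])
    y_toy = np.array([1.5, 1.0, 2.5])
    theta_toy = np.array([0.5, 0.5, 0.5])
    print(derivadasGradiente(x_toy, y_toy, theta_toy, 1e-05))   # per-feature gradient
    print(updateRule(x_toy, y_toy, theta_toy, 1e-05))           # theta after one step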

In [214]: Image(folder+"imgs\\04.png")

Out[214]:

In [215]: def convergenceTheta(theta, _theta):

    diferencas = []
    for i in range(len(theta)):
        diferencas.append(abs(theta[i] - _theta[i]))
    for d in diferencas:
        if d > DELTA:
            return False
    return True

In [216]: def algorithm(dataset, band, lambd):

    theta = np.zeros(len(dataset[0]))
    _theta = np.ones(len(dataset[0]))   # new theta produced by the update rule

    kernel = []
    #for d in dataset:
    #    kernel.append(kernelDensity(dataset, d, band))
    for i in range(len(dataset)-1):
        j = i + 1
        kernel.append(kernelGaussian(distanciaEuclidiana(dataset[i], dataset[j])))

    while(not convergenceTheta(theta, _theta)):

        kernel_ = []
        #for d in dataset:
        #    kernel_.append(kernelDensityChapeu(dataset, d, band, theta))
        #for i in range(len(dataset)-1):
        #    j = i + 1
        #    kernel_.append(kernelGaussian(distanciaEuclidianaChapeu(dataset[i], dataset[j], theta)))

        #novos_thetas = []
        #optimizer here
        #for i in range(len(dataset)-1):
        #    j = i + 1
        #    novos_thetas.append(updateRule(dataset[i], dataset[j], theta[i], lambd)) #theta update rule
        for i in range(len(dataset)-1):
            j = i + 1
            novos_thetas = updateRule(dataset[i], dataset[j], theta, lambd)
            print(novos_thetas)

        _theta = theta
        theta = novos_thetas

    alfa = theta ** 2
    return alfa

In [217]: theta = np.zeros(13)


_theta = np.ones(13)
print(theta)
print(_theta)
convergenceTheta(theta, _theta)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

Out[217]: False

3. Implementation of the Classifier (KNN) to Test the Quality of DPFS
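
The classifier code is not included in this version of the notebook. A minimal sketch of the intended evaluation, written from scratch in the same style as the functions above (the helper names knnPredict and acuraciaComFeatures are illustrative, not from the original), keeps the top-ranked features by DPFS weight and classifies each test sample by majority vote over its k nearest neighbours:

    def knnPredict(X_train, y_train, x, k=5):
        # distances from x to every training sample, reusing distanciaEuclidiana
        dists = [distanciaEuclidiana(x, X_train[i]) for i in range(len(X_train))]
        vizinhos = np.argsort(dists)[:k]                  # indices of the k nearest neighbours
        rotulos = np.ravel(y_train)[vizinhos]
        valores, contagens = np.unique(rotulos, return_counts=True)
        return valores[np.argmax(contagens)]              # majority vote

    def acuraciaComFeatures(X_train, y_train, X_test, y_test, alfa, num_features, k=5):
        # keep only the num_features columns with the largest DPFS weights (alfa)
        selecionadas = np.argsort(alfa)[::-1][:num_features]
        y_test = np.ravel(y_test)
        acertos = 0
        for i in range(len(X_test)):
            pred = knnPredict(X_train[:, selecionadas], y_train, X_test[i, selecionadas], k)
            if pred == y_test[i]:
                acertos += 1
        return acertos / len(X_test)

Varying num_features with this helper would give accuracy-versus-number-of-features curves like the ones discussed in Section 4.1.1.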

4. Experiments with the UCI Datasets - wine, wdbc, waveform, twonorm

4.1 Results from the paper

4.1.1 Number of selected features x classifier accuracy

The following plots show the quality of the feature selection performed by DPFS and its competitors. The horizontal axis gives the number of selected features and the vertical axis gives the classifier accuracy when only the selected features of the dataset are used.

In [218]: Image(folder+"imgs\\05.png")

Out[218]:

4.1.2 Features x Weights

The next plots show the output produced by the DPFS algorithm. For each dataset, the features are on the horizontal axis and the corresponding weight is on the vertical axis. Features with larger weights tend to be more representative of the data distribution.

In [219]: Image(folder+"imgs\\06.png")

Out[219]:

4.2 Replication results

4.2.1 Wine dataset

In [220]: data = pd.read_csv(folder+"datasets\\wine.data", header=None)


data.head()

Out[220]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13

0 1 14.23 1.71 2.43 15.6 127 2.80 3.06 0.28 2.29 5.64 1.04 3.92 1065

1 1 13.20 1.78 2.14 11.2 100 2.65 2.76 0.26 1.28 4.38 1.05 3.40 1050

2 1 13.16 2.36 2.67 18.6 101 2.80 3.24 0.30 2.81 5.68 1.03 3.17 1185

3 1 14.37 1.95 2.50 16.8 113 3.85 3.49 0.24 2.18 7.80 0.86 3.45 1480

4 1 13.24 2.59 2.87 21.0 118 2.80 2.69 0.39 1.82 4.32 1.04 2.93 735

In [221]: x = data.iloc[:,1:].values   #13 attributes
y = data.iloc[:,0:1].values  #class label (3 classes in total)

In [222]: #bandwidth and lambda parameters

H = math.log(2,25)
L = 1e-05

In [223]: print("Bandwidth: "+str(H))


print("Lambda: "+str(L))

Bandwidth: 0.21533827903669653
Lambda: 1e-05

In [224]: algorithm(x, H, L)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
... (the same all-zero 13-dimensional vector is printed for every remaining sample pair)

Out[224]: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
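
A note that helps to read this output: algorithm initializes theta with np.zeros, and every term computed in derivadasGradiente (d1, d2 and d3) carries a factor of theta[j], so the gradient at the all-zero starting point is itself zero and the update rule never moves theta. A quick added check:

    grad_zero = derivadasGradiente(x[0], x[1], np.zeros(x.shape[1]), L)
    print(np.allclose(grad_zero, 0))   # True: the all-zero start is a fixed point of the update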

4.2.2 Wdbc dataset

In [76]: data = pd.read_csv(folder+"datasets\\wdbc.data", header=None)


data.head()

Out[76]:
0 1 2 3 4 5 6 7 8 9 ... 22 23

0 842302 M 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 ... 25.38 17.33 184.60

1 842517 M 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 ... 24.99 23.41 158.80

2 84300903 M 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 ... 23.57 25.53 152.50

3 84348301 M 11.42 20.38 77.58 386.1 0.14250 0.28390 0.2414 0.10520 ... 14.91 26.50 98.87

4 84358402 M 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 ... 22.54 16.67 152.20

5 rows × 32 columns

In [77]: x = data.iloc[:,2:].values
y = data.iloc[:,0:1].values

In [78]: algorithm(x, 0.2, 1e-05)

Out[78]: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

4.2.3 Waveform dataset

4.2.4 Twonorm dataset
