2/15
Dataset
3/15

Label  I1  I2  ...  I13   C1        C2        ...  C26
1       3  20  ...  2741  68fd1e64  80e26c9b  ...  4cf72387
0       7  91  ...  1157  3516f6e6  cfc86806  ...  796a1a2e
0      12  73  ...  1844  05db9164  38a947a1  ...  5d93f8ab

#Train: 45M
#Test: 6M
#Features after one-hot encoding: 33M
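As a rough sketch of why one-hot encoding inflates 39 raw columns into millions of binary features, here is a toy encoder. The column names and rows below are illustrative only; the competition pipeline itself is not shown in the slides.

```python
# Toy one-hot encoding for Criteo-style rows. Every distinct
# (column, value) pair becomes its own binary feature, which is why
# 39 raw columns can blow up to tens of millions of features.

def one_hot(rows, columns):
    """Map each (column, value) pair to a feature index; return sparse rows."""
    index = {}          # feature name -> column index in the one-hot space
    encoded = []
    for row in rows:
        nnz = []        # indices of the non-zero (implicitly 1-valued) features
        for col in columns:
            name = f"{col}-{row[col]}"       # e.g. "C1-68fd1e64"
            idx = index.setdefault(name, len(index))
            nnz.append(idx)
        encoded.append(nnz)
    return encoded, index

rows = [
    {"C1": "68fd1e64", "C2": "80e26c9b"},
    {"C1": "3516f6e6", "C2": "cfc86806"},
    {"C1": "68fd1e64", "C2": "38a947a1"},
]
encoded, index = one_hot(rows, ["C1", "C2"])
# 5 distinct (column, value) pairs -> a 5-dimensional one-hot space
```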
Evaluation
logloss = -\frac{1}{L} \sum_{i=1}^{L} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right],

where L is the number of impressions, y_i ∈ {0, 1} is the true label, and \hat{y}_i is the predicted probability of a click.
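The evaluation metric can be computed in a few lines of Python. A minimal sketch; the eps clipping is a common practical guard against log(0), not something the slides mention.

```python
import math

def logloss(y_true, y_pred, eps=1e-15):
    """Mean negative log-likelihood over L impressions."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Predicting 0.5 everywhere gives log 2 ≈ 0.693; confident correct
# predictions drive the loss toward 0.
```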
4/15
5/15
Flowchart
6/15

CSV → Pre-A (nnz = 13-39, feat = 39) → GBDT (nnz = 30, feat = 30 × 2^7) → Pre-B (nnz = 69, feat = 10^6) → FFM → Calib. → Rst

nnz means the number of non-zero elements of each impression; feat represents
the size of the feature space.
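The FFM stage scores an impression with field-aware pairwise interactions: each feature pair (j1, j2) contributes the inner product of j1's latent vector for j2's field with j2's latent vector for j1's field. A minimal scoring sketch with made-up fields and weights, not the authors' implementation:

```python
def ffm_score(active, W):
    """FFM decision value for one impression.

    active: list of (field, feature) pairs with implicit value 1.
    W: dict mapping (feature, other_field) -> latent vector.
    """
    total = 0.0
    n = len(active)
    for a in range(n):
        for b in range(a + 1, n):
            f1, j1 = active[a]
            f2, j2 = active[b]
            v1 = W[(j1, f2)]   # j1's vector for interacting with field f2
            v2 = W[(j2, f1)]   # j2's vector for interacting with field f1
            total += sum(p * q for p, q in zip(v1, v2))
    return total

# Two active features in fields 0 and 1, with toy 2-dimensional weights:
W = {("I1:3", 1): [0.1, 0.2], ("C1-68fd1e64", 0): [0.3, -0.1]}
score = ffm_score([(0, "I1:3"), (1, "C1-68fd1e64")], W)
# 0.1*0.3 + 0.2*(-0.1) = 0.01
```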
Preprocessing-A
7/15
8/15
9/15
Example: assume we have already trained a GBDT with 3 trees of depth 2.
We feed an impression x into these trees. The first tree sends x to node 4, the
second to node 7, and the third to node 6. We then generate the feature
"1:4 2:7 3:6" for this impression.
[Figure: three depth-2 trees; x falls into leaf 4 of tree 1, leaf 7 of tree 2, and leaf 6 of tree 3, yielding the features 1:4, 2:7, and 3:6.]
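The example can be reproduced with a toy implementation. The tree encoding below (a dict of internal nodes, with children of node n at 2n and 2n + 1) is an assumption made for illustration, not the format the authors used.

```python
# Toy GBDT-feature generator. Each tree is a dict mapping an internal
# node id to (feature_index, threshold); children of node n are 2n
# (go left when x[feature] <= threshold) and 2n + 1 (go right).

def leaf_of(tree, x):
    """Return the id of the leaf node that impression x falls into."""
    node = 1
    while node in tree:
        feat, thr = tree[node]
        node = 2 * node if x[feat] <= thr else 2 * node + 1
    return node

def gbdt_features(trees, x):
    """One categorical feature per tree, e.g. '1:4 2:7 3:6'."""
    return " ".join(f"{t}:{leaf_of(tree, x)}" for t, tree in enumerate(trees, 1))

# Three hand-built depth-2 trees chosen so that x = [0.0, 10.0] lands in
# leaves 4, 7 and 6, matching the example above.
trees = [
    {1: (0, 0.5), 2: (1, 20.0), 3: (0, 0.0)},
    {1: (0, -1.0), 2: (1, 0.0), 3: (1, 5.0)},
    {1: (1, 5.0), 2: (0, 0.0), 3: (0, 0.5)},
]
features = gbdt_features(trees, [0.0, 10.0])  # "1:4 2:7 3:6"
```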
Preprocessing-B
10/15
Hashing Trick
11/15
text          --hash function-->  hash value              --mod 10^6-->  feature
I1:3                              739920192382357839297                  839297
C1-68fd1e64                       839193251324345167129                  167129
GBDT1:173                         923490878437598392813                  392813
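The slides do not name the hash function, so the sketch below uses md5 as a stand-in; the resulting indices will not match the values shown above, but the mechanics (hash the feature string, take it mod 10^6) are the same.

```python
import hashlib

def hash_feature(text, num_bins=10**6):
    """Map a feature string to a column index in [0, num_bins)."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_bins

# Raw numeric, categorical, and GBDT-derived features all land in the
# same 10^6-dimensional space:
indices = [hash_feature(t) for t in ("I1:3", "C1-68fd1e64", "GBDT1:173")]
```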
12/15
Calibration
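The slides do not spell out the calibration step. One standard option when a model's training base rate differs from the target base rate is a shift in odds space; the sketch below is that generic prior-correction, not necessarily what the authors used.

```python
def calibrate(p, train_rate, target_rate):
    """Rescale predicted probability p in odds space so predictions made
    at base rate train_rate are consistent with base rate target_rate."""
    odds = p / (1.0 - p)
    shift = (target_rate / (1.0 - target_rate)) / (train_rate / (1.0 - train_rate))
    odds *= shift
    return odds / (1.0 + odds)

# With matching base rates the prediction is unchanged:
# calibrate(0.3, 0.2, 0.2) ≈ 0.3
```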
13/15
Running Time
14/15
Stage     Time (min.)   Memory (GB)
Pre-A       8             0
GBDT       29            15
Pre-B      38             0
FFM       100            16
Calib.      1             0
Total     176
15/15
Method                      Public    Private
LR-Poly2                    0.44984   0.44954
FFM                         0.44613   0.44598
FFM + GBDT                  0.44497   0.44483
FFM + GBDT (v2)             0.44474   0.44462
FFM + GBDT + calib.         0.44488   0.44479
FFM + GBDT + calib. (v2)    0.44461   0.44449