
Mead learning Big data & deep learning projects

1. B2B Recommendation (similar to Indiamart)


Dataset: weblog file with approximately 700,000 search queries from different users in different categories of a B2B search engine.
Each search entry in the weblog file contains the following fields:

date time s-computername s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
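The field list above follows the W3C extended log format written by IIS. As a sketch only (the project's own Java parser is not shown in the deck), the class below splits one log line into a field map and decodes a search term from cs-uri-query. It assumes space-separated fields in exactly the order listed; real logs declare their order in a "#Fields:" directive. The query parameter name "q" and the sample line are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: parse one IIS W3C extended log line into a field-name -> value map.
 * ASSUMPTION: fields are space-separated in the order listed in the slide.
 */
public class WeblogParser {

    // Field order as listed in the slide (assumed; real logs use their #Fields: line).
    static final String[] FIELDS = {
        "date", "time", "s-computername", "s-ip", "cs-method", "cs-uri-stem",
        "cs-uri-query", "s-port", "cs-username", "c-ip", "cs(User-Agent)",
        "cs(Cookie)", "cs(Referer)", "cs-host", "sc-status", "sc-substatus",
        "sc-win32-status", "sc-bytes", "cs-bytes", "time-taken"
    };

    /** Returns null for directive lines (starting with '#') or lines with the wrong field count. */
    static Map<String, String> parse(String line) {
        if (line.startsWith("#")) {
            return null;
        }
        String[] parts = line.trim().split("\\s+");
        if (parts.length != FIELDS.length) {
            return null;
        }
        Map<String, String> entry = new HashMap<>();
        for (int i = 0; i < FIELDS.length; i++) {
            entry.put(FIELDS[i], parts[i]);
        }
        return entry;
    }

    /**
     * Extracts and URL-decodes one parameter from cs-uri-query, e.g. "q" in
     * "q=cotton+yarn&page=2". The parameter name is hypothetical; "-" means no query.
     */
    static String queryParam(String uriQuery, String name) {
        if (uriQuery == null || uriQuery.equals("-")) {
            return null;
        }
        for (String pair : uriQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                return URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical log line containing all 20 fields.
        String line = "2016-05-01 10:15:32 WEB01 10.0.0.5 GET /search.aspx q=cotton+yarn 80 - "
                + "203.0.113.7 Mozilla/5.0 - - www.example.com 200 0 0 5120 430 187";
        Map<String, String> e = parse(line);
        System.out.println(e.get("c-ip") + " searched for: " + queryParam(e.get("cs-uri-query"), "q"));
    }
}
```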

Tools and Technology

Dataset: Weblog of the Fibre2Fashion domain
Technology: Java
Algorithm: Collaborative Deep Learning for Recommendation Systems

[Figure: B2B search engine, cs-uri-query]

Weblog Analysis

[Figure: Unique page visitor data with timestamp (unique pages in the weblog)]

[Figure: Visitor client address (unique client IPs in the weblog)]

Recommendation

[Figure: table of Server IP, Client IP, Page, Query and Time taken]

This figure shows the pages on which clients spent the most time.
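The weblog analysis can be sketched as two aggregations over parsed log entries: the set of distinct client IPs, and the total time-taken per page. This is an illustration under assumptions, not the project's code: the Hit class and the sample data are hypothetical, and IIS time-taken measures how long the server took to handle a request, so ranking pages by it is only a proxy for how long clients spend on them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WeblogStats {

    /** One request reduced to the fields this analysis needs (hypothetical type). */
    static final class Hit {
        final String clientIp;   // c-ip
        final String page;       // cs-uri-stem
        final long timeTakenMs;  // time-taken (milliseconds in IIS logs)

        Hit(String clientIp, String page, long timeTakenMs) {
            this.clientIp = clientIp;
            this.page = page;
            this.timeTakenMs = timeTakenMs;
        }
    }

    /** Distinct client IP addresses seen in the log. */
    static Set<String> uniqueClients(List<Hit> hits) {
        Set<String> ips = new HashSet<>();
        for (Hit h : hits) {
            ips.add(h.clientIp);
        }
        return ips;
    }

    /** The n pages with the largest total time-taken, highest first. */
    static List<Map.Entry<String, Long>> topPagesByTime(List<Hit> hits, int n) {
        Map<String, Long> total = new HashMap<>();
        for (Hit h : hits) {
            total.merge(h.page, h.timeTakenMs, Long::sum);
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(total.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("203.0.113.7", "/product.aspx", 420),
                new Hit("198.51.100.2", "/search.aspx", 150),
                new Hit("203.0.113.7", "/product.aspx", 380));
        System.out.println("unique clients: " + uniqueClients(hits).size());
        System.out.println("top page: " + topPagesByTime(hits, 1).get(0));
    }
}
```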

2. Pingax Recommendation
We created a Google Chrome plug-in with a recommendation system that gives the best recommendations, based on the user's search, for items across different websites.
Technology: Java, Apache Mahout, Apache Hadoop, JSON, JavaScript

[Figure: Pingax Digital Recommendation settings]

[Figure: Recommendation, 1st recommendation]

[Figure: Recommendation, 2nd recommendation]

[Figure: Recommendation, recommended item]

3. IMDB movie review classification

Dataset: IMDB movie review text files
Tools & technology: Eclipse, Java
Machine Learning & Deep Learning Technique: Deep Learning using Linear Support Vector Machines and Conditional Random Fields as Recurrent Neural Networks
Class labels: Positive, Negative, Neutral
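The technique named above (Deep Learning using Linear SVMs) places an SVM objective on top of a neural network. The sketch below shows only the linear SVM part: a binary POSITIVE/NEGATIVE classifier over bag-of-words counts, trained with the Pegasos stochastic sub-gradient method, which is a swapped-in optimiser chosen for brevity. The tokenizer, lambda, epoch count and toy reviews are assumptions; the Neutral class and the CRF component are not covered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Sketch: binary linear SVM (hinge loss + L2) trained with Pegasos on bag-of-words. */
public class LinearSvm {
    private final Map<String, Double> w = new HashMap<>(); // sparse weight vector
    private final double lambda;                           // regularisation strength (assumed)

    LinearSvm(double lambda) {
        this.lambda = lambda;
    }

    /** Bag-of-words term counts with a deliberately simple tokenizer. */
    static Map<String, Double> features(String text) {
        Map<String, Double> f = new HashMap<>();
        for (String tok : text.toLowerCase().split("[^a-z']+")) {
            if (!tok.isEmpty()) {
                f.merge(tok, 1.0, Double::sum);
            }
        }
        return f;
    }

    double score(Map<String, Double> x) {
        double s = 0.0;
        for (Map.Entry<String, Double> e : x.entrySet()) {
            s += w.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return s;
    }

    /** Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss. */
    void train(List<String> docs, int[] labels, int epochs, long seed) {
        Random rnd = new Random(seed);
        List<Map<String, Double>> xs = new ArrayList<>();
        for (String d : docs) {
            xs.add(features(d));
        }
        int t = 0;
        for (int ep = 0; ep < epochs; ep++) {
            for (int k = 0; k < xs.size(); k++) {
                int i = rnd.nextInt(xs.size());
                t++;
                double eta = 1.0 / (lambda * t);
                double margin = labels[i] * score(xs.get(i)); // uses w before this step
                double shrink = 1.0 - eta * lambda;
                w.replaceAll((term, v) -> v * shrink);        // step for the L2 term
                if (margin < 1.0) {                            // hinge loss is active
                    for (Map.Entry<String, Double> e : xs.get(i).entrySet()) {
                        w.merge(e.getKey(), eta * labels[i] * e.getValue(), Double::sum);
                    }
                }
            }
        }
    }

    /** +1 for POSITIVE, -1 for NEGATIVE. */
    int predict(String text) {
        return score(features(text)) >= 0 ? 1 : -1;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("great wonderful movie", "loved it great acting",
                "terrible boring movie", "awful waste boring");
        int[] labels = {1, 1, -1, -1};
        LinearSvm svm = new LinearSvm(0.01);
        svm.train(docs, labels, 50, 42L);
        System.out.println(svm.predict("great acting") + " " + svm.predict("boring awful"));
    }
}
```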

[Figure: Dataset, POSITIVE and NEGATIVE reviews]

SVM train
[Figure: Training parameters]

[Figure: Accuracy]

CRF Train
[Figure: CRF train parameters]

CRF Test
[Figure: CRF test parameters]

[Figure: Accuracy]

4. A Collaborative Approach for Web Personalized Recommendation System
Dataset: MovieLens data
Technology: Java
Algorithm: User-based collaborative deep learning filtering technique
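A user-based collaborative filtering predictor can be sketched as follows: find users whose ratings correlate with the target user (Pearson correlation over co-rated items), keep the k most similar users who rated the item, and add their similarity-weighted deviations from their own mean rating to the target user's mean. This is the classic neighbourhood method, not the deep-learning variant named above; the mean-centred prediction formula is one common choice assumed here, and the toy ratings are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of user-based collaborative filtering; ratings.get(user).get(item) = rating. */
public class UserBasedCf {

    static double mean(Map<Integer, Double> r) {
        double s = 0.0;
        for (double v : r.values()) {
            s += v;
        }
        return r.isEmpty() ? 0.0 : s / r.size();
    }

    /** Pearson correlation over the items both users rated (0 if fewer than 2). */
    static double pearson(Map<Integer, Double> a, Map<Integer, Double> b) {
        List<Integer> common = new ArrayList<>();
        for (Integer item : a.keySet()) {
            if (b.containsKey(item)) common.add(item);
        }
        int n = common.size();
        if (n < 2) return 0.0;
        double ma = 0.0, mb = 0.0;
        for (int item : common) { ma += a.get(item); mb += b.get(item); }
        ma /= n;
        mb /= n;
        double num = 0.0, da = 0.0, db = 0.0;
        for (int item : common) {
            double x = a.get(item) - ma, y = b.get(item) - mb;
            num += x * y;
            da += x * x;
            db += y * y;
        }
        return (da == 0.0 || db == 0.0) ? 0.0 : num / Math.sqrt(da * db);
    }

    /** Predicts user's rating of item from the k most similar positively correlated users. */
    static double predict(Map<Integer, Map<Integer, Double>> ratings, int user, int item, int k) {
        Map<Integer, Double> target = ratings.get(user);
        double userMean = mean(target);
        List<double[]> neighbours = new ArrayList<>(); // {similarity, neighbour's deviation on item}
        for (Map.Entry<Integer, Map<Integer, Double>> e : ratings.entrySet()) {
            Map<Integer, Double> other = e.getValue();
            if (e.getKey() == user || !other.containsKey(item)) continue;
            double sim = pearson(target, other);
            if (sim > 0.0) neighbours.add(new double[] {sim, other.get(item) - mean(other)});
        }
        neighbours.sort((x, y) -> Double.compare(y[0], x[0])); // most similar first
        double num = 0.0, den = 0.0;
        for (int i = 0; i < Math.min(k, neighbours.size()); i++) {
            num += neighbours.get(i)[0] * neighbours.get(i)[1];
            den += neighbours.get(i)[0];
        }
        return den == 0.0 ? userMean : userMean + num / den;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> ratings = new HashMap<>();
        ratings.put(1, Map.of(1, 5.0, 2, 3.0, 3, 4.0));
        ratings.put(2, Map.of(1, 5.0, 2, 3.0, 3, 4.0, 4, 5.0));
        ratings.put(3, Map.of(1, 1.0, 2, 5.0, 3, 2.0, 4, 1.0));
        System.out.println(predict(ratings, 1, 4, 20)); // prints 4.75
    }
}
```

With 20 neighbours requested, only user 2 qualifies here (user 3 is negatively correlated), so the prediction is user 1's mean of 4.0 plus user 2's deviation of 0.75.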

Dataset
The full data set: 100,000 ratings by 943 users on 1,682 items.
Each user has rated at least 20 movies.
Users and items are numbered consecutively from 1.
The data is randomly ordered and split 80%/20% into training and test data.
Fields: User ID, Item (Movie) ID, Rating
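The 80%/20% split and the RMSE figures reported in the evaluation can be illustrated with a small sketch. It parses the tab-separated user/item/rating/timestamp layout of the MovieLens 100K u.data file, shuffles with a fixed seed, and computes RMSE as the square root of the mean squared difference between predicted and actual ratings. The global-mean baseline predictor and the synthetic ratings in main are assumptions for illustration only; they are not the project's recommender or data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch: random train/test split of rating triples and RMSE evaluation. */
public class SplitAndRmse {

    /** One rating triple: user ID, item (movie) ID, rating. */
    static final class Rating {
        final int user;
        final int item;
        final double value;

        Rating(int user, int item, double value) {
            this.user = user;
            this.item = item;
            this.value = value;
        }

        /** Parses a tab-separated "user item rating timestamp" line (u.data layout). */
        static Rating parse(String line) {
            String[] f = line.split("\t");
            return new Rating(Integer.parseInt(f[0]), Integer.parseInt(f[1]), Double.parseDouble(f[2]));
        }
    }

    /** Shuffles with a fixed seed and returns [train, test]. */
    static List<List<Rating>> split(List<Rating> all, double trainFraction, long seed) {
        List<Rating> shuffled = new ArrayList<>(all);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(trainFraction * shuffled.size());
        List<List<Rating>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    /** Root-mean-square error between predicted and actual ratings. */
    static double rmse(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double d = predicted[i] - actual[i];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }

    public static void main(String[] args) {
        List<Rating> all = new ArrayList<>();
        for (int u = 1; u <= 10; u++) {
            for (int i = 1; i <= 10; i++) all.add(new Rating(u, i, 1 + (u + i) % 5)); // synthetic 1..5
        }
        List<List<Rating>> parts = split(all, 0.8, 7L);
        // Baseline predictor (assumption): global mean of the training ratings.
        double mean = parts.get(0).stream().mapToDouble(r -> r.value).average().orElse(0.0);
        List<Rating> test = parts.get(1);
        double[] pred = new double[test.size()];
        double[] actual = new double[test.size()];
        for (int j = 0; j < test.size(); j++) {
            pred[j] = mean;
            actual[j] = test.get(j).value;
        }
        System.out.println(parts.get(0).size() + " train / " + test.size() + " test, RMSE = " + rmse(pred, actual));
    }
}
```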

Evaluation of the System

Analysis with different similarity measures

Similarity measures: Pearson Coefficient, Euclidean Distance, Log-likelihood, Tanimoto Coefficient; each evaluated with and without preprocessing.

Result sets in slide order (SCORE / RMSE):

Set | 20 neighbours | 100 neighbours | 1000 neighbours
1 | 0.80281969 / 0.807538041 | 0.82893594 / 0.83615161 | 0.9604098849 / 0.957224878
2 | 0.9146494635 / 0.882693501 | 0.843550186 / 0.8463632529 | 0.86570804 / 0.86579376
3 | 0.7694310985 / 0.767654578 | 0.795861932 / 0.7785749268 | 0.7945015649 / 0.7798661428
4 | 0.7399951798 / 0.746315315 | 0.7467147307 / 0.759681185 | 0.7490188732 / 0.76143536582
5 | 0.724689055 / 0.740014834 | 0.726946821 / 0.74111319 | 0.7290753292 / 0.74004491
6 | 0.89754363 / 0.900235117 | 0.823518133 / 0.82825220 | 0.832473341 / 0.8394239511
7 | 0.83637523 / 0.8443356 | 0.8034275434 / 0.80880505 | 0.8064418071 / 0.813111737
8 | 0.89751163 / 0.90023111 | 0.805596293 / 0.814149532 | 0.8037747631 / 0.811900568
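Pearson correlation is the standard formula over co-rated items; the other three measures compared above can be sketched as below. Assumptions: Euclidean similarity uses the 1 / (1 + distance) mapping (library implementations, including Apache Mahout's, may scale it differently); Tanimoto uses only which items were rated, not the rating values; log-likelihood similarity applies Dunning's G-squared test to the 2x2 table of items rated by both, one or neither user, mapped to 1 - 1/(1 + LLR). These are illustrative and not the exact code behind the scores above.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of Euclidean, Tanimoto and log-likelihood user similarities. */
public class SimilarityMeasures {

    /** 1 / (1 + Euclidean distance) over co-rated items; 0 if none are shared. */
    static double euclidean(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sum = 0.0;
        int shared = 0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double d = e.getValue() - other;
                sum += d * d;
                shared++;
            }
        }
        return shared == 0 ? 0.0 : 1.0 / (1.0 + Math.sqrt(sum));
    }

    /** Tanimoto (Jaccard) coefficient of the sets of rated items. */
    static double tanimoto(Set<Integer> a, Set<Integer> b) {
        Set<Integer> inter = new HashSet<>(a);
        inter.retainAll(b);
        int union = a.size() + b.size() - inter.size();
        return union == 0 ? 0.0 : (double) inter.size() / union;
    }

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalised Shannon entropy (N times H) of a list of counts. */
    private static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }

    /** Dunning's log-likelihood ratio (G-squared) for a 2x2 contingency table. */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double row = entropy(k11 + k12, k21 + k22);
        double col = entropy(k11 + k21, k12 + k22);
        double mat = entropy(k11, k12, k21, k22);
        return Math.max(0.0, 2.0 * (row + col - mat)); // clamp round-off below zero
    }

    /** Similarity of two users from their rated-item sets, out of numItems items. */
    static double logLikelihoodSimilarity(Set<Integer> a, Set<Integer> b, long numItems) {
        Set<Integer> inter = new HashSet<>(a);
        inter.retainAll(b);
        long both = inter.size();
        long onlyA = a.size() - both;
        long onlyB = b.size() - both;
        long neither = numItems - both - onlyA - onlyB;
        return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(both, onlyA, onlyB, neither));
    }

    public static void main(String[] args) {
        Map<Integer, Double> u1 = Map.of(1, 5.0, 2, 3.0, 3, 4.0);
        Map<Integer, Double> u2 = Map.of(2, 3.0, 3, 5.0, 4, 2.0);
        System.out.println("euclidean = " + euclidean(u1, u2));
        System.out.println("tanimoto  = " + tanimoto(u1.keySet(), u2.keySet()));
        System.out.println("loglik    = " + logLikelihoodSimilarity(u1.keySet(), u2.keySet(), 1682));
    }
}
```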

5. Flower grain image classification using supervised classification algorithms
Dataset: Magnified images of flowers
Technology: Java
Algorithm: Grain Analysis (Image Processing)
Machine learning Technique: Neural Network, Deep Belief Network, SVM
Microscope Magnification: 100X
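The deck does not say how grain analysis was performed. One common way to turn grain and texture into numeric features for a neural network or SVM is a grey-level co-occurrence matrix (GLCM) with Haralick-style statistics; the sketch below assumes that approach, a single horizontal pixel offset, 8 grey levels and a hypothetical 2x4 image.

```java
/**
 * Sketch of GLCM texture features.
 * ASSUMPTIONS: one horizontal offset (dx = 1, dy = 0) and a small number of grey levels.
 */
public class GlcmFeatures {

    /** Co-occurrence matrix of horizontally adjacent pixels, normalised to sum to 1; values in [0, levels). */
    static double[][] glcm(int[][] img, int levels) {
        double[][] p = new double[levels][levels];
        long pairs = 0;
        for (int[] row : img) {
            for (int c = 0; c + 1 < row.length; c++) {
                p[row[c]][row[c + 1]] += 1.0;
                pairs++;
            }
        }
        if (pairs > 0) {
            for (double[] r : p) {
                for (int j = 0; j < levels; j++) r[j] /= pairs;
            }
        }
        return p;
    }

    /** Contrast: large when neighbouring pixels differ strongly (coarse grain). */
    static double contrast(double[][] p) {
        double s = 0.0;
        for (int i = 0; i < p.length; i++) {
            for (int j = 0; j < p.length; j++) s += p[i][j] * (i - j) * (i - j);
        }
        return s;
    }

    /** Homogeneity: large when neighbouring pixels have similar grey levels. */
    static double homogeneity(double[][] p) {
        double s = 0.0;
        for (int i = 0; i < p.length; i++) {
            for (int j = 0; j < p.length; j++) s += p[i][j] / (1.0 + Math.abs(i - j));
        }
        return s;
    }

    /** Angular second moment: large for uniform, repetitive texture. */
    static double angularSecondMoment(double[][] p) {
        double s = 0.0;
        for (double[] row : p) {
            for (double v : row) s += v * v;
        }
        return s;
    }

    /** Quantises 8-bit grey values (0..255) into `levels` bins. */
    static int[][] quantise(int[][] grey, int levels) {
        int[][] q = new int[grey.length][];
        for (int r = 0; r < grey.length; r++) {
            q[r] = new int[grey[r].length];
            for (int c = 0; c < grey[r].length; c++) q[r][c] = grey[r][c] * levels / 256;
        }
        return q;
    }

    public static void main(String[] args) {
        int[][] grey = {
            {10, 200, 10, 200},
            {200, 10, 200, 10},
        };
        double[][] p = glcm(quantise(grey, 8), 8);
        System.out.printf("contrast=%.3f homogeneity=%.3f asm=%.3f%n",
                contrast(p), homogeneity(p), angularSecondMoment(p));
    }
}
```

The resulting statistics would form part of the feature vector passed to the classifier; in practice several offsets and directions are usually combined.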

Deep Neural Network

Here we extended a simple neural network architecture to a deep belief network.
[Figure: Simple Neural Network]
[Figure: Deep Belief Network]
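A deep belief network is usually built by stacking restricted Boltzmann machines (RBMs), pre-training each layer with contrastive divergence, and then fine-tuning the whole network. The sketch below shows only that building block: one binary RBM trained with single-step contrastive divergence (CD-1). Layer sizes, learning rate, initialisation and the toy binary patterns are assumptions, not the project's settings.

```java
import java.util.Random;

/** Sketch of a binary restricted Boltzmann machine trained with CD-1. */
public class Rbm {
    final int nVisible, nHidden;
    final double[][] w;      // w[i][j]: weight between visible unit i and hidden unit j
    final double[] vBias, hBias;
    final Random rnd;

    Rbm(int nVisible, int nHidden, long seed) {
        this.nVisible = nVisible;
        this.nHidden = nHidden;
        rnd = new Random(seed);
        w = new double[nVisible][nHidden];
        for (double[] row : w) {
            for (int j = 0; j < nHidden; j++) row[j] = 0.01 * rnd.nextGaussian(); // small random init
        }
        vBias = new double[nVisible];
        hBias = new double[nHidden];
    }

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** P(h_j = 1 | v) for every hidden unit. */
    double[] hiddenProbs(double[] v) {
        double[] h = new double[nHidden];
        for (int j = 0; j < nHidden; j++) {
            double a = hBias[j];
            for (int i = 0; i < nVisible; i++) a += v[i] * w[i][j];
            h[j] = sigmoid(a);
        }
        return h;
    }

    /** P(v_i = 1 | h) for every visible unit. */
    double[] visibleProbs(double[] h) {
        double[] v = new double[nVisible];
        for (int i = 0; i < nVisible; i++) {
            double a = vBias[i];
            for (int j = 0; j < nHidden; j++) a += h[j] * w[i][j];
            v[i] = sigmoid(a);
        }
        return v;
    }

    /** One CD-1 update on a single binary training vector. */
    void cd1(double[] v0, double lr) {
        double[] h0 = hiddenProbs(v0);
        double[] h0Sample = new double[nHidden];
        for (int j = 0; j < nHidden; j++) h0Sample[j] = rnd.nextDouble() < h0[j] ? 1.0 : 0.0;
        double[] v1 = visibleProbs(h0Sample); // mean-field reconstruction
        double[] h1 = hiddenProbs(v1);
        for (int i = 0; i < nVisible; i++) {
            for (int j = 0; j < nHidden; j++) w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j]);
        }
        for (int i = 0; i < nVisible; i++) vBias[i] += lr * (v0[i] - v1[i]);
        for (int j = 0; j < nHidden; j++) hBias[j] += lr * (h0[j] - h1[j]);
    }

    /** Mean squared reconstruction error using probabilities (no sampling). */
    double reconstructionError(double[][] data) {
        double err = 0.0;
        for (double[] v : data) {
            double[] r = visibleProbs(hiddenProbs(v));
            for (int i = 0; i < nVisible; i++) err += (v[i] - r[i]) * (v[i] - r[i]);
        }
        return err / data.length;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 1, 1, 0, 0, 0}, {0, 0, 0, 1, 1, 1}};
        Rbm rbm = new Rbm(6, 2, 42L);
        System.out.println("before: " + rbm.reconstructionError(data));
        for (int epoch = 0; epoch < 3000; epoch++) {
            for (double[] v : data) rbm.cd1(v, 0.1);
        }
        System.out.println("after:  " + rbm.reconstructionError(data));
    }
}
```

In a full DBN, the hidden probabilities of this trained RBM would become the training data for the next RBM in the stack.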

[Figure: Flower and its grain images]

Model Parameters
[Figure: SVM training parameters]

[Figure: Code & accuracy]

6. Large-scale medical text classification and identification in healthcare
Dataset: Medical text files

Healthcare
ENTITY RELATIONSHIP DETECTION FROM LARGE TEXT FILES
In this project, we developed an algorithm that predicts whether a relationship exists between two entities in medical text files (for example, "leg pain" or "pain in leg"). We used a deep learning Support Vector Machine algorithm (binary classifier) to identify such relationships.
Accuracy: 87% on testing (unseen) data

Healthcare
ENTITY DETECTION IN THE CLINICAL DOMAIN
In this project, we detected different keywords (modifiers) such as negation, conditional, severity, temporal, body measurements, disease names and others in large medical text files using NLP algorithms, and classified them using probabilistic graphical models such as deep CRF networks and Hidden Markov Models.
Accuracy: 93% on testing (unseen) data.
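For HMM-based modifier tagging, decoding means finding the most likely tag sequence for a sentence, which the Viterbi algorithm does by dynamic programming; a linear-chain CRF is decoded with the same recursion over feature scores. In the sketch below the two tags (O and NEG), the four-word vocabulary and all probabilities are hypothetical hand-set values, not a trained clinical model.

```java
/** Sketch of log-space Viterbi decoding for an HMM tagger. */
public class ViterbiTagger {

    /**
     * start[s], trans[p][s] and emit[s][o] are probabilities; obs holds word indices.
     * Returns the most likely sequence of state indices.
     */
    static int[] viterbi(int[] obs, double[] start, double[][] trans, double[][] emit) {
        int n = obs.length, k = start.length;
        if (n == 0) return new int[0];
        double[][] score = new double[n][k]; // best log-probability of a path ending in state s at t
        int[][] back = new int[n][k];        // predecessor state on that best path
        for (int s = 0; s < k; s++) {
            score[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double cand = score[t - 1][p] + Math.log(trans[p][s]);
                    if (cand > best) {
                        best = cand;
                        arg = p;
                    }
                }
                score[t][s] = best + Math.log(emit[s][obs[t]]);
                back[t][s] = arg;
            }
        }
        int[] path = new int[n];
        for (int s = 1; s < k; s++) {
            if (score[n - 1][s] > score[n - 1][path[n - 1]]) path[n - 1] = s;
        }
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]]; // follow back-pointers
        }
        return path;
    }

    public static void main(String[] args) {
        String[] tags = {"O", "NEG"};
        // Hypothetical vocabulary: 0 = "no", 1 = "pain", 2 = "denies", 3 = "fever".
        double[] start = {0.6, 0.4};
        double[][] trans = {{0.8, 0.2}, {0.7, 0.3}};
        double[][] emit = {
            {0.05, 0.45, 0.05, 0.45}, // O
            {0.50, 0.05, 0.40, 0.05}  // NEG
        };
        int[] sentence = {1, 0, 3}; // "pain no fever"
        for (int s : viterbi(sentence, start, trans, emit)) System.out.print(tags[s] + " ");
        System.out.println(); // prints: O NEG O
    }
}
```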

7. Neural network design for rock image recognition
The objective of this project is to develop a method for a rock image classification system using microscopic imaging of surface parameters. Rock surface parameters are color, grain and texture. The combined features extracted from each of these parameters are used to uniquely identify the rock type or to recognize its signature. We designed and developed a multi-layer feed-forward deep neural network to classify non-linear, complex data.

Accuracy: 95% on test rock images not used in training
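A multi-layer feed-forward network of the kind described can be sketched with sigmoid units trained by backpropagation on squared error. This is a generic illustration: the layer sizes, learning rate, initialisation and the toy two-feature data are assumptions, whereas the project fed combined color, grain and texture features to its own network.

```java
import java.util.Random;

/**
 * Sketch of a fully connected feed-forward network with sigmoid units,
 * trained by per-example backpropagation on squared error.
 */
public class FeedForwardNet {
    final int[] sizes;       // e.g. {2, 4, 4, 1}: input, hidden..., output
    final double[][][] w;    // w[l][j][i]: weight from unit i of layer l to unit j of layer l+1
    final double[][] b;      // b[l][j]: bias of unit j of layer l+1

    FeedForwardNet(long seed, int... sizes) {
        this.sizes = sizes;
        Random rnd = new Random(seed);
        w = new double[sizes.length - 1][][];
        b = new double[sizes.length - 1][];
        for (int l = 0; l < sizes.length - 1; l++) {
            w[l] = new double[sizes[l + 1]][sizes[l]];
            b[l] = new double[sizes[l + 1]];
            double scale = 1.0 / Math.sqrt(sizes[l]); // keeps initial pre-activations small
            for (double[] row : w[l]) {
                for (int i = 0; i < row.length; i++) row[i] = scale * rnd.nextGaussian();
            }
        }
    }

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Returns the activations of every layer; index 0 is the input. */
    double[][] forward(double[] x) {
        double[][] a = new double[sizes.length][];
        a[0] = x;
        for (int l = 0; l < w.length; l++) {
            a[l + 1] = new double[sizes[l + 1]];
            for (int j = 0; j < sizes[l + 1]; j++) {
                double z = b[l][j];
                for (int i = 0; i < sizes[l]; i++) z += w[l][j][i] * a[l][i];
                a[l + 1][j] = sigmoid(z);
            }
        }
        return a;
    }

    /** One backpropagation step on a single example (x, y). */
    void train(double[] x, double[] y, double lr) {
        double[][] a = forward(x);
        int last = w.length;
        double[] delta = new double[sizes[last]];
        for (int j = 0; j < delta.length; j++) {
            double out = a[last][j];
            delta[j] = (out - y[j]) * out * (1.0 - out); // dE/dz for E = 0.5 * (out - y)^2
        }
        for (int l = last - 1; l >= 0; l--) {
            // Propagate the error to layer l before changing w[l] (unused when l == 0).
            double[] prev = new double[sizes[l]];
            for (int i = 0; i < sizes[l]; i++) {
                double s = 0.0;
                for (int j = 0; j < delta.length; j++) s += w[l][j][i] * delta[j];
                prev[i] = s * a[l][i] * (1.0 - a[l][i]);
            }
            for (int j = 0; j < delta.length; j++) {
                for (int i = 0; i < sizes[l]; i++) w[l][j][i] -= lr * delta[j] * a[l][i];
                b[l][j] -= lr * delta[j];
            }
            delta = prev;
        }
    }

    /** Mean squared-error loss over a data set. */
    double loss(double[][] xs, double[][] ys) {
        double sum = 0.0;
        for (int n = 0; n < xs.length; n++) {
            double[] out = forward(xs[n])[w.length];
            for (int j = 0; j < out.length; j++) sum += 0.5 * (out[j] - ys[n][j]) * (out[j] - ys[n][j]);
        }
        return sum / xs.length;
    }

    public static void main(String[] args) {
        // Toy two-feature data (logical OR), standing in for real rock feature vectors.
        double[][] xs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[][] ys = {{0}, {1}, {1}, {1}};
        FeedForwardNet net = new FeedForwardNet(7L, 2, 4, 4, 1);
        System.out.println("loss before: " + net.loss(xs, ys));
        for (int epoch = 0; epoch < 3000; epoch++) {
            for (int n = 0; n < xs.length; n++) net.train(xs[n], ys[n], 0.5);
        }
        System.out.println("loss after:  " + net.loss(xs, ys));
    }
}
```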
