
Quora Question Pairs

Identify question pairs that have the same intent

Arlene Fu
ENSC895  Course  Project
Professor:  Ivan  Bajic
School of Engineering Science
Simon Fraser University
Burnaby, BC, Canada

1
Project Description

• Kaggle competition held by Quora


• Finished 6 months ago
• Goal: develop a machine learning and natural language processing
system to classify whether question pairs are duplicates or not

2
Semantic Question Matching

What are the best ways to lose weight?


VS
What are effective weight loss plans?

3
Provided Data

Train.csv: 64MB with >400,000 pairs


id | qid1 | qid2 | question1                                                          | question2                                                 | is_duplicate
0  | 1    | 2    | What is the step by step guide to invest in share market in india? | What is the step by step guide to invest in share market? | 0
…  | …    | …    | …                                                                  | …                                                         | …

Test.csv: 314MB with >2.3 million pairs


test_id | question1 | question2
…       | …         | …

Evaluation: log loss = -(1/N) Σ_i [ y_i log(p_i) + (1 - y_i) log(1 - p_i) ],
where y_i is the true label and p_i is the predicted probability that pair i is a duplicate
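The metric can be computed directly; a minimal sketch (variable names here are illustrative, not from the competition code):

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Average binary log loss over all question pairs."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Confident, correct predictions give a small loss
print(log_loss([1, 0], [0.9, 0.1]))  # ≈ 0.105
```

Note the clipping step: a single prediction of exactly 0 or 1 that turns out wrong would otherwise make the loss infinite, which is why Kaggle clips probabilities before scoring.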

4
Pre-Processing

• Problems with the input data


• Questions in the training set: genuine examples from Quora (with typos)
• Questions in the test set: computer-generated (may not make sense)

e.g. "What food fibre?"

• Correct typos
• Compare against a DefaultDict of known words
• Replace unreasonable words with a common word
• e.g. don't → do not
• Tokenize
• Remove stopwords like and, also, to…
• Lemmatization: e.g. do, did, done → do
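The steps above can be sketched in pure Python; the lookup tables below are tiny illustrative stand-ins, not the project's actual typo dictionary, stopword list, or lemmatizer:

```python
import re

# Illustrative tables only -- stand-ins for the real resources
CONTRACTIONS = {"don't": "do not", "can't": "can not"}
STOPWORDS = {"and", "also", "to", "the", "a", "what", "are", "is"}
LEMMAS = {"did": "do", "done": "do", "ways": "way"}

def preprocess(question):
    text = question.lower()
    for contraction, expansion in CONTRACTIONS.items():  # expand contractions
        text = text.replace(contraction, expansion)
    tokens = re.findall(r"[a-z']+", text)                # tokenize
    tokens = [t for t in tokens if t not in STOPWORDS]   # remove stopwords
    return [LEMMAS.get(t, t) for t in tokens]            # lemmatize

print(preprocess("What are the best ways to lose weight?"))
# ['best', 'way', 'lose', 'weight']
```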

5
Features

• Word Embedding
• GloVe:
• Global Vectors for Word Representation
• Unsupervised learning algorithm for obtaining vector
representations for words
• Pre-trained word vectors are available

Reference: https://www.zhihu.com/question/32275069
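The pre-trained GloVe files are plain text, one word followed by its vector per line, so loading them is a simple parse. A sketch (the filename in the comment is one of the standard Stanford downloads; any of the pre-trained sets has the same format):

```python
import numpy as np

def load_glove(lines):
    """Parse GloVe-format lines ('word v1 v2 ...') into a dict of vectors."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

# On real data one would stream the downloaded file, e.g.:
#   with open("glove.840B.300d.txt") as f:
#       embeddings = load_glove(f)
sample = ["weight 0.1 0.2 0.3", "loss 0.0 0.4 0.1"]
vectors = load_glove(sample)
print(vectors["weight"])  # [0.1 0.2 0.3]
```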

6
Features

• Word Embedding
• Word match & share
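One common form of the word-match feature in public kernels is the fraction of non-stopword words the two questions share; the exact stopword list and weighting used in this project are not shown, so this is only a sketch:

```python
STOP = {"the", "a", "to", "in", "is", "are", "what"}  # illustrative stopword list

def word_match_share(q1_tokens, q2_tokens):
    """Fraction of non-stopword words shared by the two questions."""
    w1 = {t for t in q1_tokens if t not in STOP}
    w2 = {t for t in q2_tokens if t not in STOP}
    if not w1 or not w2:
        return 0.0
    shared = w1 & w2
    return 2.0 * len(shared) / (len(w1) + len(w2))

print(word_match_share(["best", "ways", "lose", "weight"],
                       ["effective", "weight", "loss", "plans"]))  # 0.25
```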

7
Features

• Word Embedding
• Word match & share
• Magic features provided by Kagglers
• More frequent questions are more likely to be duplicates
• Count the common neighbors of the two questions in the question
co-occurrence graph
• The more common neighbors the question pair has, the more likely it is to be a
duplicate
• …
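The neighbor-counting idea treats the dataset as a graph in which two questions are linked whenever they appear together in a pair. A minimal sketch with toy data:

```python
from collections import defaultdict

def build_neighbors(pairs):
    """Map each question id to the set of questions it was paired with."""
    neighbors = defaultdict(set)
    for q1, q2 in pairs:
        neighbors[q1].add(q2)
        neighbors[q2].add(q1)
    return neighbors

def common_neighbors(neighbors, q1, q2):
    """Number of questions paired with both q1 and q2."""
    return len(neighbors[q1] & neighbors[q2])

pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]
nb = build_neighbors(pairs)
print(common_neighbors(nb, "a", "b"))  # both are paired with "c" -> 1
```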

8
Models

• StratifiedKFold (5 folds)
• a variation of k-fold that returns stratified folds,
• preserving the percentage of samples of each class
• LSTM: currently log loss = 0.13264 on the public LB
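Scikit-learn's StratifiedKFold keeps the duplicate/non-duplicate ratio the same in every fold. A toy demonstration (the labels below are made up, standing in for the real data):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels with a 2:1 class imbalance standing in for the real data
y = np.array([0] * 10 + [1] * 5)
X = np.zeros((15, 1))  # placeholder feature matrix

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(X, y):
    # every validation fold holds exactly 2 negatives and 1 positive
    print(len(val_idx), int(y[val_idx].sum()))  # 3 1
```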

9
LSTM model

10
Post Processing

11
Future work

• XGBoost
• an optimized distributed gradient boosting library designed to be
highly efficient, flexible, and portable
• LightGBM
• Sentence embedding
• More features

12
References

[1] https://www.kaggle.com/sudalairajkumar/keras-starter-script-with-word-embeddings/notebook
[2] https://github.com/aerdem4/kaggle-quora-dup
[3] https://www.kaggle.com/c/quora-question-pairs/discussion/32819
[4] https://www.kaggle.com/dasolmar/xgb-with-whq-jaccard/code
[5] https://www.kaggle.com/c/quora-question-pairs/discussion/31179
[6] http://blog.csdn.net/lanxu_yy/article/details/29002543
[7] https://nlp.stanford.edu/projects/glove/

13
Q&A

14
