
Distributed SPARQL query engine

using MapReduce

Prasad Kulkarni,
University of Edinburgh


Contents

Objective

Jena ARQ

MapReduce for SPARQL query

Experiments

Discussion


Objective
Issue: Present implementations of a SPARQL engine are incapable of handling extreme data set loads.

The SP2Bench SPARQL query benchmark reports that a query on a 25 million triple dataset takes 100-1000 sec.

Objective: To perform distributed SPARQL query execution on a MapReduce cluster that outperforms the response times and scalability benchmarks provided by SP2Bench.


Jena ARQ

Jena is a Java framework for building Semantic Web applications. Jena provides a collection of tools and Java libraries to help you develop semantic web and linked-data apps, tools and servers.

Jena ARQ: SPARQL-like query engine.

In-memory implementation: small-size data

RDF data stored in a hashmap data structure with indexes on SPO.

TDB implementation: large-size data

B+ tree indexing.

Query optimization: fixed, statistics based.
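
To make the in-memory bullet concrete, here is a minimal sketch of a hashmap-backed triple store with a subject -> predicate -> object index. It is an illustration only and does not reproduce Jena's actual data structures; the class name `TripleIndex` and its methods are hypothetical.

```python
from collections import defaultdict


class TripleIndex:
    """Toy in-memory triple store (hypothetical, not Jena's implementation).

    Triples are kept in nested hash maps: subject -> predicate -> set of objects.
    """

    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        self.spo[s][p].add(o)

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        subjects = [s] if s is not None else list(self.spo)
        for subj in subjects:
            # .get() avoids creating empty entries for unknown subjects
            preds = self.spo.get(subj, {})
            pred_keys = [p] if p is not None else list(preds)
            for pred in pred_keys:
                for obj in preds.get(pred, ()):
                    if o is None or obj == o:
                        yield (subj, pred, obj)
```

A lookup with a bound subject touches a single hash bucket, while a pattern with an unbound subject has to scan every subject. That is why such stores usually keep several index orderings, and why this layout suits small data sets only.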




Query Engine


Distributed Design Approach
Distributed SPARQL Engine Conceptual Model
The evaluation plan has a tree structure, and the serialization and de-serialization of this complex structure can be time consuming.


Distributed SPARQL Engine Reality Model


Algorithm

Selection Phase: Filters input RDF data.

Join Phase: Run if the graph pattern has joins.
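
The two phases can be sketched as a single-process simulation. This is a rough illustration under stated assumptions, not the author's implementation: triples are plain 3-tuples of strings, variables are strings starting with `?`, and all function names (`match_pattern`, `selection_phase`, `join_phase`, `run_query`) are hypothetical. The grouping dictionary in `join_phase` stands in for the shuffle step of a real MapReduce job.

```python
from collections import defaultdict


def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triple):
    """Return variable bindings if the triple matches the pattern, else None."""
    bindings = {}
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable repeated inside one pattern must bind consistently.
            if bindings.get(p_term, t_term) != t_term:
                return None
            bindings[p_term] = t_term
        elif p_term != t_term:
            return None
    return bindings


def selection_phase(triples, patterns):
    """Map-only step: keep bindings for triples that match some pattern."""
    selected = defaultdict(list)
    for triple in triples:
        for i, pattern in enumerate(patterns):
            b = match_pattern(pattern, triple)
            if b is not None:
                selected[i].append(b)
    return selected


def join_phase(left, right, join_vars):
    """Reduce-side join: group by the shared variables, then pair both sides."""
    groups = defaultdict(lambda: ([], []))
    for b in left:
        groups[tuple(b[v] for v in join_vars)][0].append(b)
    for b in right:
        groups[tuple(b[v] for v in join_vars)][1].append(b)
    return [{**l, **r} for ls, rs in groups.values() for l in ls for r in rs]


def run_query(triples, patterns):
    """Run selection, then one join per extra pattern (only if there are joins)."""
    selected = selection_phase(triples, patterns)
    result = selected[0]
    bound = {t for t in patterns[0] if is_var(t)}
    for i in range(1, len(patterns)):
        pattern_vars = {t for t in patterns[i] if is_var(t)}
        shared = sorted(bound & pattern_vars)
        if not shared:
            raise ValueError("pattern shares no variable with earlier patterns")
        result = join_phase(result, selected[i], shared)
        bound |= pattern_vars
    return result
```

For example, the patterns `("?x", "type", "Article")` and `("?x", "title", "?t")` select two sets of bindings and join them on `?x`. A query with a single pattern skips the join loop entirely, which mirrors the "run if the graph pattern has joins" condition.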




Query
Selection P-ase


Join Phase
Query


Experimental Evaluation
Q1 - joins; Q3a,b,c - filters
Q10 - on two variables

Raw Implementation test


Document index Optimization

Speed test

Scalability Test


Comparison and Conclusion

Comparison with SP2Bench

Only Q3c performs well => Indexing wins over parallel computation.

Q3a filters 92.61% of all the articles and Q3b filters 0.65% of the articles. In query Q3c the FILTER is not satisfied by any triples, so the result is an empty set.

Conclusion

System scaled well, but response times were unacceptable.

Future Work

Distributed Indexing.
