
MAP/REDUCE IN COUCHDB



Oliver Kurowski, @okurow

Facts about Map/Reduce


Programming paradigm, popularized and patented by Google
Great for parallel jobs
No Joins between documents
In CouchDB: Map/Reduce functions are written in JavaScript (default)
Other languages are also possible

Workflow
1. Map function builds a list of key/value pairs
2. Reduce function reduces the list (to a single value)
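
The two workflow steps above can be sketched in plain JavaScript, outside of CouchDB. This is only an illustration: the helper names `runMap` and `reduceSum` and the sample documents are assumptions, and in CouchDB `emit` is a global provided by the view server rather than an argument.

```javascript
// Hypothetical in-memory documents (not taken from a real database).
const sampleDocs = [
  { id: 1, make: "Audi" },
  { id: 2, make: "Audi" },
  { id: 3, make: "VW" },
];

// Step 1: call the map function once per document and collect the emitted pairs.
function runMap(mapFn, documents) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  for (const doc of documents) {
    mapFn(doc, emit); // emit is passed in here; CouchDB exposes it as a global
  }
  return rows;
}

// Step 2: reduce the list of values to a single value.
function reduceSum(values) {
  return values.reduce((total, v) => total + v, 0);
}

const emittedRows = runMap((doc, emit) => emit(doc.make, 1), sampleDocs);
const reducedTotal = reduceSum(emittedRows.map((row) => row.value));
console.log(emittedRows.length, reducedTotal);
```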


Simple Map Example


A List of Cars
Id: 1
make: Audi
model: A3
year: 2000
price: 5.400

Id: 2
make: Audi
model: A4
year: 2009
price: 16.000

Id: 3
make: VW
model: Golf
year: 2009
price: 15.000

Id: 4
make: VW
model: Golf
year: 2008
price: 9.000

Id: 5
make: VW
model: Polo
year: 2010
price: 12.000

Step 1: Make a list, ordered by price

function(doc) {
  emit(doc.price, doc.id);   // key = price, value = id
}

Step 2: Result:

Key    , Value
5.400  , 1
9.000  , 4
12.000 , 5
15.000 , 3
16.000 , 2
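
The price map above can be reproduced locally with a few lines of JavaScript. This is a sketch under assumptions: the prices are read as thousands notation (5.400 becomes 5400), the field is named `id`, and `buildView` is a hypothetical helper that only handles numeric keys.

```javascript
// The five cars from the slide; prices written as plain numbers (assumption).
const cars = [
  { id: 1, make: "Audi", model: "A3", year: 2000, price: 5400 },
  { id: 2, make: "Audi", model: "A4", year: 2009, price: 16000 },
  { id: 3, make: "VW", model: "Golf", year: 2009, price: 15000 },
  { id: 4, make: "VW", model: "Golf", year: 2008, price: 9000 },
  { id: 5, make: "VW", model: "Polo", year: 2010, price: 12000 },
];

// Hypothetical helper: run the map over all docs and sort the rows by key.
function buildView(mapFn, documents) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  documents.forEach((doc) => mapFn(doc, emit));
  return rows.sort((a, b) => a.key - b.key); // numeric keys only in this sketch
}

const byPrice = buildView((doc, emit) => emit(doc.price, doc.id), cars);
byPrice.forEach((row) => console.log(row.key, row.value));
```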

Querying Maps
Original map

Key    , Value
5.400  , 1
9.000  , 4
12.000 , 5
15.000 , 3
16.000 , 2

startkey=10.000 & endkey=15.500
All keys from 10.000 up to 15.500 (the endkey itself is included by default)

Key    , Value
12.000 , 5
15.000 , 3

key=10.000
Exact key; no row has this key, so there is no result

Key    , Value

endkey=10.000
All keys up to 10.000

Key    , Value
5.400  , 1
9.000  , 4
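
The three queries above can be mimicked on an in-memory copy of the map. A minimal sketch, assuming numeric keys, thousands notation for the prices, and an inclusive endkey; `queryRange` is a hypothetical helper, not a CouchDB API.

```javascript
// The original map from the slide, already sorted by key.
const priceRows = [
  { key: 5400, value: 1 },
  { key: 9000, value: 4 },
  { key: 12000, value: 5 },
  { key: 15000, value: 3 },
  { key: 16000, value: 2 },
];

// Hypothetical helper: filter rows by exact key or by an inclusive key range.
function queryRange(rows, { key, startkey, endkey } = {}) {
  return rows.filter((row) => {
    if (key !== undefined) return row.key === key;
    if (startkey !== undefined && row.key < startkey) return false;
    if (endkey !== undefined && row.key > endkey) return false;
    return true;
  });
}

const inRange = queryRange(priceRows, { startkey: 10000, endkey: 15500 });
const exact = queryRange(priceRows, { key: 10000 });
const upTo = queryRange(priceRows, { endkey: 10000 });
console.log(inRange.length, exact.length, upTo.length);
```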


Map Function
Has one document as input
Can emit all JSON types as key and value:

- Special values: null, false, true
- Numbers: 1e-17, 1.5, 200
- Strings: "+", "1", "Ab", "Audi"
- Arrays: [1], [1,2], [1,"Audi",true]
- Objects: {"price":1300,"sold":true}

Results are ordered by key (or reversed)
(order with mixed types: as listed above)


In CouchDB: each result row also contains the doc._id
{"total_rows":5,"offset":0,
"rows":[
{"id":"1","key":"Audi","value":1},
{"id":"2","key":"Audi","value":1},
{"id":"3","key":"VW","value":1},
{"id":"4","key":"VW","value":1},
{"id":"5","key":"VW","value":1} ]
}
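
The ordering of mixed key types can be illustrated with a small comparator. This is a simplified sketch: it only ranks the JSON types and orders numbers within their type; strings, arrays and objects of the same type are left in input order, whereas CouchDB compares them in detail. The names `typeRank` and `compareSimplified` are assumptions.

```javascript
// Rank of each JSON type: null < false < true < numbers < strings < arrays < objects.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // plain object
}

// Simplified comparator: by type rank first, then numerically for numbers.
function compareSimplified(a, b) {
  const rankDiff = typeRank(a) - typeRank(b);
  if (rankDiff !== 0) return rankDiff;
  if (typeof a === "number") return a - b;
  return 0; // same type: not compared further in this sketch
}

const mixedKeys = [{ price: 1300 }, "Audi", [1, 2], 200, true, null, 1.5, false];
const sortedKeys = [...mixedKeys].sort(compareSimplified);
console.log(JSON.stringify(sortedKeys));
```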


Reduce Function
Has arrays of keys and values as input
Should reduce the result of a map to a single value
Written in JavaScript (other languages possible)
In CouchDB: some simple built-in native Erlang functions (_sum, _count, _stats)
Is called automatically after the map function has finished
Can be skipped with reduce=false
Is needed for grouping
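
A built-in reduce is selected by putting its name in place of a JavaScript function. A minimal sketch of such a design document as a JavaScript object; the view name `countByMake` is a hypothetical choice, while `_count` is one of the built-ins listed above.

```javascript
// Design document with a JavaScript map and the built-in _count reduce.
// Map functions are stored as strings inside the design document.
const builtinReduceDoc = {
  _id: "_design/example",
  views: {
    countByMake: { // hypothetical view name
      map: "function (doc) { emit(doc.make, 1); }",
      reduce: "_count", // built-in Erlang reduce instead of a JavaScript function
    },
  },
};

console.log(JSON.stringify(builtinReduceDoc, null, 2));
```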


Simple Map/Reduce Example


A List of Cars
Id: 1
make: Audi
model: A3
year: 2000
price: 5.400

Id: 2
make: Audi
model: A4
year: 2009
price: 16.000

Id: 3
make: VW
model: Golf
year: 2009
price: 15.000

Id: 4
make: VW
model: Golf
year: 2008
price: 9.000

Id: 5
make: VW
model: Polo
year: 2010
price: 12.000

Step 1: Make a map, ordered by make

function(doc) {
  emit(doc.make, 1);   // key = make, value = 1
}

Result:

Key  , Value
Audi , 1
Audi , 1
VW   , 1
VW   , 1
VW   , 1

Simple Map/Reduce Example


Result:

Key  , Value
Audi , 1
Audi , 1
VW   , 1
VW   , 1
VW   , 1

Step 2: Write a sum-reduce

function(keys, values) {
  return sum(values);
}

Result:

Key  , Value
null , 5

Simple Map/Reduce Example


Step 3: Querying

- key=Audi

Key  , Value
null , 2

Step 4: Grouping by keys

- group=true

Key  , Value
Audi , 2
VW   , 3

Step 5: Use only the map function

- reduce=false (like having no reduce function)

Key  , Value
Audi , 1
Audi , 1
VW   , 1
VW   , 1
VW   , 1
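
Steps 3 to 5 can be imitated with one small function that applies `key`, `group` and `reduce` to an in-memory map. This is an illustrative sketch only; `queryView` and `sumReduce` are hypothetical names, and real CouchDB evaluates these options on its stored index.

```javascript
const viewRows = [
  { id: "1", key: "Audi", value: 1 },
  { id: "2", key: "Audi", value: 1 },
  { id: "3", key: "VW", value: 1 },
  { id: "4", key: "VW", value: 1 },
  { id: "5", key: "VW", value: 1 },
];

const sumReduce = (keys, values) => values.reduce((a, b) => a + b, 0);

// Hypothetical helper: apply key filter, then either skip, run, or group the reduce.
function queryView(rows, reduceFn, { key, group = false, reduce = true } = {}) {
  const selected = key === undefined ? rows : rows.filter((row) => row.key === key);
  if (!reduce) return selected; // reduce=false: plain map rows
  const runReduce = (part) =>
    reduceFn(part.map((r) => [r.key, r.id]), part.map((r) => r.value));
  if (!group) return [{ key: null, value: runReduce(selected) }];
  const groups = new Map(); // group=true: one reduce call per distinct key
  for (const row of selected) {
    if (!groups.has(row.key)) groups.set(row.key, []);
    groups.get(row.key).push(row);
  }
  return [...groups].map(([groupKey, part]) => ({ key: groupKey, value: runReduce(part) }));
}

const audiOnly = queryView(viewRows, sumReduce, { key: "Audi" });
const grouped = queryView(viewRows, sumReduce, { group: true });
const mapOnly = queryView(viewRows, sumReduce, { reduce: false });
console.log(audiOnly, grouped, mapOnly.length);
```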


Array-Key Map/Reduce Example
A list of cars (again)
Id: 1
make: Audi
model: A3
year: 2000
price: 5.400

Id: 2
make: Audi
model: A4
year: 2009
price: 16.000

Id: 3
make: VW
model: Golf
year: 2009
price: 15.000

Id: 4
make: VW
model: Golf
year: 2008
price: 9.000

Id: 5
make: VW
model: Polo
year: 2010
price: 12.000

Step 1: Make a map, with an array as key

function(doc) {
  emit([doc.make, doc.model, doc.year], 1);
}

Result (with group=true):

Key              , Value
[Audi, A3, 2000] , 1
[Audi, A4, 2009] , 1
[VW, Golf, 2008] , 1
[VW, Golf, 2009] , 1
[VW, Polo, 2010] , 1

Array-Key Map/Reduce Querying

startkey=[Audi] (&group=true)

Key              , Value
[Audi, A3, 2000] , 1
[Audi, A4, 2009] , 1
[VW, Golf, 2008] , 1
[VW, Golf, 2009] , 1
[VW, Polo, 2010] , 1

startkey=[VW] (&group=true)

Key              , Value
[VW, Golf, 2008] , 1
[VW, Golf, 2009] , 1
[VW, Polo, 2010] , 1

endkey=[VW] (&group=true)

Key              , Value
[Audi, A3, 2000] , 1
[Audi, A4, 2009] , 1

Remember: keys such as [VW, Golf, 2008] sort after the endkey [VW], so they are not in the result list

Array-Key Map/Reduce Ranges


Step 4: Range queries:

- startkey=[VW,Golf]
- endkey=[VW,Polo]
- (&group=true)

Key              , Value
[VW, Golf, 2008] , 1
[VW, Golf, 2009] , 1

What if we do not know the next model after Golf?

- startkey=[VW,Golf]
- endkey=[VW,Golf,99999]
- (&group=true)

Key              , Value
[VW, Golf, 2008] , 1
[VW, Golf, 2009] , 1

- better: endkey=[VW,Golf,{}] (an object sorts after every number and string)
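
Why `{}` works as an upper bound becomes visible with a comparator for array keys. A simplified sketch: strings are compared by code unit here, while CouchDB uses its own (ICU-based) string collation, and objects are not compared with each other. `collate`, `rankOf` and `rangeQuery` are hypothetical names.

```javascript
// Type order: null < false < true < numbers < strings < arrays < objects.
function rankOf(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  return Array.isArray(v) ? 5 : 6;
}

// Simplified collation: by type, then by value; arrays element by element.
function collate(a, b) {
  const diff = rankOf(a) - rankOf(b);
  if (diff !== 0) return diff;
  if (typeof a === "number") return a - b;
  if (typeof a === "string") return a < b ? -1 : a > b ? 1 : 0; // simplification
  if (Array.isArray(a)) {
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // the shorter array sorts first
  }
  return 0; // same special value; objects are not compared in this sketch
}

const arrayKeys = [
  ["Audi", "A3", 2000], ["Audi", "A4", 2009],
  ["VW", "Golf", 2008], ["VW", "Golf", 2009], ["VW", "Polo", 2010],
];

// Inclusive range query over the sorted keys.
const rangeQuery = (keys, startkey, endkey) =>
  keys.filter((k) => collate(k, startkey) >= 0 && collate(k, endkey) <= 0);

const golfByNextModel = rangeQuery(arrayKeys, ["VW", "Golf"], ["VW", "Polo"]);
const golfBySentinel = rangeQuery(arrayKeys, ["VW", "Golf"], ["VW", "Golf", {}]);
console.log(golfByNextModel, golfBySentinel);
```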


Grouping with group_level


group=true (aka group_level=exact)

Key              , Value
[Audi, A3, 2000] , 1
[Audi, A4, 2009] , 1
[VW, Golf, 2008] , 1
[VW, Golf, 2009] , 1
[VW, Polo, 2010] , 1

group_level=1 (no group=true needed)

Key    , Value
[Audi] , 2
[VW]   , 3

group_level=2 (no group=true needed)

Key        , Value
[Audi, A3] , 1
[Audi, A4] , 1
[VW, Golf] , 2
[VW, Polo] , 1

group_level=3 -> group_level=exact -> group=true


Examples:
Get all car makes:

- group_level=1

Key    , Value
[Audi] , 2
[VW]   , 3

Get all models from VW:

- startkey=[VW]&endkey=[VW,{}]&group_level=2

Key        , Value
[VW, Golf] , 2
[VW, Polo] , 1

Get all years of VW Golf:

- startkey=[VW,Golf]&endkey=[VW,Golf,{}]&group_level=3

Key              , Value
[VW, Golf, 2008] , 1
[VW, Golf, 2009] , 1
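
The effect of `group_level` can be imitated by truncating each array key to its first N elements before summing. A minimal sketch with a hypothetical helper `groupByLevel`; it assumes the rows are already sorted and that the reduce is a plain sum.

```javascript
const carKeyRows = [
  { key: ["Audi", "A3", 2000], value: 1 },
  { key: ["Audi", "A4", 2009], value: 1 },
  { key: ["VW", "Golf", 2008], value: 1 },
  { key: ["VW", "Golf", 2009], value: 1 },
  { key: ["VW", "Polo", 2010], value: 1 },
];

// Hypothetical helper: group rows by the first `level` elements of the key and sum the values.
function groupByLevel(rows, level) {
  const groups = new Map();
  for (const row of rows) {
    const groupKey = row.key.slice(0, level);
    const id = JSON.stringify(groupKey); // arrays need a string form to serve as Map keys
    if (!groups.has(id)) groups.set(id, { key: groupKey, value: 0 });
    groups.get(id).value += row.value;
  }
  return [...groups.values()];
}

const byMake = groupByLevel(carKeyRows, 1);
const byModel = groupByLevel(carKeyRows, 2);
const exactKeys = groupByLevel(carKeyRows, 3);
console.log(byMake, byModel, exactKeys.length);
```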


Reduce/Rereduce:
A rule for using reduce functions:

The input of a reduce function is not only the result of a map, but also the result of the reduce function itself

function(doc) {
  emit(doc.make, 1);
}

Key  , Value
Audi , 2
VW   , 3

function(keys, values) {
  return sum(values);
}

Key  , Value
null , 5

Why?
A reduce function can be called more than once

If the map is too large, it is split and each part runs through the reduce function; finally, all the partial results run through the same reduce function again.

WTF?

Reduce/Rereduce:
Example for counting values (will produce a wrong result!)

function(keys, values) {
  return values.length;   // counts whatever values it is given
}

The map (1000 rows):

Key  , Value
1    , 1
2    , 10
3    , 4
...
999  , 7
1000 , 12

Split into three parts, each part runs through the function:

Key  , Value
1    , 1
2    , 10
...
-> null , 333

Key  , Value
333  , 23
334  , 15
335  , 99
...
-> null , 333

Key  , Value
666  , 82
667  , 18
668  , 149
...
1000 , 12
-> null , 334

The three partial results run through the same function again:

Key  , Value
null , 3

Boom!
3 != 1000

Reduce/Rereduce:
Solution: the rereduce flag (not mentioned yet)

- indicates whether the function is called for the first time (on map rows) or again (on its own results). Set by CouchDB

function(keys, values, rereduce) {
  if (rereduce == false) {
    return values.length;
  } else {
    return sum(values);
  }
}

The map (1000 rows):

Key  , Value
1    , 1
2    , 10
3    , 4
...
999  , 7
1000 , 12

Split into three parts, each part is counted (rereduce=false):

Key  , Value
1    , 1
2    , 10
...
-> null , 333

Key  , Value
333  , 23
334  , 15
335  , 99
...
-> null , 333

Key  , Value
666  , 82
667  , 18
668  , 149
...
1000 , 12
-> null , 334

The three partial results are summed (rereduce=true):

Key  , Value
null , 1000

Correct
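
The difference between the naive count and the rereduce-aware count can be replayed in plain JavaScript. This is a sketch under assumptions: the fixed chunk size of 334 is invented for the demonstration (CouchDB decides the split itself), `sum` is defined locally, and `reduceInChunks` is a hypothetical driver.

```javascript
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

// Naive count: wrong as soon as it is fed its own partial results.
function naiveCount(keys, values) {
  return values.length;
}

// Rereduce-aware count: count map rows first, then add up the partial counts.
function safeCount(keys, values, rereduce) {
  return rereduce ? sum(values) : values.length;
}

// Hypothetical driver: reduce each chunk, then rereduce the partial results.
function reduceInChunks(reduceFn, rows, chunkSize) {
  const partials = [];
  for (let i = 0; i < rows.length; i += chunkSize) {
    const chunk = rows.slice(i, i + chunkSize);
    const keys = chunk.map((r) => [r.key, r.id]);
    partials.push(reduceFn(keys, chunk.map((r) => r.value), false));
  }
  return reduceFn(null, partials, true); // keys are null on rereduce
}

// 1000 map rows with arbitrary values.
const thousandRows = Array.from({ length: 1000 }, (_, i) => ({
  id: String(i + 1),
  key: i + 1,
  value: (i % 7) + 1,
}));

const wrongCount = reduceInChunks(naiveCount, thousandRows, 334);
const rightCount = reduceInChunks(safeCount, thousandRows, 334);
console.log(wrongCount, rightCount);
```

With three chunks the naive version only counts the three partial results, while the flag-aware version adds them up.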


Input of a reduce function:

The map:

Doc._id , Key     , Value
4       , Audi    , 12.000
2       , BMW     , 20.000
1       , Citroen , 9.000
3       , Dacia   , 6.500

The function:

function(keys, values, rereduce) {
  return sum(values);
}

Input values 1 (rereduce=false):

- keys: [ [Audi,4], [BMW,2], [Citroen,1], [Dacia,3] ]
- values: [ 12.000, 20.000, 9.000, 6.500 ]
- rereduce: false

Input values 2 (rereduce=true):

- keys: null
- values: [ 47.500 ]
- rereduce: true


Where does Map/Reduce live?


Map/Reduce functions are stored in a design document, under the views key (the functions are stored as strings):

{
  "_id": "_design/example",
  "views": {
    "simplereduce": {
      "map": "function(doc) { emit(doc.make, 1); }",
      "reduce": "function(keys, values) { return sum(values); }"
    }
  }
}

Map/Reduce functions start when a view is called:

http://localhost:5984/mapreduce/_design/example/_view/simplereduce
http://localhost:5984/mapreduce/_design/example/_view/simplereduce?key=Audi
http://localhost:5984/mapreduce/_design/example/_view/simplereduce?key=VW&group=true
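
The URLs above use shorthand for the keys. In a real request, key values are JSON, so a string key is sent with quotes and URL-encoded. A small sketch of how such URLs could be built; `viewUrl` is a hypothetical helper and is only suited to JSON-valued parameters such as key, startkey, endkey, group and group_level.

```javascript
const viewBase = "http://localhost:5984/mapreduce/_design/example/_view/simplereduce";

// Hypothetical helper: JSON-encode each parameter value, then URL-encode it.
function viewUrl(baseUrl, params = {}) {
  const query = Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join("&");
  return query ? `${baseUrl}?${query}` : baseUrl;
}

const audiUrl = viewUrl(viewBase, { key: "Audi" });
const vwGroupedUrl = viewUrl(viewBase, { key: "VW", group: true });
const golfYearsUrl = viewUrl(viewBase, {
  startkey: ["VW", "Golf"],
  endkey: ["VW", "Golf", {}],
  group_level: 3,
});
console.log(audiUrl);
console.log(vwGroupedUrl);
console.log(golfYearsUrl);
```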


View calling
On the first call of a view, all documents in the database are run through the map function once
After the first call: only new and changed docs are run through the function when the view is called again
The results are stored in a CouchDB-internal B+tree
The result that you receive is the stored B+tree result
That means: the first time a view is called, it can take a little time to build the tree before you get the results.
If there are no changes to docs, the next call returns the result instantly
Key queries like startkey and endkey are performed on the B+tree result, no rebuild needed
There are several parameters for calling a view:
limit, skip, include_docs=true, key, startkey, endkey, descending, stale (ok, update_after), group, group_level, reduce (=false)

View calling parameters


limit: limits the number of rows in the output
skip: skips a number of rows
include_docs=true: when there is no reduce, the docs are sent along with the map list
key, startkey, endkey: should be known by now
startkey_docid=x: only docs with id>=x
endkey_docid=x: only docs with id<x
descending=true: reverse order. When using startkey/endkey, they must be swapped
stale=ok: do not start indexing, just deliver the stored result
stale=update_after: deliver old results, start indexing after that
group, group_level, reduce=false: should be known by now
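
How limit, skip and descending interact with a key range can be tried on an in-memory map. A minimal sketch, assuming numeric keys (prices in thousands notation) and an inclusive range; `queryWithParams` is a hypothetical helper. It shows why startkey and endkey must be swapped when descending=true.

```javascript
const sortedPriceRows = [
  { key: 5400, value: 1 },
  { key: 9000, value: 4 },
  { key: 12000, value: 5 },
  { key: 15000, value: 3 },
  { key: 16000, value: 2 },
];

// Hypothetical helper: order the rows, apply the key range, then skip and limit.
function queryWithParams(rows, { startkey, endkey, descending = false, skip = 0, limit = Infinity } = {}) {
  const ordered = descending ? [...rows].reverse() : rows;
  const inRange = ordered.filter((row) => {
    // In descending order the scan starts at the higher key and ends at the lower one.
    const afterStart = startkey === undefined || (descending ? row.key <= startkey : row.key >= startkey);
    const beforeEnd = endkey === undefined || (descending ? row.key >= endkey : row.key <= endkey);
    return afterStart && beforeEnd;
  });
  return inRange.slice(skip, skip + limit);
}

const swapped = queryWithParams(sortedPriceRows, { descending: true, startkey: 15500, endkey: 10000 });
const notSwapped = queryWithParams(sortedPriceRows, { descending: true, startkey: 10000, endkey: 15500 });
const firstTwo = queryWithParams(sortedPriceRows, { limit: 2 });
const secondPage = queryWithParams(sortedPriceRows, { skip: 1, limit: 2 });
console.log(swapped, notSwapped, firstTwo, secondPage);
```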


You've made it!
