
Protocol Buffers

Portable binary serialization, the Google way

What is protobuf?
Protocol Buffers defines two things:
A compact binary serialization format (pb)
A text-based descriptor language (.proto)
Implementation specific:
Runtime serialization library / code
.proto parser / code generator

What is a .proto?
Descriptor language...
message Test1 {
  required int32 a = 1;
}
message Test3 {
  optional Test1 c = 3;
}
service SearchService {
  rpc Search (SearchRequest) returns (SearchResponse);
}

Comparison of protocols

Using Internet data in Android applications, by Michael Galpin, IBM, 2010:
http://www.ibm.com/developerworks/opensource/library/x-dataAndroid/index.html

Proto Message
message GeoPairMapEntry {
  optional GeoPair key = 1;                     // field type: GeoPair
  repeated GeoPair geoPair = 2;
  optional bool hasZip4 = 3 [deprecated=true];  // field rule: optional
  optional GeoConversionStatus status = 4;
  repeated string missingGeoType = 5;           // unique tag: 5
}

Some Proto Options


[packed=true]: By default, repeated fields of scalar numeric types
aren't encoded as efficiently as they could be. New code
should use the special option [packed=true] to get a more
efficient encoding.
message GeoPairMapEntry {
  optional GeoPair key = 1;
  repeated GeoPair geoPair = 2;
  optional bool hasZip4 = 3 [deprecated=true];
  optional GeoConversionStatus status = 4;
  repeated string missingGeoType = 5;
}

A Proto Rule
Use tag numbers 1 through 15 for the most frequently used fields

Updating A Message Type


Don't change the numeric tags for
any existing fields.
Any new fields that you add should
be optional or repeated
Non-required fields can be removed,
as long as the tag number is not
used again in your updated message
type.

If Deleting a Field
Reserved tags:
message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}

Data Types
int32, uint32, int64, uint64, and bool are
all compatible
sint32 and sint64 are compatible with
each other but are not compatible with
the other integer types.
optional is compatible with repeated
Changing a default value is generally OK,
as long as you remember that default
values are never sent over the wire.

Default Value
For bools, the default value is false.
For numeric types, the default value
is zero.

Maps
map<string, Project> projects = 3;
The map syntax is equivalent to the following
on the wire
message MapFieldEntry {
key_type key = 1;
value_type value = 2;
}

repeated MapFieldEntry map_field = N;


So protocol buffers implementations that do not
support maps can still handle your data.

Problem we faced
enum GeoType {
  DEFAULT_UNUSED = 0;
  // Numbers for Mgrs representing Radius
  MGRS_10 = 10;
  MGRS_100 = 100;
  MGRS_1000 = 1000;
  // Reserving 10001 to 10050 for other GeoTypes
  ZIP4 = -1;
  ATZ = -2;
}
For enums, the default value is the first value listed in the enum's type definition.
This means care must be taken when adding a value to the beginning of an enum
value list.
Since enum values use varint encoding on the wire, negative values are inefficient
and thus not recommended.
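
To put a number on that inefficiency, here is a minimal Python sketch, not taken from any protobuf library. It assumes a negative enum value is sign-extended to 64 bits before being written as a varint, which is how the wire format treats int32 and enum values. The helper name encode_varint is made up for this example.

```python
def encode_varint(value: int) -> bytes:
    """Encode an integer as a varint; negatives become 64-bit two's complement."""
    value &= 0xFFFFFFFFFFFFFFFF  # sign-extend negative numbers to 64 bits
    out = bytearray()
    while True:
        low7 = value & 0x7F          # least significant 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # last byte
            return bytes(out)


# Enum numbers from the GeoType example above.
for name, number in [("MGRS_10", 10), ("MGRS_1000", 1000), ("ZIP4", -1), ("ATZ", -2)]:
    print(name, len(encode_varint(number)), "byte(s)")
# MGRS_10 1 byte(s)
# MGRS_1000 2 byte(s)
# ZIP4 10 byte(s)
# ATZ 10 byte(s)
```

Every negative enum value costs ten bytes on the wire, while small positive values fit in one or two.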

Importing Other Protobuf


You can use definitions from other
.proto files by importing them. To
import another .proto's definitions,
you add an import statement to the
top of your file.
By default you can only use
definitions from directly imported
.proto files.

Example

Compiling Proto Files


Dufffman Proto
import "m6model.proto";
import "dsdeviceidlist.proto";
import "geoData/GeoData.proto";

protoc -I=common-protobuf-model/src/main/resources

Compiling Proto Files


Dufffman Proto
import "m6model.proto";
import "dsdeviceidlist.proto";
import "GeoData.proto";

protoc -I=common-protobuf-model/src/main/resources -I=common-protobuf-model/src/main/resources/geoData

Encoding
Varints: a method of serializing integers using one or more bytes.
Smaller numbers take a smaller number of bytes.
Varints store numbers with the least significant group first.
Each byte in a varint, except the last byte, has
the most significant bit (msb) set; this
indicates that there are further bytes to come.
Example:
1 = 00000001
300 = 10101100 00000010
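
The varint rules above can be sketched in a few lines of Python. This is an illustrative implementation for non-negative integers only, not the code used by any protobuf runtime; the function names are invented for the example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # least significant 7-bit group first
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # msb set: further bytes to come
        else:
            out.append(low7)         # last byte: msb clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode one varint from the start of data."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:          # msb clear marks the final byte
            return result
    raise ValueError("truncated varint")


print(encode_varint(1).hex(" "))           # 01
print(encode_varint(300).hex(" "))         # ac 02
print(decode_varint(bytes([0xAC, 0x02])))  # 300
```

The bytes ac 02 are the same as the binary 10101100 00000010 shown in the example.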

Encoding
A protocol buffer message is a series of
key-value pairs.
When a message is encoded, the
keys and values are concatenated
into a byte stream.
When the message is being decoded,
the parser needs to be able to skip
fields that it doesn't recognize.
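
A rough Python sketch of how a parser can skip unrecognized fields is shown below. It assumes each key is a varint holding (field_number << 3) | wire_type, and that wire types 0, 1, 2 and 5 mean varint, fixed 64-bit, length-delimited and fixed 32-bit. The names read_varint and parse_fields are hypothetical and are not part of any protobuf API.

```python
def read_varint(buf: bytes, pos: int):
    """Read one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_fields(buf: bytes, known: set) -> dict:
    """Collect raw values of known field numbers and skip all others."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        # The wire type alone tells the parser how many bytes the value occupies.
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number in known:
            fields.setdefault(field_number, []).append(value)
    return fields


# Field 1 = 150 (varint), followed by field 2 = "testing" (length-delimited).
message = bytes.fromhex("08 96 01 12 07 74 65 73 74 69 6e 67")
print(parse_fields(message, known={1}))     # {1: [150]}
print(parse_fields(message, known={1, 2}))  # {1: [150], 2: [b'testing']}
```

A reader that only knows field 1 still decodes the message, because the wire type of field 2 is enough to step over its bytes.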

Encoding
Key = Tag Number + Wire Type

Encoding
Key = (field_number << 3) | wire_type
message Test1
{
required int32 a = 1;
}
Key = 00001000 (field number 1, wire type 0)
If we use one of the signed types (sint32, sint64), the resulting
varint uses ZigZag encoding, which is much
more efficient for negative numbers.
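
As a small illustration, the key formula and the ZigZag mapping can be written in Python as follows. This is a sketch with made-up helper names, and the ZigZag function covers the 32-bit case only.

```python
def make_key(field_number: int, wire_type: int) -> int:
    """Combine a field number and a wire type into a key."""
    return (field_number << 3) | wire_type


def zigzag32(n: int) -> int:
    """Map a signed 32-bit integer to an unsigned one (the sint32 mapping)."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


print(format(make_key(1, 0), "08b"))  # 00001000
for n in (0, -1, 1, -2, 2):
    print(n, "->", zigzag32(n))
# 0 -> 0
# -1 -> 1
# 1 -> 2
# -2 -> 3
# 2 -> 4
```

Because small negative numbers map to small unsigned numbers, they encode as short varints instead of ten-byte ones.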

Encoding
message Test2
{
required string b = 2;
}
Setting the value of b to "testing" gives
you:
12 07 74 65 73 74 69 6e 67
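
The bytes above can be reproduced with a short Python sketch. It assumes a string field is written as key, then length, then the UTF-8 bytes, with wire type 2 meaning length-delimited. The helper names are illustrative only.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field as key + length + UTF-8 payload."""
    payload = text.encode("utf-8")
    key = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(key) + encode_varint(len(payload)) + payload


print(encode_string_field(2, "testing").hex(" "))
# 12 07 74 65 73 74 69 6e 67
```

Here 12 is the key (field number 2, wire type 2), 07 is the length, and the remaining seven bytes are the ASCII codes of "testing".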

Encoding
If your message definition has
repeated elements (without the
[packed=true] option), the encoded
message has zero or more key-value
pairs with the same tag number.
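
The difference between the two layouts can be sketched in Python. The field number 4 and the three values are chosen only for illustration. The sketch assumes a packed field is written as a single length-delimited pair whose payload is the concatenated varints.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_unpacked(field_number: int, values) -> bytes:
    """One key-value pair per element, all with the same tag number."""
    key = encode_varint((field_number << 3) | 0)  # wire type 0 = varint
    return b"".join(key + encode_varint(v) for v in values)


def encode_packed(field_number: int, values) -> bytes:
    """A single length-delimited pair holding every element."""
    payload = b"".join(encode_varint(v) for v in values)
    key = encode_varint((field_number << 3) | 2)  # wire type 2 = length-delimited
    return key + encode_varint(len(payload)) + payload


values = [3, 270, 86942]
print(encode_unpacked(4, values).hex(" "))  # 20 03 20 8e 02 20 9e a7 05
print(encode_packed(4, values).hex(" "))    # 22 06 03 8e 02 9e a7 05
```

The unpacked form repeats the key 20 before every element, while the packed form writes the key once.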

Now We Know Why


Don't change the numeric tags for
any existing fields.
Any new fields that you add should
be optional or repeated
Non-required fields can be removed,
as long as the tag number is not
used again in your updated message
type.

Now We Know Why


int32, uint32, int64, uint64, and bool
are all compatible
sint32 and sint64 are compatible with
each other but are not compatible
with the other integer types
Use Tag Numbers 1 through 15
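
These rules follow from the encoding, as the Python sketch below tries to show. It uses illustrative helper names and assumes that bool and the plain integer types are all written as ordinary varints, while sint32 applies ZigZag first.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def key_size(field_number: int, wire_type: int = 0) -> int:
    """Number of bytes the key occupies on the wire."""
    return len(encode_varint((field_number << 3) | wire_type))


def zigzag32(n: int) -> int:
    """The sint32 mapping from signed to unsigned."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


# Tags 1 through 15 leave the key small enough for a single byte.
for tag in (1, 15, 16, 2047, 2048):
    print(tag, key_size(tag))
# 1 1
# 15 1
# 16 2
# 2047 2
# 2048 3

# The value 1 as int32, uint32, int64, uint64 or bool (true): same bytes.
print(encode_varint(1).hex())            # 01
# The value 1 as sint32: different bytes, so a reader expecting int32 sees 2.
print(encode_varint(zigzag32(1)).hex())  # 02
```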

Now We Know Why


Why Reserved Tags Exist

oneof
message SampleMessage {
  oneof test_oneof {
    string name = 4;
    SubMessage sub_message = 9;
  }
}
Saves memory: at most one member is set at a time.
Setting any member of the oneof automatically
clears all the other members.
Fields inside a oneof cannot use the required, optional, or repeated
keywords.

Extensions
message Foo {
  // ...
  extensions 100 to 199;
}
Usage:
extend Foo {
  optional int32 bar = 126;
}
