
M.E.

A BitTorrent client in Python 3.5
Python 3.5 comes with support for asynchronous IO, which seems like a perfect fit when implementing a BitTorrent client. This article will guide you through the BitTorrent protocol details while showcasing how a small client was implemented using it.

Posted in Code with tags Python, BitTorrent, at Wednesday, August 24, 2016

When Python 3.5 was released together with the new module asyncio I was curious to give it a try. Recently I decided to implement a simple BitTorrent client using asyncio; I have always been interested in peer-to-peer protocols and it seemed like a perfect fit.

The project is named Pieces, all of the source code is available at GitHub and released under the Apache 2 license. Feel free to learn from it, steal from it, improve it, laugh at it or just ignore it.

I previously posted a short introduction to Python's async module. If this is your first time looking at asyncio it might be a good idea to read through that one first.

An introduction to BitTorrent
BitTorrent has been around since 2001 when Bram Cohen authored the first version of the protocol. The big breakthrough was when sites such as The Pirate Bay made it popular to use for downloading pirated material. Streaming sites, such as Netflix, might have resulted in a decrease of people using BitTorrent for downloading movies. But BitTorrent is still used in a number of different, legal, solutions where distribution of larger files is important.

Facebook uses it to distribute updates within their huge data centers
Amazon S3 implements it for downloading of static files
It is still used for traditional downloads of larger files, such as Linux distributions

BitTorrent is a peer-to-peer protocol, where peers join a swarm of other peers to exchange pieces of data between each other. Each peer is connected to multiple peers at the same time, and thus downloads from or uploads to multiple peers at the same time. This is great in terms of limiting the bandwidth load compared to when a file is downloaded from a single central server. It is also great for keeping a file available, as it does not rely on a single source being online.

There is a .torrent file that regulates how many pieces there are for a given file(s), how these should be exchanged between peers, as well as how the data integrity of these pieces can be confirmed by clients.

While going through the implementation it might be good to have read, or to have another tab open with, the Unofficial BitTorrent Specification. This is without a doubt the best source of information on the BitTorrent protocol. The official specification is vague and lacks certain details, so the unofficial one is the one you want to study.

Parsing a .torrent file
The first thing a client needs to do is to find out what it is supposed to download and from where. This information is what is stored in the .torrent file, a.k.a. the meta-info. There are a number of properties stored in the meta-info that we need in order to successfully implement a client.

Things like:

The name of the file to download
The size of the file to download
The URL to the tracker to connect to

All these properties are stored in a binary format called Bencoding.

Bencoding supports four different data types: dictionaries, lists, integers and strings. It is fairly easy to translate to Python's object literals or JSON.

Below is bencoding described in Augmented Backus-Naur Form, courtesy of the Haskell library.

<BE>    ::= <DICT> | <LIST> | <INT> | <STR>

<DICT>  ::= "d" 1 * (<STR> <BE>) "e"
<LIST>  ::= "l" 1 * <BE> "e"
<INT>   ::= "i" <SNUM> "e"
<STR>   ::= <NUM> ":" n * <CHAR>; where n equals the <NUM>

<SNUM>  ::= "-" <NUM> / <NUM>
<NUM>   ::= 1 * <DIGIT>
<CHAR>  ::= %
<DIGIT> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
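To make the grammar concrete, here is a minimal recursive-descent decoder. It is a sketch, not the pieces.bencoding implementation: the names bdecode and _decode are hypothetical, and it returns plain dicts (insertion ordered since Python 3.7) rather than OrderedDict.

```python
def bdecode(data: bytes):
    """Decode exactly one bencoded value from `data` (hypothetical helper)."""
    value, index = _decode(data, 0)
    if index != len(data):
        raise ValueError('trailing data after bencoded value')
    return value


def _decode(data: bytes, i: int):
    """Decode the value starting at offset i; return (value, next offset)."""
    token = data[i:i + 1]
    if token == b'i':                       # <INT> ::= "i" <SNUM> "e"
        end = data.index(b'e', i)
        return int(data[i + 1:end]), end + 1
    if token == b'l':                       # <LIST> ::= "l" <BE>... "e"
        i += 1
        items = []
        while data[i:i + 1] != b'e':
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if token == b'd':                       # <DICT> ::= "d" (<STR> <BE>)... "e"
        i += 1
        result = {}
        while data[i:i + 1] != b'e':
            key, i = _decode(data, i)
            value, i = _decode(data, i)
            result[key] = value
        return result, i + 1
    if token.isdigit():                     # <STR> ::= <NUM> ":" <CHAR>...
        colon = data.index(b':', i)
        start = colon + 1
        length = int(data[i:colon])
        return data[start:start + length], start + length
    raise ValueError('invalid token %r at offset %d' % (token, i))
```

Each branch mirrors one rule of the grammar, and strings come back as bytes since bencoded strings are raw byte sequences.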

In Pieces the encoding and decoding of bencoded data is implemented in the pieces.bencoding module (source code).

Here are a few examples decoding bencoded data into a Python representation using that module.

>>> from pieces.bencoding import Decoder

# An integer value starts with an 'i' followed by a series of
# digits until terminated with an 'e'.
>>> Decoder(b'i123e').decode()
123

# A string value starts by defining the number of characters
# contained in the string, followed by the actual string.
# Notice that the string returned is a binary string, not unicode.
>>> Decoder(b'12:Middle Earth').decode()
b'Middle Earth'

# A list starts with an 'l' followed by any number of objects, until
# terminated with an 'e'.
# As in Python, a list may contain any type of object.
>>> Decoder(b'l4:spam4:eggsi123ee').decode()
[b'spam', b'eggs', 123]

# A dict starts with a 'd' and is terminated with an 'e'. Objects
# in between those characters must be pairs of string + object.
# The order is significant in a dict, thus OrderedDict (from
# Python 3.1) is used.
>>> Decoder(b'd3:cow3:moo4:spam4:eggse').decode()
OrderedDict([(b'cow', b'moo'), (b'spam', b'eggs')])

Likewise, a Python object structure can be encoded into a bencoded byte string using the same module.

>>> from collections import OrderedDict
>>> from pieces.bencoding import Encoder

>>> Encoder(123).encode()
b'i123e'

>>> Encoder('Middle Earth').encode()
b'12:Middle Earth'

>>> Encoder(['spam', 'eggs', 123]).encode()
bytearray(b'l4:spam4:eggsi123ee')

>>> d = OrderedDict()
>>> d['cow'] = 'moo'
>>> d['spam'] = 'eggs'
>>> Encoder(d).encode()
bytearray(b'd3:cow3:moo4:spam4:eggse')

These examples can also be found in the unit tests.
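Going the other way, a minimal encoder could look like the sketch below. The bencode function is hypothetical and not the pieces.bencoding.Encoder API. It encodes str values as UTF-8 and, unlike an insertion-ordered encoder, sorts dictionary keys as raw bytes, which is what the BitTorrent specification requires for dictionaries.

```python
def bencode(value) -> bytes:
    """Encode int, str/bytes, list and dict values as bencoded bytes."""
    if isinstance(value, int):
        return b'i%de' % value
    if isinstance(value, str):
        value = value.encode('utf-8')  # strings go on the wire as raw bytes
    if isinstance(value, (bytes, bytearray)):
        return b'%d:%s' % (len(value), bytes(value))
    if isinstance(value, list):
        return b'l' + b''.join(bencode(item) for item in value) + b'e'
    if isinstance(value, dict):
        pairs = []
        for key, item in value.items():
            raw_key = key.encode('utf-8') if isinstance(key, str) else key
            pairs.append((raw_key, item))
        # Keys must appear sorted as raw byte strings
        pairs.sort(key=lambda pair: pair[0])
        return (b'd' + b''.join(bencode(k) + bencode(v) for k, v in pairs)
                + b'e')
    raise TypeError('cannot bencode %r' % type(value))
```

Sorting matters in practice because the info_hash is computed over the exact encoded bytes, so two encoders must agree on key order.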

The parser implementation is pretty straightforward. No asyncio is used here, though, not even for reading the .torrent from disk.

Using the parser from pieces.bencoding, let's open the .torrent for the popular Linux distribution Ubuntu:

>>> with open('tests/data/ubuntu-16.04-desktop-amd64.iso.torrent', 'rb') as f:
...     meta_info = f.read()
...     torrent = Decoder(meta_info).decode()
...
>>> torrent
OrderedDict([(b'announce', b'http://torrent.ubuntu.com:6969/announce'), (b'announce-list',
[[b'http://torrent.ubuntu.com:6969/announce'], [b'http://ipv6.torrent.ubuntu.com:6969/announce']
]), (b'comment', b'Ubuntu CD releases.ubuntu.com'), (b'creation date', 1461232732), (b'info',
OrderedDict([(b'length', 1485881344), (b'name', b'ubuntu-16.04-desktop-amd64.iso'), (b'piece length', 524288),
(b'pieces', b'\x1at\xfc\x84\xc8\xfaV\xeb\x12\x1c\xc5\xa4\x1c?\xf0\x96\x07P\x87\xb8\xb2\xa5G1\xc8L\x18\x81\x9bc\x81\xfc8*\x9d\xf4k\xe6\xdb6\xa3\x0b\x8d\xbe\xe3L\xfd\xfd4\...')]))])

Here you can see some of the meta-data, such as the name of the destination file (ubuntu-16.04-desktop-amd64.iso) and the total size in bytes (1485881344).

Notice how the keys used in the OrderedDict are binary strings. Bencoding is a binary protocol, and using UTF-8 strings as keys will not work!

A wrapper class, pieces.torrent.Torrent, exposes these properties, abstracting the binary strings and other details away from the rest of the client. This class only implements the attributes used in the Pieces client.

I will not go through which attributes are available; instead, the rest of this article will refer back to attributes found in the .torrent/meta-info where they are used.

Connecting to the tracker

Now that we can decode a .torrent file and we have a Python representation of the data, we need to get a list of peers to connect with. This is where the tracker comes in. A tracker is a central server keeping track of available peers for a given torrent. A tracker does NOT contain any of the torrent data, only which peers can be connected to and their statistics.

Building the request
The announce property in the meta-info is the HTTP URL to the tracker to connect to using the following URL parameters:

Parameter    Description
info_hash    The SHA1 hash of the info dict found in the .torrent
peer_id      A unique ID generated for this client
uploaded     The total number of bytes uploaded
downloaded   The total number of bytes downloaded
left         The number of bytes left to download for this client
port         The TCP port this client listens on
compact      Whether or not the client accepts a compacted list of peers

The peer_id needs to be exactly 20 bytes, and there are two major conventions used on how to generate this ID. Pieces follows the Azureus-style convention, generating a peer id like:

>>> import random

# -<2 character id><4 digit version number>-<random numbers>
>>> '-PC0001-' + ''.join([str(random.randint(0, 9)) for _ in range(12)])
'-PC0001-478269329936'
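Putting the parameters from the table together, building the announce URL could be sketched as follows. The helper names make_peer_id and announce_url, the example tracker URL and the default port are assumptions for illustration. The important details are that info_hash is the SHA1 digest of the exact bencoded bytes of the info dict, and that the raw 20 byte values get percent-escaped.

```python
import hashlib
import random
import urllib.parse


def make_peer_id(prefix: bytes = b'-PC0001-') -> bytes:
    """Azureus-style peer id: 8 byte client prefix + 12 random digits = 20 bytes."""
    digits = ''.join(str(random.randint(0, 9)) for _ in range(12))
    return prefix + digits.encode('ascii')


def announce_url(announce: str, bencoded_info: bytes, peer_id: bytes,
                 left: int, port: int = 6889) -> str:
    """Build the announce GET URL for a client that has not transferred anything yet."""
    params = {
        # SHA1 over the exact bencoded bytes of the info dict: 20 raw bytes
        'info_hash': hashlib.sha1(bencoded_info).digest(),
        'peer_id': peer_id,
        'uploaded': 0,
        'downloaded': 0,
        'left': left,
        'port': port,
        'compact': 1,
    }
    # urlencode percent-escapes the raw bytes of info_hash and peer_id
    return announce + '?' + urllib.parse.urlencode(params)
```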

A tracker request can look like this using httpie:

http GET "http://torrent.ubuntu.com:6969/announce?info_hash=%90%28%9F%D3M%FC%1C%F8%F3%16%A2h%AD%D85L%853DX&peer_id=-PC0001-706887310628&uploaded=0&downloaded=0&left=699400192&port=6889&compact=1"
HTTP/1.0 200 OK
Content-Length: 363
Content-Type: text/plain
Pragma: no-cache

d8:completei3651e10:incompletei385e8:intervali1800e5:peers300:%yOk.
@_<K+
\mb^Tn^ O
A*1*>B)/u
...

The response data is truncated since it contains binary data that screws up the Markdown formatting.

From the tracker response, there are two properties of interest:

interval: The interval in seconds until the client should make a new announce call to the tracker.
peers: The list of peers is a binary string with a length that is a multiple of 6 bytes, where each peer consists of a 4 byte IP address and a 2 byte port number (since we are using the compact format).
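Decoding the compact peers string is a matter of slicing it into 6 byte chunks. A sketch, where parse_compact_peers is a hypothetical helper name:

```python
import socket
import struct


def parse_compact_peers(peers: bytes):
    """Split a compact peer list into (ip, port) tuples.

    Each peer is 6 bytes: a 4 byte IPv4 address followed by a
    2 byte big-endian port number.
    """
    if len(peers) % 6 != 0:
        raise ValueError('compact peer list length must be a multiple of 6')
    result = []
    for offset in range(0, len(peers), 6):
        ip = socket.inet_ntoa(peers[offset:offset + 4])
        (port,) = struct.unpack('>H', peers[offset + 4:offset + 6])
        result.append((ip, port))
    return result
```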

So, a successful announce call made to the tracker gives you a list of peers to connect to. This might not be all available peers in this swarm, only the peers the tracker assigned your client to connect to. A subsequent call to the tracker might result in another list of peers.

Async HTTP
Python does not come with built-in support for async HTTP, and my beloved requests library does not implement asyncio either. Scouting around the Internet it looks like most use aiohttp, which implements both an HTTP client and server.

Pieces uses aiohttp in the pieces.tracker.Tracker class for making the HTTP request to the tracker announce URL. A shortened version of that code is this:

async def connect(self,
                  first: bool=None,
                  uploaded: int=0,
                  downloaded: int=0):
    params = { ...}
    url = self.torrent.announce + '?' + urlencode(params)

    async with self.http_client.get(url) as response:
        if not response.status == 200:
            raise ConnectionError('Unable to connect to tracker')
        data = await response.read()
        return TrackerResponse(bencoding.Decoder(data).decode())

The method is declared using async and uses the new asynchronous context manager async with to allow being suspended while the HTTP call is being made. Given a successful response, this method will be suspended again while reading the binary response data (await response.read()). Finally, the response data is wrapped in a TrackerResponse instance containing the list of peers, or alternatively an error message.

The result of using aiohttp is that our event loop is free to schedule other work while we have an outstanding request to the tracker.

See the module pieces.tracker source code for full details.

The loop
Everything up to this point could really have been made synchronously, but now that we are about to connect to multiple peers we need to go asynchronous.

The main function in pieces.cli is responsible for setting up the asyncio event loop. If we get rid of some argparse and error handling details it would look something like this (see cli.py for the full details).

import asyncio
import logging

from pieces.torrent import Torrent
from pieces.client import TorrentClient

# args comes from argparse (omitted here); args.torrent is the .torrent path
loop = asyncio.get_event_loop()
client = TorrentClient(Torrent(args.torrent))
task = loop.create_task(client.start())

try:
    loop.run_until_complete(task)
except asyncio.CancelledError:
    logging.warning('Event loop was canceled')

We start off by getting the default event loop for this thread. Then we construct the TorrentClient with the given Torrent (meta-info). This will parse the .torrent file and validate that everything is ok.

Next, we call the async method client.start(), wrap the returned coroutine in a task (a subclass of asyncio.Future) using loop.create_task, and instruct the event loop to keep running until that task is complete.

Is that it? No, not really. We have our own loop (not event loop) implemented in the pieces.client.TorrentClient that sets up the peer connections, schedules the announce call, etc.

TorrentClient is something like a work coordinator. It starts by creating an asyncio.Queue which will hold the list of available peers that can be connected to.

Then it constructs N instances of pieces.protocol.PeerConnection, which will consume peers from the queue. These PeerConnection instances will wait (await) until there is a peer available in the Queue for one of them to connect to (without blocking).

Since the queue is empty to begin with, no PeerConnection will do any real work until we populate it with peers it can connect to. This is done in a loop inside of TorrentClient.start.

Let's have a look at this loop:

async def start(self):
    self.peers = [PeerConnection(self.available_peers,
                                 self.tracker.torrent.info_hash,
                                 self.tracker.peer_id,
                                 self.piece_manager,
                                 self._on_block_retrieved)
                  for _ in range(MAX_PEER_CONNECTIONS)]

    # The time we last made an announce call (timestamp)
    previous = None
    # Default interval between announce calls (in seconds)
    interval = 30*60

    while True:
        if self.piece_manager.complete:
            break
        if self.abort:
            break

        current = time.time()
        if (not previous) or (previous + interval < current):
            response = await self.tracker.connect(
                # Only the very first announce call is flagged as first
                first=previous is None,
                uploaded=self.piece_manager.bytes_uploaded,
                downloaded=self.piece_manager.bytes_downloaded)

            if response:
                previous = current
                interval = response.interval
                self._empty_queue()
                for peer in response.peers:
                    self.available_peers.put_nowait(peer)
        else:
            await asyncio.sleep(5)
    self.stop()

Basically, what that loop does is to:

1. Check if we have downloaded all pieces
2. Check if the user aborted the download
3. Make an announce call to the tracker if needed
4. Add any retrieved peers to a queue of available peers
5. Sleep 5 seconds if no announce call was due
So, each time an announce call is made to the tracker, the list of peers to connect to is reset, and if no peers are retrieved, no PeerConnection will run. This goes on until the download is complete or aborted.
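The coordination between the announce loop and the PeerConnection instances is a producer/consumer setup around an asyncio.Queue. The stripped-down sketch below shows only that pattern: connection and coordinator are hypothetical stand-ins that record which consumer picked up which peer instead of opening TCP connections, and asyncio.run and asyncio.create_task require Python 3.7 or later.

```python
import asyncio


async def connection(name: str, queue: asyncio.Queue, handled: list):
    """Stand-in for one PeerConnection: consume peers from the shared queue."""
    while True:
        peer = await queue.get()          # suspends (does not block) until a peer exists
        try:
            handled.append((name, peer))  # a real connection would handshake here
        finally:
            queue.task_done()


async def coordinator(peers, max_connections: int = 3):
    """Stand-in for TorrentClient.start: start N consumers, then feed them peers."""
    queue = asyncio.Queue()
    handled = []
    consumers = [asyncio.create_task(connection('conn-%d' % i, queue, handled))
                 for i in range(max_connections)]
    for peer in peers:                    # like the peers from an announce response
        queue.put_nowait(peer)
    await queue.join()                    # wait until every queued peer was taken
    for task in consumers:
        task.cancel()
    await asyncio.gather(*consumers, return_exceptions=True)
    return handled
```

Because the consumers are created before any peers exist, they simply sit suspended on queue.get() until the producer fills the queue, which is the behaviour described for the PeerConnection instances above.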

The peer protocol
After receiving a peer IP and port number from the tracker, our client will open a TCP connection to that peer. Once the connection is open, these peers will start to exchange messages using the peer protocol.

First, let's go through the different parts of the peer protocol, and then go through how it is all implemented.

Handshake
The first message sent needs to be a Handshake message, and it is the connecting client that is responsible for initiating this.

Immediately after sending the Handshake, our client should receive a Handshake message sent from the remote peer.

The Handshake message contains two fields of importance:

peer_id: The unique ID of either peer
info_hash: The SHA1 hash value for the info dict

If the info_hash does not match the torrent we are about to download, we close the connection.
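The handshake is a fixed 68 byte message: a length byte (19), the string "BitTorrent protocol", 8 reserved bytes, the 20 byte info_hash and the 20 byte peer_id. A sketch of encoding and decoding it with struct follows; the function names are hypothetical and the reserved bytes are simply left as zeros.

```python
import struct

PROTOCOL = b'BitTorrent protocol'
# <pstrlen: 1 byte><pstr: 19 bytes><reserved: 8 zero bytes><info_hash: 20><peer_id: 20>
HANDSHAKE_FORMAT = '>B19s8x20s20s'


def encode_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    return struct.pack(HANDSHAKE_FORMAT, len(PROTOCOL), PROTOCOL,
                       info_hash, peer_id)


def decode_handshake(data: bytes):
    """Return (info_hash, peer_id) from a 68 byte handshake."""
    pstrlen, pstr, info_hash, peer_id = struct.unpack(HANDSHAKE_FORMAT, data)
    if pstrlen != len(PROTOCOL) or pstr != PROTOCOL:
        raise ValueError('not a BitTorrent handshake')
    return info_hash, peer_id
```

A PeerConnection would compare the returned info_hash with the torrent's own and close the connection on a mismatch.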

Immediately after the Handshake, the remote peer may send a BitField message. The BitField message serves to inform the client on which pieces the remote peer has. Pieces supports receiving a BitField message, and most BitTorrent clients seem to send it, but since Pieces currently does not support seeding, it is never sent, only received.

The BitField message payload contains a sequence of bytes which, when read as binary, uses one bit per piece. If the bit is 1 the peer has the piece with that index, while 0 means that the peer lacks that piece. The most significant bit of the first byte corresponds to piece index 0. I.e. each byte in the payload represents up to 8 pieces, with any spare bits at the end set to 0.
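Turning that payload into a set of piece indices can be sketched like this; pieces_in_bitfield is a hypothetical helper, and piece_count would come from the .torrent meta-info.

```python
def pieces_in_bitfield(bitfield: bytes, piece_count: int) -> set:
    """Return the indices of the pieces a peer has.

    The most significant bit of byte 0 is piece 0; spare bits beyond
    piece_count are ignored.
    """
    have = set()
    for index in range(piece_count):
        byte = bitfield[index // 8]
        if byte & (0x80 >> (index % 8)):
            have.add(index)
    return have
```

For example, a payload of 0xA0 0x80 for a 10 piece torrent reads as 10100000 10000000, i.e. pieces 0, 2 and 8.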

Each client starts in the state choked and not interested. That means that the client is not allowed to request pieces from the remote peer, nor does it have the intent of being interested.

Choked: A choked peer is not allowed to request any pieces from the other peer.
Unchoked: An unchoked peer is allowed to request pieces from the other peer.
Interested: Indicates that a peer is interested in requesting pieces.
Not interested: Indicates that the peer is not interested in requesting pieces.

Consider Choked and Unchoked to be rules and Interested and Not Interested to be intents between two
peers.

After the handshake we send an Interested message to the remote peer, telling it that we would like to get unchoked in order to start requesting pieces.

Until the client receives an Unchoke message it may not request a piece from its remote peer; the PeerConnection will be choked (passive) until either unchoked or disconnected.

The following sequence of messages is what we are aiming for when setting up a PeerConnection:

          Handshake
client --------------> peer    We are initiating the handshake

          Handshake
client <-------------- peer    Comparing the info_hash with our hash

          BitField
client <-------------- peer    Might be receiving the BitField

          Interested
client --------------> peer    Let peer know we want to download

          Unchoke
client <-------------- peer    Peer allows us to start requesting pieces

Requesting pieces
As soon as the client gets into an unchoked state it will start requesting pieces from the connected peer. The details surrounding which piece to request are covered later, in Managing the pieces.

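If we know that the other peer has a given piece, we can send a Request message asking the remote peer to send us data for the specified piece. The Request identifies the data by piece index, a byte offset within that piece (begin) and a length. If the peer complies it will send us a corresponding Piece message where the message payload is the raw data.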

This client will only ever have one outstanding Request per peer and politely waits for a Piece message before taking the next action. Since connections to multiple peers are open concurrently, the client will have multiple Requests outstanding, but only one per connection.

If, for some reason, the client does not want a piece anymore, it can send a Cancel message to the remote peer to cancel any previously sent Request.
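Request and Cancel share the same 13 byte layout, <id><index><begin><length>, with ids 6 and 8, while a Piece message (id 7) carries <index><begin><block>. Below is a sketch of encoding the former and decoding the latter, using the 2^14 byte block size discussed under Managing the pieces; the helper names are hypothetical.

```python
import struct

REQUEST, PIECE, CANCEL = 6, 7, 8
BLOCK_SIZE = 2 ** 14


def encode_request(index: int, begin: int, length: int = BLOCK_SIZE,
                   message_id: int = REQUEST) -> bytes:
    """Request (id 6) or Cancel (id 8): length prefix 13 = 1 id + 3 * 4 bytes."""
    return struct.pack('>IbIII', 13, message_id, index, begin, length)


def decode_piece(message: bytes):
    """Split a Piece message (id 7) into (index, begin, block data)."""
    length, message_id, index, begin = struct.unpack('>IbII', message[:13])
    if message_id != PIECE:
        raise ValueError('not a Piece message')
    # length counts the id (1) + index (4) + begin (4) + block bytes
    return index, begin, message[13:4 + length]
```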

Other messages

Have

The remote peer can at any point in time send us a Have message. This is done when the remote peer has received a piece and makes that piece available for its connected peers to download.

The Have message payload is the piece index.

When Pieces receives a Have message it updates the information on which pieces the peer has.

KeepAlive

The KeepAlive message can be sent at any time in either direction. The message does not hold any payload.
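Both of these messages are tiny. A sketch of encoding them, where KEEP_ALIVE and encode_have are hypothetical names; the Have bytes match the ones in the Have(33) unit test shown later.

```python
import struct

# A KeepAlive is just a zero length prefix: no id and no payload
KEEP_ALIVE = struct.pack('>I', 0)


def encode_have(index: int) -> bytes:
    # length 5 = 1 byte id (4) + 4 byte piece index
    return struct.pack('>IbI', 5, 4, index)
```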

Implementation
The PeerConnection uses asyncio.open_connection to asynchronously open a TCP connection to a remote peer, which returns a tuple of a StreamReader and a StreamWriter. Given that the connection was created successfully, the PeerConnection will send and receive a Handshake message.

Once a handshake is made, the PeerConnection will use an asynchronous iterator to return a stream of PeerMessages and take the appropriate action.

Using an async iterator separates the PeerConnection from the details on how to read from sockets and how to parse the BitTorrent binary protocol. The PeerConnection can focus on the semantics of the protocol, such as managing the peer state, receiving the pieces and closing the connection.

This allows the main code in PeerConnection.start to basically look like:

async for message in PeerStreamIterator(self.reader, buffer):
    if type(message) is BitField:
        self.piece_manager.add_peer(self.remote_id, message.bitfield)
    elif type(message) is Interested:
        self.peer_state.append('interested')
    elif type(message) is NotInterested:
        if 'interested' in self.peer_state:
            self.peer_state.remove('interested')
    elif type(message) is Choke:
        ...

An asynchronous iterator is a class that implements the methods __aiter__ and __anext__, which are just async versions of Python's standard iterator methods __iter__ and __next__.

Upon iterating (each call to __anext__) the PeerStreamIterator will read data from the StreamReader and, if enough data is available, try to parse and return a valid PeerMessage.

The BitTorrent protocol uses messages of variable length, where all messages take the form:

<length><id><payload>

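length is a 4 byte integer value counting the id and payload bytes (a KeepAlive has length 0 and no id)
id is a single byte identifying the message type
payload is variable and message dependent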

So as soon as the buffer has enough data for the next message, it will be parsed and returned from the iterator.

All messages are decoded using Python's module struct, which contains functions to convert between Python values and C structs. Struct uses compact strings as descriptors on what to convert, e.g. >Ib reads as "Big-Endian, 4 byte unsigned integer, 1 byte signed integer".

Note that all messages use Big-Endian in BitTorrent.
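With the <length><id><payload> framing and struct in place, an asynchronous iterator over a StreamReader can be sketched as below. MessageIterator is a hypothetical simplification of PeerStreamIterator: it relies on StreamReader.readexactly instead of keeping its own buffer, yields raw (id, payload) tuples rather than PeerMessage objects, and yields None for a KeepAlive. The demo coroutine feeds the reader by hand instead of using a socket.

```python
import asyncio
import struct


class MessageIterator:
    """Async iterator yielding (message_id, payload) per peer message."""

    def __init__(self, reader: asyncio.StreamReader):
        self.reader = reader

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            # <length>: 4 byte big-endian count of the id + payload bytes
            (length,) = struct.unpack('>I', await self.reader.readexactly(4))
            if length == 0:
                return None  # KeepAlive has neither id nor payload
            body = await self.reader.readexactly(length)
        except asyncio.IncompleteReadError:
            # The peer closed the connection (possibly mid-message)
            raise StopAsyncIteration from None
        return body[0], body[1:]  # <id>, <payload>


async def demo():
    reader = asyncio.StreamReader()
    # A KeepAlive followed by a Have message for piece 33
    reader.feed_data(b'\x00\x00\x00\x00' b'\x00\x00\x00\x05\x04\x00\x00\x00!')
    reader.feed_eof()
    return [message async for message in MessageIterator(reader)]
```

Running asyncio.run(demo()) on Python 3.7+ yields the KeepAlive as None followed by (4, b'\x00\x00\x00!').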

This makes it easy to create unit tests to encode and decode messages. Let's have a look at the tests for the Have message:

import unittest

from pieces.protocol import Have


class HaveMessageTests(unittest.TestCase):
    def test_can_construct_have(self):
        have = Have(33)
        self.assertEqual(
            have.encode(),
            b"\x00\x00\x00\x05\x04\x00\x00\x00!")

    def test_can_parse_have(self):
        have = Have.decode(b"\x00\x00\x00\x05\x04\x00\x00\x00!")
        self.assertEqual(33, have.index)

From the raw binary string we can tell that the Have message has a length of 5 bytes (\x00\x00\x00\x05), an id of value 4 (\x04), and the payload is 33 (\x00\x00\x00!).

Since the message length is 5 and the ID only uses a single byte, we know that we have four bytes to interpret as the payload value. Using struct.unpack we can easily convert it to a Python integer like:

>>> import struct
>>> struct.unpack('>I', b'\x00\x00\x00!')
(33,)

That is basically it regarding the protocol: all messages follow the same procedure, and the iterator keeps reading from the socket until it gets disconnected. See the source code for details on all messages.

Managing the pieces
So far we have only discussed pieces: pieces of data being exchanged by two peers. It turns out that pieces is not the entire truth; there is one more concept, blocks. If you have looked through any of the source code you might have seen code referring to blocks, so let's go through what a piece really is.

A piece is, unsurprisingly, a partial piece of the torrent's data. A torrent's data is split into N pieces of equal size (except the last piece in a torrent, which might be smaller than the others). The piece length is specified in the .torrent file. Typically pieces are of sizes 512 kB or less, and should be a power of 2.

Pieces are still too big to be shared efficiently between peers, so pieces are further divided into something referred to as blocks. Blocks are the chunks of data that are actually requested between peers, but pieces are still used to indicate which peer has which pieces. If only blocks were used, it would increase the overhead in the protocol greatly (resulting in longer BitFields, more Have messages and larger .torrent files).

A block is 2^14 (16384) bytes in size, except the final block, which most likely will be smaller.

Consider an example where a .torrent describes a single file foo.txt to be downloaded.

name: foo.txt
length: 135168
piece length: 49152

That small torrent would result in 3 pieces:

piece 0: 49 152 bytes
piece 1: 49 152 bytes
piece 2: 36 864 bytes (135168 - 49152 - 49152)
       = 135 168

Now each piece is divided into blocks of 2^14 bytes:

piece 0:
block 0: 16 384 bytes (2^14)
block 1: 16 384 bytes
block 2: 16 384 bytes
= 49 152 bytes

piece 1:
block 0: 16 384 bytes
block 1: 16 384 bytes
block 2: 16 384 bytes
= 49 152 bytes

piece 2:
block 0: 16 384 bytes
block 1: 16 384 bytes
block 2: 4 096 bytes
= 36 864 bytes

total: 49 152 bytes
     + 49 152 bytes
     + 36 864 bytes
     = 135 168 bytes
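The same arithmetic as a small sketch: layout is a hypothetical helper that computes the block sizes of every piece from the total length and the piece length.

```python
BLOCK_SIZE = 2 ** 14  # 16 384 bytes


def layout(total_length: int, piece_length: int, block_size: int = BLOCK_SIZE):
    """Return a list of pieces, each a list of block sizes in bytes."""
    pieces = []
    for piece_offset in range(0, total_length, piece_length):
        # The last piece may be shorter than piece_length
        size = min(piece_length, total_length - piece_offset)
        # The last block of a piece may be shorter than block_size
        blocks = [min(block_size, size - start)
                  for start in range(0, size, block_size)]
        pieces.append(blocks)
    return pieces
```

For foo.txt, layout(135168, 49152) gives three pieces, the last one made of blocks of 16384, 16384 and 4096 bytes, matching the tables above.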

Exchanging these blocks between peers is basically what BitTorrent is about. Once all blocks for a piece are done, that piece is complete and can be shared with other peers (the Have message is sent to connected peers). And once all pieces are complete, the peer transforms from a downloader to only being a seeder.

Two notes on where the official specification is a bit off:
1. The official specification refers to both pieces and blocks as just pieces, which is quite confusing. The unofficial specification and others seem to have agreed upon using the term block for the smaller piece, which is what we will use as well.

2. The official specification states a different block size than what we use. Reading the unofficial specification, it seems that 2^14 bytes is what is agreed upon among implementers, regardless of the official specification.

The implementation
When a TorrentClient is constructed, so is a PieceManager with the responsibility to:

Determine which block to request next
Persist received blocks to file
Determine when a download is complete

When a PeerConnection successfully handshakes with another peer and receives a BitField message, it will inform the PieceManager which peer (peer_id) has which pieces. This information will be updated on any received Have message as well. Using this information, the PieceManager knows the collective state of which pieces are available from which peers.

When the first PeerConnection goes into an unchoked state it will request the next block from its peer. The next block is determined by calling the method PieceManager.next_request.

next_request implements a very simple strategy for which piece to request next.

1. When the PieceManager is constructed, all pieces and blocks are pre-constructed based on the piece length from the .torrent meta-info
2. All pieces are put in a missing list
3. When next_request is called, the manager will do one of:
   Re-request any previously requested block that has timed out
   Request the next block in an ongoing piece
   Request the first block in the next missing piece

This way the blocks and pieces will be requested in order. However, multiple pieces might be ongoing, depending on which pieces each peer has.
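A simplified sketch of that three-step strategy follows. PieceManagerSketch is hypothetical, not the Pieces PieceManager: it assumes every piece has the same number of blocks (ignoring the shorter last piece), tracks only block indices, and leaves out removing blocks from pending once they arrive.

```python
import time


class PieceManagerSketch:
    def __init__(self, piece_count, blocks_per_piece, timeout=5.0):
        self.missing = list(range(piece_count))  # pieces not started yet
        self.ongoing = {}                        # piece index -> unrequested blocks
        self.pending = {}                        # (piece, block) -> time requested
        self.peer_pieces = {}                    # peer id -> set of piece indices
        self.blocks_per_piece = blocks_per_piece
        self.timeout = timeout

    def next_request(self, peer_id, now=None):
        now = time.monotonic() if now is None else now
        have = self.peer_pieces.get(peer_id, set())
        # 1. Re-request a previously requested block that has timed out
        for (index, block), sent in self.pending.items():
            if index in have and now - sent > self.timeout:
                self.pending[(index, block)] = now
                return index, block
        # 2. Request the next block in an ongoing piece
        for index, blocks in self.ongoing.items():
            if index in have and blocks:
                return self._request(index, blocks.pop(0), now)
        # 3. Request the first block in the next missing piece
        for index in self.missing:
            if index in have:
                self.missing.remove(index)
                self.ongoing[index] = list(range(self.blocks_per_piece))
                return self._request(index, self.ongoing[index].pop(0), now)
        return None

    def _request(self, index, block, now):
        self.pending[(index, block)] = now
        return index, block
```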

Since Pieces aims to be a simple client, no effort has been made on implementing a smart or efficient strategy for which pieces to request. A better solution would be to request the rarest piece first, which would make the entire swarm healthier as well.
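For comparison, rarest-first selection (which Pieces does not implement) can be sketched by counting how many connected peers have each piece. The function and its arguments are hypothetical; ties are broken by the lowest piece index.

```python
from collections import Counter


def rarest_first(missing, peer_pieces, peer_id):
    """Pick the missing piece held by peer_id that the fewest peers have.

    `peer_pieces` maps every connected peer id to its set of piece indices.
    """
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces)
    candidates = [i for i in missing if i in peer_pieces.get(peer_id, set())]
    if not candidates:
        return None
    return min(candidates, key=lambda i: (availability[i], i))
```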

Whenever a block is received from a peer, it is stored (in memory) by the PieceManager. When all blocks for a piece are retrieved, a SHA1 hash is made of the piece. This hash is compared to the SHA1 hashes included in the .torrent info dict; if it matches, the piece is written to disk.
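The pieces value in the info dict is a concatenation of 20 byte SHA1 digests, one per piece, so verifying a piece can be sketched as follows (piece_is_valid is a hypothetical helper):

```python
import hashlib


def piece_is_valid(index: int, data: bytes, pieces_field: bytes) -> bool:
    """Compare SHA1(data) with the 20 byte digest stored for piece `index`."""
    expected = pieces_field[index * 20:(index + 1) * 20]
    return hashlib.sha1(data).digest() == expected
```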

When all pieces are accounted for (matching hashes) the torrent is considered to be complete, which stops the TorrentClient, closing any open TCP connections, and as a result the program exits with a message that the torrent is downloaded.

Future work
Seeding is not yet implemented, but it should not be that hard to implement. What is needed is something along the lines of this:

Whenever a peer is connected to, we should send a BitField message to the remote peer indicating which pieces we have.

Whenever a new piece is received (and correctness of its hash is confirmed), each PeerConnection should send a Have message to its remote peer to indicate the new piece that can be shared.

In order to do this, the PieceManager needs to be extended to return a list of 0 and 1 for the pieces we have, and the TorrentClient needs to tell the PeerConnection to send a Have to its remote peer. Both the BitField and Have messages should support encoding.
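Encoding a BitField is the inverse of reading one: set the bit for each piece we have, most significant bit first. A sketch, where encode_bitfield is a hypothetical helper that emits the complete message with id 5:

```python
import struct


def encode_bitfield(have: set, piece_count: int) -> bytes:
    """Build a BitField message from the set of piece indices we have."""
    payload = bytearray((piece_count + 7) // 8)  # spare bits stay 0
    for index in have:
        payload[index // 8] |= 0x80 >> (index % 8)
    # <length = 1 id byte + payload><id = 5><payload>
    return struct.pack('>Ib', 1 + len(payload), 5) + bytes(payload)
```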

Having seeding implemented would make Pieces a good citizen, supporting both downloading and uploading of data within the swarm.

Additional features that probably can be added without too much effort are:

Multi-file torrents. This will hit the PieceManager: since pieces and blocks might span over multiple files, it affects how files are persisted (i.e. a single block might contain data for more than one file).

Resume a download, by seeing what parts of the file(s) are already downloaded (verified by making SHA1
hashes).
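In a multi-file torrent the pieces are laid out over the files concatenated in the order they are listed in the meta-info. Persisting a block then means mapping its byte range onto one or more files; a sketch with a hypothetical files_for_range helper:

```python
def files_for_range(files, offset: int, length: int):
    """Map a byte range of the torrent data onto (path, file_offset, size) parts.

    `files` is a list of (path, file_length) in meta-info order; the
    torrent data is treated as these files concatenated.
    """
    parts = []
    file_start = 0
    end = offset + length
    for path, file_length in files:
        file_end = file_start + file_length
        lo, hi = max(offset, file_start), min(end, file_end)
        if lo < hi:  # the range overlaps this file
            parts.append((path, lo - file_start, hi - lo))
        file_start = file_end
    return parts
```

A 16384 byte block at offset 0 spanning a 10000 byte file followed by a larger one would be written as two parts, one per file.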

Summary
It was real fun to implement a BitTorrent client. Having to handle binary protocols and networking was a great balance to all the web development I have been doing recently.

Python continues to be one of my favourite programming languages. Handling binary data was a breeze given the struct module, and the recent addition asyncio feels very pythonic. Using an async iterator to implement the protocol turned out to be a good fit as well.

Hopefully this article inspired you to write a BitTorrent client of your own, or to extend Pieces in some way. If you spot any error in the article or the source code, feel free to open an issue over at GitHub.


Markus Eliasson.
A thorough technical lead with a passion for producing valuable and clean code. Tends to occasionally blog about building software and can't seem to make up his mind on which programming language to use next.

markus.eliasson@gmail.com

