
Injection Methods

Gigablast allows you to inject documents directly into the index by using the command gb [-c <hosts.conf>] <hostId> --inject
<file> where <file> must be a sequence of HTTP requests as described below. They will be sent to the host with id <hostId>.
You can also inject your own content a second way, by using the Inject URL page.
Thirdly, you can use your own program to feed the content directly to Gigablast using the same form parameters as the form on
the Inject URL page.
Input Parameters
When sending an injection HTTP request to a Gigablast server, you may optionally supply an HTTP MIME in addition to the
content. This MIME is treated as if Gigablast's spider downloaded the page you are injecting and received that MIME. If you do
supply this MIME you must make sure it is HTTP compliant, precedes the actual content and ends with a "\r\n\r\n" followed by the
content itself. The smallest mime header you can get away with is "HTTP 200\r\n\r\n" which is just an "OK" reply from an HTTP
server.
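As a minimal sketch of the rule above, the following Python helpers prepend the smallest MIME header to a page body and split it off again. The function names are hypothetical and not part of Gigablast; the only thing assumed is that the header ends with a blank line (CRLF CRLF), as in any HTTP reply.

```python
from typing import Tuple

# The blank line (CRLF CRLF) that terminates an HTTP MIME header.
HEADER_END = "\r\n\r\n"


def with_minimal_mime(body: str, status_line: str = "HTTP 200") -> str:
    """Prefix `body` with a minimal MIME header (hypothetical helper)."""
    return f"{status_line}{HEADER_END}{body}"


def split_mime(content: str) -> Tuple[str, str]:
    """Split content into (mime_header, body) at the first blank line."""
    header, sep, body = content.partition(HEADER_END)
    if not sep:
        raise ValueError("no blank line terminating the MIME header")
    return header, body
```

Content built this way would be sent with hasmime set to 1 so that the leading header is not indexed as part of the page.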
The cgi parameters accepted by the /inject URL for injecting content are the following: (remember to map spaces to +'s etc.)
u=X
    X is the URL you are injecting. This is required.
c=X
    X is the name of the collection into which you are injecting the content. This is required.
delete=X
    X is 0 to add the URL/content and 1 to delete the URL/content from the index. Default is 0.
ip=X
    X is the IP of the URL (i.e. 1.2.3.4). If this is omitted or invalid then Gigablast will look up the IP,
    provided iplookups is true. But if iplookups is false, Gigablast will use the default IP of 1.2.3.4.
iplookups=X
    If X is 1 and the IP of the URL is not valid or provided then Gigablast will look it up. If X is 0 Gigablast will never
    look up the IP of the URL. Default is 1.
dedup=X
    If X is 1 then Gigablast will not add the URL if another already exists in the index from the same domain with the
    same content. If X is 0 then Gigablast will not do any deduping. Default is 1.
rs=X
    X is the number of the ruleset to use to index the URL and its content. It will be auto-determined if rs is omitted
    or rs is -1.
quick=X
    If X is 1 then the reply returned after the content is injected is the reply described directly below this table. If X is 0
    then the reply will be the HTML form interface.
hasmime=X
    X is 1 if the provided content includes a valid HTTP MIME header, 0 otherwise. Default is 0.
content=X
    X is the content for the provided URL. If hasmime is true then the first part of the content is really an HTTP mime
    header, followed by "\r\n\r\n", and then the actual content.
ucontent=X
    X is the UNencoded content for the provided URL. Use this one instead of the content cgi parameter if you do not
    want to encode the content. This breaks the HTTP protocol standard, but is convenient because the caller does
    not have to convert special characters in the document to their corresponding HTTP code
    sequences. IMPORTANT: this cgi parameter must be the last one in the list.
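The parameter list above can be assembled into a form body roughly as follows. This is a sketch, not Gigablast's own client code: build_inject_body is a hypothetical helper, and it simply URL-encodes every parameter (spaces become +) and places content or ucontent at the end, since ucontent must be the last parameter.

```python
from urllib.parse import quote_plus


def build_inject_body(url, collection, content, *, unencoded=False, **options):
    """Build the cgi parameter string for an injection (hypothetical helper).

    `options` holds the optional parameters (delete, ip, iplookups, dedup,
    rs, quick, hasmime) in the order given by the caller.
    """
    pairs = [("u", url), ("c", collection)]
    pairs.extend(options.items())
    # quote_plus maps spaces to '+' and escapes other special characters.
    encoded = "&".join(f"{key}={quote_plus(str(value))}" for key, value in pairs)
    if unencoded:
        # ucontent is sent raw and therefore has to come last.
        return f"{encoded}&ucontent={content}"
    return f"{encoded}&content={quote_plus(content)}"
```

For example, build_inject_body("http://example.com/", "main", page, quick=1, hasmime=0) yields a body with the page URL-encoded, while passing unencoded=True appends the page verbatim after ucontent=.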
Sample Injection Request (line breaks are \r\n):
POST /inject HTTP/1.0
Content-Length: 291
Content-Type: text/html
Connection: Close

u=myurl&c=&delete=0&ip=4.5.6.7&iplookups=0&dedup=1&rs=7&quick=1&hasmime=1&ucontent=HTTP 200
Last-Modified: Sun, 06 Nov 1994 08:49:37 GMT
Connection: Close
Content-Type: text/html

ucontent is the unencoded content of the page we are injecting. It allows you to specify data without having to URL-encode it, for
performance and ease.
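A request like the sample can be produced programmatically. The sketch below builds the raw HTTP/1.0 request with a Content-Length computed from the body and sends it over a plain socket. Both function names are hypothetical, and the host and port are whatever your Gigablast server listens on; no default is assumed here.

```python
import socket


def build_inject_request(form_body: str) -> bytes:
    """Wrap a cgi parameter string in a POST /inject request."""
    payload = form_body.encode("utf-8")
    head = (
        "POST /inject HTTP/1.0\r\n"
        f"Content-Length: {len(payload)}\r\n"  # length in bytes, not characters
        "Content-Type: text/html\r\n"
        "Connection: Close\r\n"
        "\r\n"  # blank line separating the headers from the body
    ).encode("ascii")
    return head + payload


def send_request(request: bytes, host: str, port: int, timeout: float = 30.0) -> bytes:
    """Send the request and read the complete reply until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Because the request says Connection: Close, reading until the socket returns no more data is enough to collect the whole reply.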
The Reply
The reply is always a typical HTTP reply, but if you defined quick=1 then the "content" (the stuff below the returned MIME) of the
HTTP reply to the injection request is of the format:
<X> docId=<Y> hostId=<Z>
OR
<X> <error message>
Where <X> is a string of digits in ASCII, corresponding to the error code. X is 0 on success (no error) in which case it will be
followed by a long long docId and a hostId, which corresponds to the host in the hosts.conf file that stored the document. Any
twins in its group (shard) will also have copies. If there was an error then X will be greater than 0 and may be followed by a
space then the error message itself. If you did not define quick=1, then you will get back a response meant to be viewed on a
browser.
Make sure to read the complete reply before spawning another request, lest Gigablast become flooded with requests.
Example success reply: 0 docId=123543 hostId=0
Example error reply: 12 Cannot allocate memory
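The quick=1 reply format described above can be parsed along these lines. This is a sketch under the assumption that the caller has already stripped the HTTP MIME header from the reply; InjectReply and parse_inject_reply are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class InjectReply(NamedTuple):
    code: int                 # 0 on success, greater than 0 on error
    doc_id: Optional[int]     # set only on success
    host_id: Optional[int]    # set only on success
    message: Optional[str]    # error text, if the server sent one


_SUCCESS = re.compile(r"docId=(\d+)\s+hostId=(\d+)")


def parse_inject_reply(content: str) -> InjectReply:
    """Parse the content portion of a quick=1 injection reply."""
    code_text, _, rest = content.strip().partition(" ")
    if not code_text.isdigit():
        raise ValueError(f"reply does not start with an error code: {content!r}")
    code = int(code_text)
    if code == 0:
        match = _SUCCESS.search(rest)
        if match is None:
            raise ValueError(f"success reply lacks docId/hostId: {content!r}")
        return InjectReply(0, int(match.group(1)), int(match.group(2)), None)
    return InjectReply(code, None, None, rest.strip() or None)
```

Python integers are unbounded, so the long long docId needs no special handling.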
See the Error Codes for all errors, but the following errors are most likely:
12 Cannot allocate memory       There was a shortage of memory to properly process the request.
32771 Record not found          A cached page was not found when it should have been, likely due to corrupt data on disk.
32769 Try doing it again        There was a shortage of resources so the request should be repeated.
32863 No collection record      The injection was to a collection that does not exist.
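Two pieces of advice above, reading each reply completely before sending the next request and repeating a request when the server asks for a retry, can be combined in a simple sequential loop. The sketch below is an assumption about how a client might do this, not part of Gigablast: send is any caller-supplied function that injects one body, blocks until the full reply has been read, and returns the error code, and retry_codes is the set of codes the caller considers retryable (for instance the "Try doing it again" code from the table).

```python
import time
from typing import Callable, Iterable, List, Set, Tuple


def inject_sequentially(
    bodies: Iterable[str],
    send: Callable[[str], int],
    retry_codes: Set[int],
    max_attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, int]]:
    """Inject bodies one at a time, retrying those that return a retryable code."""
    results = []
    for body in bodies:
        code = -1
        for attempt in range(1, max_attempts + 1):
            # send() returns only after the whole reply is read, so at most
            # one request is ever outstanding.
            code = send(body)
            if code not in retry_codes or attempt == max_attempts:
                break
            sleep(delay * attempt)  # back off a little more on each retry
        results.append((body, code))
    return results
```

The returned list pairs each body with its final error code, so failed injections can be logged or queued for later.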
