
An introduction to ScraperWiki
for non-developers
Maurizio Napolitano <napo@fbk.eu>
Summer School
"Data journalism e visualizzazione grafica dei dati"
29 July 2011 Flavon (TN)

Description in the name
SCRAPER
source: http://www.modot.org/central/major_projects/July2006photos.htm
WIKI
source: http://www.commoncraft.com/video/wikis

Wiki like Wikipedia

Scraper like ???
A scraper extracts data
from content
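
To make "extracts data from content" concrete, here is a minimal sketch using only the Python standard library. It pulls the link addresses out of a small piece of HTML. The SAMPLE text and the names LinkExtractor and extract_links are made up for this illustration; they do not come from ScraperWiki.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the link targets found in the given HTML text."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical content, invented for this example.
SAMPLE = (
    '<p>See <a href="https://scraperwiki.com/">ScraperWiki</a> and '
    '<a href="http://en.wikipedia.org/wiki/Scraper_site">Wikipedia</a>.</p>'
)

print(extract_links(SAMPLE))
```

The content is the whole paragraph; the data are just the two addresses.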

Legal aspect
Scraper sites may violate
copyright law.
Even taking content from an open content site can be a
copyright violation, if done in a way which does not respect
the license.
For instance, the GNU Free Documentation License (GFDL)
and Creative Commons ShareAlike (CC-BY-SA) licenses
require that a republisher inform readers of the license
conditions, and give credit to the original author.
http://en.wikipedia.org/wiki/Scraper_site

... then ScraperWiki is ...
https://scraperwiki.com/
A place to share scrapers ... and data :)

ScraperWiki legal aspect
https://scraperwiki.com/terms_and_conditions/
Use
6. You agree that, in using the ScraperWiki site and services, you will
not interfere with the legal rights
[...]
Intellectual Property
9. Subject to the following paragraphs, the source code of the
ScraperWiki site, and all other copyrightable materials that form a part
of it is released under the GNU Affero General Public License.
10. All scraping code hosted on the site is licensed under the GNU
General Public License. You hereby license all scraping code you
create using ScraperWiki under the same licence.
11. You agree to assert no additional intellectual property rights,
including copyright and database right, in any scraped data other than
those which subsisted in the relevant web sites before the running of
the relevant scraper and which were held by you at that time.
12. You grant us a non-exclusive, worldwide, licence to use any data
that you store on our site, for the purposes of administering the site.

ScraperWiki legal aspect
USE
6. You agree [...] you will not interfere with
the legal rights
[...]
INTELLECTUAL PROPERTY
9. [...] the source code of the ScraperWiki [...] is released
under the GNU Affero General Public License.
10. All scraping code [...] is licensed under the GNU
General Public License.
11. You agree to assert no additional
intellectual property rights [...]
12. You grant us a non-exclusive, worldwide, licence to use any data
that you store on our site, for the purposes of administering the site.

HOW TO CREATE A SCRAPER?

The non-developers

The technical approach
http://unstats.un.org/unsd/demographic/products/socind/education.htm

Behind the page
HTML code

Where are the data?
There is a structure
behind!!!
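
One way to see that structure is to print the tags of a page as an indented outline. The sketch below assumes a small, well-formed HTML table invented for this example (it is not the UN page), and the names OutlinePrinter and outline are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class OutlinePrinter(HTMLParser):
    """Record each tag and each piece of text, indented by nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace between tags
            self.lines.append("  " * self.depth + repr(text))


def outline(html):
    """Return the outline of the HTML as a list of indented lines."""
    parser = OutlinePrinter()
    parser.feed(html)
    parser.close()
    return parser.lines


# Hypothetical page fragment, invented for this example.
SAMPLE = """
<table>
  <tr><th>Country</th><th>Value</th></tr>
  <tr><td>Alpha</td><td>1</td></tr>
</table>
"""

print("\n".join(outline(SAMPLE)))
```

The table contains rows, each row contains cells, and each cell contains one value. That regularity is what a scraper relies on.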

The algorithm!!!
Download the web page
Read the information
Find the right position
Extract the data
Create a CSV file
data1;data2;data3
[...]
dataN1;dataN2;dataN3
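
The steps above can be sketched in plain Python with the standard library only. This is an illustration under assumptions, not the ScraperWiki API: the function names (download, extract_rows, to_csv), the TableRows class and the SAMPLE table are all hypothetical, and the parser assumes a well-formed table in which every cell is explicitly closed.

```python
import csv
import io
import urllib.request
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the text pieces and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def download(url):
    """Step 1: download the web page (needs network access)."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_rows(html):
    """Steps 2-4: read the page, find the table cells, extract their text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(rows):
    """Step 5: build CSV text with ';' as the separator."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=";", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical page fragment, invented for this example.
SAMPLE = """
<table>
  <tr><th>country</th><th>year</th><th>value</th></tr>
  <tr><td>Alpha</td><td>2009</td><td>12.5</td></tr>
  <tr><td>Beta</td><td>2010</td><td>7</td></tr>
</table>
"""

print(to_csv(extract_rows(SAMPLE)), end="")
```

With a real page, the first step would be html = download(url) in place of SAMPLE. Real pages are usually messier than this, so "find the right position" normally takes the most work.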

Example: Python code
https://scraperwiki.com/docs/python/python_intro_tutorial/

... and everything runs
in the cloud!!!

The code in the cloud
https://scraperwiki.com/scrapers/mlb_rosters/

Sharing & ReUse

Enjoy!!!
https://scraperwiki.com/

Thanks!
An introduction to ScraperWiki for non-developers by
Maurizio Napolitano <napo@fbk.eu>
is licensed under a
Creative Commons Attribution 3.0 Unported License.
Created for the Summer School
"Data journalism e visualizzazione grafica dei dati"
29 July 2011 Flavon (TN)
