alias http://ckan.net
11. http://quandl.com
12. Social Network Analysis Interactive Dataset Library
OpenDataSoft catalog
28. http://www.archives.gov/research...
29. http://www.bls.gov/
30. http://www.crunchbase.com/
31. http://www.dartmouthatlas.org/
32. http://www.data.gov/
33. http://www.datakc.org
34. http://dbpedia.org
35. http://www.delicious.com/jbaldwi...
36. http://www.faa.gov/data_research/
37. http://www.factual.com/
38. http://research.stlouisfed.org/f...
39. http://www.freebase.com/
40. http://www.google.com/publicdata...
41. http://www.guardian.co.uk/news/d...
42. http://www.infochimps.com
43. http://www.kaggle.com/
44. http://build.kiva.org/
45. http://www.nationalarchives.gov....
46. http://www.nyc.gov/html/datamine...
47. http://www.ordnancesurvey.co.uk/...
48. http://www.philwhln.com/how-to-g...
49. http://www.imdb.com/interfaces
50. http://imat-relpred.yandex.ru/en...
51. http://www.dados.gov.pt/pt/catal...
52. http://knoema.com
53. http://daten.berlin.de/
54. http://www.qunb.com
55. http://databib.org/
56. http://datacite.org/
57. http://data.reegle.info/
58. http://data.wien.gv.at/
59. http://data.gov.bc.ca
60. https://pslcdatashop.web.cmu.edu/
Bret Taylor, CEO of Quip. Ex-CTO of Facebook, co-founder FriendFeed, cocreator Google Maps.
Written Apr 5, 2011
I did a blog post about open data a long time ago (http://bret.appspot.com/entry/we... ),
and ReadWriteWeb did a nice roundup based on all the comments from the blog post:
http://www.readwriteweb.com/arch... .
Since that post, there have been a lot more comments on the blog (105 and counting), so
you may want to comb the comments for any that the RWW post missed.
142.6k Views View Upvotes
A database of open databases? (also see most-upvoted questions on the Open Data
Stack Exchange at Highest Voted Questions )
http://www.reddit.com/r/datasets
https://d396qusza40orc.cloudfron...
Analysis course)
Where is it possible to find raw climate data? (also NCAR - Climate Data Guide )
| Ecological Data Wiki
PhysioNet - largest repository of free, open-access databases and open-source
computational tools devoted to complex signals informatics
Page on sdss.org - SDSS Astronomy datasets. For more on astronomy, see What are
some astronomy datasets open to the public?
http://berkeleyearth.org/dataset...
- for neuroscience
http://www.fda.gov/Food/FoodSafe...
http://www.ams.usda.gov/AMSv1.0/pdp
- EPA data
http://data.giss.nasa.gov/
http://jimwatsonsequence.cshl.edu/
Some others:
http://www.cdc.gov/nchs/nhanes/n...
Examination Survey
http://www.nlsinfo.org/ordering/...
http://road.hmdc.harvard.edu/
[1] The NLSY79 Geocode data can only be made available to users who have successfully
completed a geocode application and signed a confidentiality agreement with the U.S.
Bureau of Labor Statistics. If interested in gaining access to the NLSY79 Geocode data,
please review the information at http://stats.bls.gov/nls/nlsgeo7... .
216.1k Views View Upvotes
I'll try to restrict my answers to datasets greater than 1 GB in size, and order my answers
by the size of the dataset.
More than 1 TB
The 1000 Genomes project makes 260 TB of human genome data available [13]
The Internet Archive is making an 80 TB web crawl available for research [17]
The TREC conference made the ClueWeb09 [3] dataset available a few years back.
You'll have to sign an agreement and pay a nontrivial fee (up to $610) to cover the
sneakernet data transfer. The data is about 5 TB compressed.
ClueWeb12 [21] is now available, as are the Freebase annotations, FACC1[22]
CNetS at Indiana University makes a 2.5 TB click dataset available [19]
ICWSM made a large corpus of blog posts available for their 2011 conference [2].
You'll have to register (an actual form, not an online form), but it's free. It's about 2.1
TB compressed.
The Yahoo News Feed
Google BigQuery is an awesome place to share open datasets: Once data is loaded in
BigQuery, you can make it public - allowing others to instantly analyze it using just SQL.
See a list of some of the amazing datasets shared on BigQuery:
http://www.reddit.com/r/bigquery...
Among those datasets I'd like to highlight GDELT: More than a quarter billion rows
(growing every day) of every event happening around the world. I made a video about it:
and ftp://ftp.cmdl.noaa.gov/
Social Sciences
General Social Survey: General Social Survey
ICPSR: Page on umich.edu
SNAP: Stanford Large Network Dataset Collection
UCLA Social Sciences Archive: Data Portals
UPJOHN INST: Employment Research Data Center
Time Series
Time Series data Library: Time Series Data Library
Universities
Carnegie Mellon University Enron email: Enron Email Dataset
Carnegie Mellon University StatLab: StatLib---Datasets Archive
Carnegie Mellon University JASA data archive: StatLib---JASA Data Archive
Ohio State University Financial data: Financial Data Finder
UC Berkeley: UC DATA :HOME
UCLA: SOCR Data - Socr
UC Riverside Time Series: Welcome to the UCR Time Series Classification/Clustering
Page
University of Toronto: Delve Datasets
44.1k Views View Upvotes
Alex Kamil
Updated Sep 28, 2013
Here are some big corpora we use in NLP in addition to the ones already mentioned:
ukWaC: a 2 billion word corpus constructed from the Web limiting the crawl to the .uk
domain and using medium-frequency words from the BNC as seeds. The corpus was
POS-tagged and lemmatized with the TreeTagger. There's also a parsed version called
PukWaC. Get both at: http://wacky.sslmit.unibo.it/dok...
WaCkypedia: a 2009 dump of the English Wikipedia (about 800 million tokens),
including part of speech/lemma information, as well as a full syntactic parse. The texts
were extracted from the dump and cleaned using the Wikipedia extractor. Get it at the
same URL as ukWaC: http://wacky.sslmit.unibo.it/dok...
USENET corpus: A collection of public USENET postings. This corpus was collected
between Oct 2005 and Jan 2011, and covers 47860 English language, non-binary-file
news groups. Get it at: http://www.psych.ualberta.ca/~we... [CAVEAT: it's huge!]
The collection of data that comes with the Natural Language Toolkit (NLTK). It's
probably not as large as the others but it's a good set. See descriptions at:
http://nltk.googlecode.com/svn/t...
Europarl: Proceedings of the European Parliament in 13 languages. Cleaned and
preprocessed for machine translation research. Get it at: http://www.statmt.org/europarl
[FYI, NLTK has a built-in interface to access this corpus.]
The Google Books Ngram corpus: Pretty big. Get it at: http://books.google.com/ngrams/d...
12.3k Views View Upvotes
Yelp provides data and reviews of the 250 closest businesses for 30 universities for
students and academics to explore and research. I downloaded Yelp's Academic
Dataset in early 2015, and it contained a total of 330,071 reviews provided by 130,873
users to 13,481 businesses.
The dataset is a single gzip-compressed file, composed of one json-object per line. Every
object contains a 'type' field, which tells you whether it is a business, a user, or a review.
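As a rough sketch of how a file in the format described above might be read, the Python snippet below streams a gzip-compressed file with one JSON object per line and tallies the records by their 'type' field. The function name and the file name in the usage note are illustrative assumptions, not something taken from Yelp's documentation.

```python
import gzip
import json
from collections import Counter


def count_record_types(path):
    """Count records by their 'type' field in a gzipped JSON-lines file."""
    counts = Counter()
    # "rt" opens the gzip stream in text mode, so each line arrives as a str
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Records without a 'type' field are grouped under "unknown"
            counts[record.get("type", "unknown")] += 1
    return counts
```

Calling count_record_types("yelp_academic_dataset.json.gz") (a hypothetical file name) would return a Counter keyed by record type. Reading line by line keeps memory use flat, which matters once the file no longer fits comfortably in RAM.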
For its dataset challenge, Yelp provides a larger dataset than the Academic Dataset
mentioned above. At present (when this answer is written), the Challenge Dataset
includes information about local businesses in 10 cities across 4 countries.
The Challenge Dataset contains:
1.6M reviews and 500K tips by 366K users for 61K businesses
481K business attributes, e.g., hours, parking availability, ambience.
Social network of 366K users for a total of 2.9M social edges.
Aggregated check-ins over time for each of the 61K businesses
7.7k Views View Upvotes
Recently I came across CERN's open data initiative. Having talked to a few guys that have
worked there, I'm pretty sure these guys currently gather one of the largest datasets in the
world! Have a look at CERN Open Data Portal
Hope this helps!
-w
13.3k Views View Upvotes
Atakan Cetinsoy, SaaS Product Strategy | Data Science | Lean Startup Advisory |
Go-to-Market Plan
Written Sep 27, 2015
Since we get asked this question by our Machine Learning oriented users very frequently,
my company (BigML) has compiled a list with over 250 sources here:
List of Public Data Sources Fit for Machine Learning
You may also want to check out the related blog post for some more context:
Data, Data, Data: Thousands of Public Data Sources
11.6k Views View Upvotes
Large data sets, mostly from finance and economics, that could also be applicable in related
fields studying the human condition:
World Bank Data: lots of years, lots of countries (Countries | Data), and lots of data
variables (Topics | Data - Indicators | Data - Catalog).
Your Window Into U.S. Federal Statistics
FRB: Data Releases
Federal Reserve Economic Data
Our government also likes to stay globally informed and is willing to share some of that
data: CIA -The World Factbook
Human Development Reports
Explorer
Penn World Tables
Data: Real and PPP-adjusted GDP in US millions of dollars, national accounts
(household consumption, investment, government consumption, exports and
imports), exchange rates and population figures.
Geographical coverage: Countries around the world
Time span: 1950-2011 (version 8.1)
Available at: Online here
Feenstra, Robert C., Robert Inklaar and Marcel P. Timmer (2015), "The Next
Generation of the Penn World Table", forthcoming, American Economic Review,
available for download at www.ggdc.net/pwt
Correlates of War Bilateral Trade
Data: Total national trade and bilateral trade flows between states. Total imports
and exports of each country in current US millions of dollars and bilateral flows in
World Bank World Development Indicators
Data: Trade (% of GDP) and many more specific series: trade in merchandise,
trade in services, trade in high-technology, trade in ICT goods, trade in ICT services,
always exports and imports separately. Also export and import value index and
volume index.
Geographical coverage: Countries and world regions
Time span: Annual since 1960
Available at: Online at http://data.worldbank.org
UN Comtrade
Data: Bilateral trade flows by commodity
Geographical coverage: Countries around the world
Time span: 1962-2013
Available at: Online here
UNCTADstat
Data: Many different measures, including trade by volumes and value
Geographical coverage: Countries around the world
Time span: For some series, data is available since 1948, mostly annual,
sometimes quarterly.
Available at: Online here
Eurostat COMEXT
Data: Trade flows (also by commodity)
Geographical coverage: Europe (EU and EFTA)
Time span: Mostly since 1988
Available at: Online here
Also, the Eurostat website Statistics Explained publishes up-to-date statistical
information on international trade in goods and services.
World Trade Organization (WTO)
Data: Many series on tariffs and trade flows
Geographical coverage: Countries around the world
Time span: Since 1948 for some series
Available at: Online here
CEPII database on the World Economy
Data: Many different data sets related to international trade, including trade flows
by commodity, geographical variables, and variables to estimate gravity models
Geographical coverage: Countries around the world
Time span: Some series go back to the 1990s.
Available at: Online here
NBER-United Nations Trade Data, 1962-2000
Data: Export and import values and volumes by commodity
Smaller historical trade datasets
Data on UK bilateral trade for the period 1870-1913 was collected by David S.
Jacks. It is downloadable in Excel format here.
For the period 1870-1913, 21,000 bilateral trade observations can be found in
Mitchener and Weidenmier (2008), "Trade and empire", available in the Economic
Journal here.
Data on the UK, Germany, France, and US from the mid-19th to the 20th century can
be found here.
Data on Developing Country Export in 1840, 1860, 1880 and 1900 by John
Hanson is available here.
Data on trade between England and Africa during the period 1699-1808 is
available on the Dutch Data Archiving and Networked Services. It was compiled
by Marion Johnson.
Applying these same sources to education quality in developing countries:
Education Index: multiple sheets of Excel data are available at Human
Development Reports, or you can use their tool to explore the data (Human
Development Reports). Google also offers a way to explore the data: Google Public Data
Explorer. Additional indexes in this HD report that you might be interested in are the
Human Development Index, the Adult Literacy Index and the Gross enrollment ratio.
The World Bank has literacy rates (Adult literacy rate, population 15+ years, both
sexes (%)) in addition to lots of other data: World Bank Data. Lots of years. Lots of
countries (Countries | Data). Lots of data variables (Topics | Data - Indicators |
Data - Catalog | The World Bank).
Our government also likes to stay informed and is willing to share some of that data:
CIA - The World Factbook
The Human Capital Report 2015 may also be worth a look: it ranks countries by a human
capital index and includes various measures of education and productivity capabilities.
Unveiling the beauty of statistics for a fact based world view. (http://www.gapminder.org/)
Data Plotter
Sweden Statistical database
What is the Statistical database?
Since January 1997, Statistics Sweden has had databases available on the Internet. The aim
is to provide increased access to statistics and allow users to easily download information
to their own computers.
Statistical database
Content and search
The Statistical database contains a large amount of official statistics that Statistics Sweden
is responsible for. Also included are official statistics from other statistical authorities. The
database contains a number of tables where selected information can be presented on the
screen, in print or transmitted to the user's computer for further processing.
The search process can be made in three ways:
via the link NYA SIFFROR Välj från senast uppdaterade tabeller (only in the
Swedish version of the website). Nya siffror shows the latest updated tables in the
Statistical database.
via the subject areas
or via Search the Statistical database.
The Statistical database is available free-of-charge. When making minor retrievals of less
than 10000 table cells, registration is not necessary. For larger retrievals and some future
supplementary services, registration is done by completing the registration form.
Large statistical files (PC-Axis) (only in the Swedish version of the website)
The database capacity is limited when it comes to large retrievals. In order to best serve
users of very large retrievals, ready-made statistics files in PC-Axis format have been
created, mainly for regionally distributed material.
PC-Axis
PC-Axis is software that handles very large statistical tables. PC-Axis can be used for
processing ready-made statistics files or PC-Axis files from the database. The program can
also pass on the statistics to other programs such as spreadsheets, etc. PC-Axis can be
downloaded free-of-charge from this website.
Services in connection with the Statistical databases
Tailor-made database retrievals on CD-ROM or diskette
Tailor-made retrievals can be ordered for delivery on diskette or CD-ROM. The price
depends on the production cost.
Micro databases
Micro databases are available after a harm test of de-identified (anonymised) data is done
at Statistics Sweden. More information on registers is available in Documentation of
statistics (only in the Swedish version of the website).
Courses
Courses are held regularly (in Swedish) as an aid for those who want to use the Statistical
database. For more information on contents, times and prices of courses, check the
Swedish version of the website Kurser .
For more information, please contact Statistics Sweden's Information
services
Postal address: Box 24300, SE-10451 Stockholm, Sweden
Telefax: +46-8-506 948 99
Telephone: +46-8-506 948 01
9.1k Views View Upvotes
Anton Tarasenko
Updated Dec 5, 2014
Custom Google Search
IOGDS
The following service puts in order more than 1,000,000 public datasets:
IOGDS: International Open Government Dataset Search
12.7k Views View Upvotes
Reposting from Alan Morrison's answer to Where on the web can I find free samples of Big
Data to analyze?
This link list, available on Github, is quite long and thorough:
caesar0301/awesome-public-datasets. You will see many census data sources listed. Then
the challenge becomes how to get to what you really want and can use.
Note that this list also references a Quora answer that also includes a long list: Where can I
find large datasets open to the public?
For your convenience, I've copied the list of lists as it stood in January 2015 here, but won't
be updating it:
Awesome Public Datasets
This list of public data sources is collected and tidied from blogs, answers, and user
responses. Most of the data sets listed below are free; however, some are not. Other
amazingly awesome lists can be found in the awesome-awesomeness and another
awesome list.
Agriculture
U.S. Department of Agriculture's PLANTS Database
Biology
1000 Genomes
Collaborative Research in Computational Neuroscience (CRCNS)
Gene Expression Omnibus (GEO)
Human Microbiome Project (HMP)
ICOS PSP Benchmark
MIT Cancer Genomics Data
NIH Microarray data (FTP)
Protein Data Bank
PubChem Project
PubGene (now Coremine Medical)
Stanford Microarray Data
The Personal Genome Project
or PGP
St Louis Federal
Yahoo Finance
GeoSpace/GIS
BODC - marine data of ~22K vars
EOSDIS - NASA's earth observing system data
Factual Global Location Data
Global Administrative Areas Database (GADM)
Geo Spatial Data from ASU
GeoNames Worldwide
Natural Earth - vectors and rasters of the world
Open Street Map (OSM)
TIGER/Line - U.S. boundaries and roads
TwoFishes - Foursquare's coarse geocoder
TZ Timezones shapefiles
Government
Australia (abs.gov.au)
Australia (data.gov.au)
Canada
Chicago
EuroStat
FedStats
Germany
Glasgow, Scotland, UK
Guardian world governments
London Datastore, UK
MassGIS, Massachusetts, U.S.
Netherlands
New Zealand
NYC betanyc
NYC Open Data
OECD
Open Government Data (OGD) Platform India
San Francisco Data sets
South Africa
The World Bank
U.K. Government Data
U.S. American Community Survey
U.S. CDC Public Health datasets
U.S. Census Bureau
U.S. Department of Housing and Urban Development (HUD)
U.S. Federal Government Agencies
U.S. Federal Government Data Catalog
U.S. Food and Drug Administration (FDA)
U.S. Open Government
UK 2011 Census Open Atlas Project
United Nations
Healthcare
EHDP Large Health Data Sets
Gapminder World, demographic databases
Medicare Coverage Database (MCD), U.S.
Medicare Data Engine of medicare.gov Data
Medicare Data File
Image Processing
2GB of Photos of Cats
Face Recognition Benchmark
StatSci.org
The Washington Post List
UCLA SOCR data collection
UFO Reports
Wikileaks 911 pager intercepts
Yahoo Webscope
Search Engines
Academic Torrents of data sharing from UMB
Archive-it from Internet Archive
Datahub.io
DataMarket (Qlik)
Freebase.com of people, places, and things
Harvard Dataverse Network of scientific data
ICPSR (UMICH)
Statista.com - statistics and Studies
Social Sciences
Ancestry.com Forum Dataset over 10 years
CMU Enron Email of 150 users
Facebook Data Scrape (2005)
Facebook Social Networks from LAW (since 2007)
Foursquare Social Network in 2010, 2011
Foursquare from UMN/Sarwat (2013)
General Social Survey (GSS) since 1972
GetGlue - users rating TV shows
GitHub Collaboration Archive
Mobile Social Networks from UMASS
PewResearch Internet Survey Project
SourceForge.net Research Data
StackExchange Data Explorer
Titanic Survival Data Set
Twitter Graph of entire Twitter site
UCB's Archive of Social Science Data (D-Lab)
UCLA Social Sciences Data Archive
UNIMI/LAW Social Network Datasets
Universities Worldwide
UPJOHN for Labor Employment Research
Yahoo! Graph and Social Data
Youtube Video Social Graph in 2007,2008
Sports
Betfair Historical Exchange Data
Cricsheet Matches (cricket)
Ergast Formula 1, from 1950 up to date (API)
Football/Soccer resources (data and APIs)
Lahman's Baseball Database
Retrosheet Baseball Statistics
Time Series
Time Series Data Library (TSDL) from MU
UC Riverside Time Series Dataset
Transportation
Airlines OD Data 1987-2008
Bike Share Systems (BSS) collection
Hubway Million Rides in MA
Marine Traffic - ship tracks, port calls and more
NYC Taxi Trip Data 2013 (FOIA/FOILed)
Two fully annotated corpora, put together for use by researchers and lexicographers, are:
The BNC (British National Corpus) http://www.natcorp.ox.ac.uk/
and
COCA (Corpus of Contemporary American English)
http://www.americancorpus.org/
The BNC is a little dated now. COCA is excellent, though its user interface is a little clunky
at times.
If you have legitimate, nonprofit research concerns, you may be able to get access to the
granddaddy of them all, the Oxford English Corpus. For commercial use there is fee-based access:
http://oxforddictionaries.com/pa...
4.3k Views View Upvotes
(USA),
(Size:396.7TB)
Size:863.4GB
(Canada)
Education - Data.gov
(Education)
Geo-data
My favorites are:
Awesome Public Datasets
100+ Interesting Data Sets for Statistics
7 Datasets You've Likely Never Seen Before
Another collection of free and open-source datasets
1k Views View Upvotes
The best source of structured data I've seen so far is the UCI Machine Learning Repository:
Data Sets
This question has extensive resources for data sets open to the public, Where can I find
large datasets open to the public?
5.5k Views View Upvotes
Socrata hosts open data websites for a number of governments, government agencies, and
non-profits including:
http://data.seattle.gov
http://data.cityofchicago.org
http://data.medicare.gov
http://data.sunlightlabs.com
http://www.datakc.org
http://gettingpastgo.socrata.com
http://data.govloop.com
There are also over 100K datasets available on our public data portal,
http://opendata.socrata.com
9.6k Views View Upvotes
Krishnan Srinivasarengan
Written Jan 21, 2013
For Non-Intrusive Appliance Load Monitoring research, databases are emerging. While
REDD is one instance (already in another answer), there are a few more of them (not as
comprehensive):
BLUED: NILM@CMU
Tracebase: tracebase » Welcome
UMass Smart*: Smart - UMass Trace Repository
6k Views View Upvotes
Ben Hamner
Written Feb 6
Kaggle recently launched Kaggle Datasets . You can download high quality public
datasets here, run analytics on them through Kaggle Scripts, see others' analyses, and
discuss them in the forums.
Here's a blog post describing this in more depth: Introducing Kaggle Datasets
2.1k Views View Upvotes
Anonymous
Written Apr 15, 2014
Since I haven't seen it mentioned yet, and work at one of the main sources of its data:
SMOKA , the Subaru-Mitaka-Okayama-Kiso Archive, holds about 15 TB of astronomical
data from facilities run by the National Astronomical Observatory of Japan. All data
becomes publicly available after an embargo period of 12-24 months (to give the original
observers time to publish their papers).
With over a decade of data from some facilities and instruments, it has now become
possible for many researchers to make discoveries just by looking at archived data for
something other than what the original observers had in mind.
Astrophotographer Robert Gendler has also processed images from the SMOKA archive
to create several "NASA Astronomy Picture of the Day" winners.
10.9k Views View Upvotes
If you are looking for mobility data, there is the Telecom Italia Big Data Challenge dataset.
You can find it here: Open Data Institute - node Trento
It's about 120 GB of data, and there are 7 different types of datasets from city life.
Another mobility dataset is Data 4 Development, released by Orange, a
French operator. In 2013 they released call detail records (CDR) for Ivory Coast and in
2014 CDR data for Senegal.
Info about the challenge can be found here: http://www.d4d.orange.com/en/home
A new challenge organized by the American Statistical Association can be found here: Support
the Data Challenge at JSM 2015
If you want some more datasets of any kind from pollution data to social network data
then check this post here : Data sets of any type: some links. by Alket Cecaj on Algorithms
and DataFusion
The post is updated regularly as I find new data sets such as the Panama Papers dataset.
3.3k Views View Upvotes
If you are interested in research datasets (large and small), these sites let you search for
them:
http://databib.org/ (a collaborative, annotated bibliography of primary research
data repositories)
http://datacite.org/ (support researchers by helping them to find, identify, and cite
research datasets with confidence)
3.6k Views View Upvotes
Mike Kruger
Written May 16, 2014
IRI has a large (130 gigabyte) set of consumer packaged goods marketing data available.
30 categories, 11 years. For information see Academic Data Set - IRI
3.9k Views View Upvotes
Dataset of 13 billion clicks available for research made available on Jan 20, 2013 here:
Center for Complex Networks and Systems Research
3.6k Views View Upvotes
This online course on applied machine learning provides a dataset released for its
Datathon.
Aspiring Minds presents AMDataBootcamp2016, an online + offline bootcamp on
applying machine learning to real world problems. Register and GRAB this unique offering
comprising a MOOC + a data release + a data competition + a one-day workshop. The last
date for submissions is 8th March 2016. To enroll now and dive deep into ML: Aspiring
Minds University | Boot Camp
R has a built in library called datasets. This has several structured datasets that are useful
for testing and learning. Type library(help=datasets) to get a list. These are available in
your namespace at all times, but they are lazy loaded. To use them, just call them by name,
e.g. str(iris).
2.3k Views View Upvotes
Please see Bernard Marr's Big Data: 33 Brilliant And Free Data Sources For 2016
810 Views View Upvotes
Alex Copulsky
Written Feb 10, 2014
Kevin Edward Kline, data and database expert, I know a 'lil bit about Twitter and
social media
Written Mar 5, 2014
I wrote a blog post about this a while back. For large data sets to tinker with, I recommend
that you go to data.gov for large USA data sets or Data Search | data.gov.uk for large UK
data sets. In both cases, you'll find a wide variety of data to play with.
Also, don't forget TCP.ORG the Leading Tcp Site on the Net.
2.6k Views View Upvotes
Pete Warden
Written Jan 8, 2011 Upvoted by Bradley Voytek, Former Data Scientist, Uber Inc. and Leo
Polovets, Partner at a data-focused seed fund (Susa Ventures). Worked at Factual.
http://codingvc.com/
In the legal world, the Enron dataset is often considered the best public-access dataset. My
recollection is that it was opened to the public by a federal regulatory agency in the course
of its Enron investigation. There is a massive industry of "litigation support technology"
and "electronic discovery" firms that develops software to mine and analyze enormous
data sets, and the Enron set is often trotted out in marketing demonstrations of these
software products to demonstrate their effectiveness. Thanks to Shimonee Shah for the
link to it:
Enron Email Dataset
1.2k Views View Upvotes
Gaurav Bhardwaj
Written Sep 21, 2014
Sourabh Daptardar
Written Sep 14, 2013
http://data.gov.in/
The Indian government offers about 4K datasets collected from about 50 departments for
analysis: http://data.gov.in/ and the list is growing. Not every dataset might be 'big
data' from a computer science perspective, but it is, nevertheless, a good source.
12.8k Views View Upvotes
I'd suggest Ookla's Net Index source data (1.5 GB)
"Download the largest publicly available dataset of anonymous broadband speed and
quality test results, with data from every geographic region currently represented in
NetIndex going back to January 2008." Global Broadband
5.3k Views View Upvotes
(.xls)
The Stack Exchange network has its whole database open for queries and you can even
download the whole dump yourself. It contains mostly data from Stack Overflow.
Stack Exchange Data Explorer
3.9k Views View Upvotes
The Enron Corpus is a large database of over 600,000 emails generated by 158 employees
of the Enron Corporation. I have used the Enron Email Corpus for training and testing my
email classification algorithm.
https://www.cs.cmu.edu/~enron/
Download link [tgz] https://www.cs.cmu.edu/~enron/en...
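For anyone wanting to try the same kind of email classification, here is a minimal sketch of turning one raw message into fields a classifier could use. It assumes the corpus files are plain-text messages with standard mail headers; the sample message, the addresses and the `to_example` helper are made up for illustration and are not taken from the corpus.

```python
from email import message_from_string

# Hypothetical sample message, not taken from the corpus.
raw = """\
Message-ID: <example-1@example.com>
Date: Mon, 14 May 2001 16:39:00 -0700
From: alice@example.com
To: bob@example.com
Subject: Quarterly forecast

Here is the forecast we discussed.
"""

def to_example(raw_message):
    """Parse one raw message into a small dict of classifier inputs."""
    msg = message_from_string(raw_message)
    return {
        "sender": msg["From"],
        "subject": msg["Subject"],
        # Assumes a single-part, plain-text body.
        "body": msg.get_payload().strip(),
    }

example = to_example(raw)
print(example)
```

Multipart or oddly encoded messages would need extra handling that this sketch leaves out.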
4.4k Views View Upvotes
Ian C. Grieve
Written Apr 7, 2011
The City of Toronto publishes a few interesting datasets. Their Dinesafe dataset is
particularly interesting, as it contains information about every restaurant's inspection
(infractions, etc) conducted by Toronto Public Health. You can find all of Toronto's open
datasets at http://toronto.ca/open .
3.6k Views View Upvotes
Abhishek Gupta, have wanderlust, want to experience ambedo again & again, a
kleptomaniac for ...
Updated Feb 11, 2014
1. Academic Torrents
2. Links to free data sets for computer vision applications
3. Amsterdam Library of Object Images
4. The CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images
dataset.
5. Traffic signs dataset
6. Machine Learning and Data Mining - Datasets
7. Quora Thread
8. DataMob
9. Some More shared on bitly
10. UCD Machine Learning Group
11. Some Links from Open Directory
12. A thread on dataWrangling
13. Kevin's Blog
14. Recommendation and Ratings Public Data Sets
15. Another Quora thread for Kinect Specific Data
16. /r/datasets
3.2k Views View Upvotes
I just thought I'd add NationMaster to the list, because I use it all the time.
For comparison of all kinds of statistics between countries:
International statistics: Compare countries on just about anything! NationMaster.com
2.2k Views View Upvotes
Drazen Zaric, Grad student interested in machine learning and data science
Written Dec 12, 2010
Stanford Large Network Dataset Collection has some pretty impressive datasets, like
complete Wikipedia edit history (till January 2008) or a collection of 467 million tweets
collected from June to December 2009.
http://snap.stanford.edu/data/in...
3.8k Views View Upvotes
Many countries are releasing open-data portals. For data relating to Italy, these are the
main links:
- http://www.dati.gov.it/ (the main governmental website)
- http://dati.piemonte.it/ (data portal for Piemonte region, the first regional portal
developed)
- http://dati.emilia-romagna.it/ (data portal for Emilia Romagna region)
- http://data.enel.com/ (data portal for the ENEL company, an energy/gas supplier)
3.1k Views View Upvotes
Google Research released a large 24GB n-gram data set back in 2006 based on processing
10^12 words of text and published counts of all sequences up to 5 words in length:
http://googleresearch.blogspot.c...
You can also just search over a related data set via the Google Books Ngram Viewer:
http://books.google.com/ngrams/
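To make "counts of all sequences up to 5 words in length" concrete, here is a toy sketch of that kind of counting on a single sentence. The whitespace tokenizer, the lowercasing and the `count_ngrams` name are my own simplifications, not a description of how Google built its data set.

```python
from collections import Counter

def count_ngrams(text, max_n=5):
    """Count every word sequence of length 1..max_n in the text."""
    words = text.lower().split()  # naive whitespace tokenizer (assumption)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts

counts = count_ngrams("to be or not to be")
for ngram, count in counts.most_common(3):
    print(" ".join(ngram), count)
```

On a real corpus you would stream the text and drop rare sequences rather than hold every count in memory.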
47k Views View Upvotes
Jim Kenyon, Data science practitioner - all models are wrong. some models are
useful.
Written Jan 23, 2014
Data.gov
http://Quandl.com has over 10 million data sets gleaned from all over the internet. The
great thing about this resource is that it gives a single way to access all of the data. The site
has a free Excel plug-in, and there are libraries for R, Python, Ruby, etc.
3.6k Views View Upvotes
Clinton Little, Coastal Program Specialist working to change the data climate in
Minnesota's ...
Updated Feb 4, 2011
Data.gov
http://www.data.gov/
Eliot Jarrett, Digital Brain / Analog Mind, Voracious Reader, Data Synthesizer,
Strategist
Updated Mar 29, 2012
Ossama Alami
Written Jul 4, 2014
Firebase provides a number of realtime datasets for free: Firebase Open Data Sets .
They're easy to use in web or mobile apps, some data sets available:
Cryptocurrency/USD Exchange Rates (Bitcoin, Litecoin, Dogecoin)
Realtime Global Earthquake data
Public transit data & bus GPS positions for several US cities
Airport delay data
Although there are lots of answers here, many that look very good,
http://www.wolframalpha.com is a search engine which spiders and houses most
open data that is findable on the Web. It also allows you to use your query syntax to
perform calculations, making it a true computation engine. I love it and use it for a variety
of purposes myself.
1.4k Views View Upvotes
Pardeep Kullar, SaaS, Email marketing, Social tools and pilgrims pizza
Written Nov 15, 2015
There are some companies where, on their free trial, you can get free data.
For example: FollowerWonk (Twitter analytics, follower segmentation, social graph
tracking, & more ) lets you download up to 50,000 followers of any Twitter account.
Datadrip (Free data into sales ) has a bunch of followerwonk files like 50,000 CEOs that
you can download from the home page.
1k Views View Upvotes
Recommendation Systems
MovieLens: Two datasets available from GroupLens . The first dataset has 100,000
ratings for 1682 movies by 943 users, subdivided into five disjoint subsets. The second
dataset has about 1 million ratings for 3900 movies by 6040 users.
Jester: This dataset contains 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes
from 73,421 users.
Netflix Prize: Netflix released an anonymised version of their movie rating dataset; it
consists of 100 million ratings, done by 480,000 users who have rated between 1 and all of
the 17,770 movies.
Book-Crossing dataset: This dataset is from the Book-Crossing community, and contains
278,858 users providing 1,149,780 ratings about 271,379 books.
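One quick way to compare these datasets is by how densely the user-item matrix is filled in. The sketch below uses only the figures quoted above; the short dataset labels and the `density` helper are mine, and the Netflix and Jester counts are the rounded numbers from the descriptions.

```python
# (ratings, users, items) as quoted in the descriptions above.
datasets = {
    "MovieLens 100K": (100_000, 943, 1_682),
    "MovieLens 1M": (1_000_000, 6_040, 3_900),
    "Jester": (4_100_000, 73_421, 100),
    "Netflix Prize": (100_000_000, 480_000, 17_770),
    "Book-Crossing": (1_149_780, 278_858, 271_379),
}

def density(ratings, users, items):
    """Fraction of all possible user-item pairs that have a rating."""
    return ratings / (users * items)

densities = {name: density(*dims) for name, dims in datasets.items()}

# Sparsest first.
for name, d in sorted(densities.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} {d:.4%} of user-item pairs rated")
```

Jester comes out far denser than the others because it has only 100 items, which matters when picking a dataset to test a recommender on.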
1.7k Views View Upvotes
A2A. Depending on the type of datasets you're interested in, I'd suggest taking a look at
https://www.reddit.com/r/datasets , or maybe Data.gov (The U.S. government's
open data) or Disability and Health (CDC datasets).
Some other random sets I recall/have used before are:
Google Public Data Explorer
Webscope | Yahoo Labs
Overview | Yelp For Developers | Yelp
AWS Public Data Sets
Beer Data
This list is by no means exhaustive, and some Googling can get you a lot more - but it's
what I was able to come up with off the top of my head.
235 Views View Upvotes Answer requested by Joy Xu
John Goodwin
Written Sep 13, 2011
Shafqat Islam, CEO & Cofounder of NewsCred. We help web publishers delight
their users (and ...
Written Apr 6, 2011
We have a 20 million+ dataset (last three years) of news articles (headline, description,
plus metadata). The data can be accessed programmatically via an API at
http://developer.newscred.com .
People have done some really interesting things with it. We could potentially make it
available as a dump file if someone wants it for research purposes.
1.5k Views
Vaibhav Mallya, Jobhunting? LMK-I will help you get what you are worth.
OfferLetter.io Founder.
Updated Sep 23, 2011
There are some text corpora here: Where can I find large datasets open to the public?
If you're looking for a vast source of public domain literature, Project Gutenberg is
wonderful: http://www.gutenberg.org/wiki/Ma...
The Presidential Speech Archive: http://millercenter.org/scripps/...
Hitler's Speeches: http://www.hitler.org/speeches/
The Vedas: http://www.sacred-texts.com/hin/
The Gita: http://www.gita4free.com/english...
The Bible: http://patriot.net/~bmcgin/kjvpa...
Take a look at the NYT archive: http://www.nytimes.com/ref/membe...
929 Views View Upvotes
Amazon hosts public datasets on AWS at no charge for the community.
These datasets can be seamlessly integrated with applications running on AWS, which are
billed pay-per-use.
https://aws.amazon.com/datasets?...
4.6k Views View Upvotes
Academic Torrents: distributes large datasets using torrents. This project was started
very recently and has some of the most interesting datasets.
8.1k Views View Upvotes
The Sloan Digital Sky Survey seems to not have been mentioned yet:
http://www.sdss.org
350 million objects in the night sky, many different measured parameters for each of
them.
1.1k Views View Upvotes
Nidhi Kohli
Written Oct 4, 2013
Matthew Hurst
Written May 11, 2011
d8taplex.com (which I run) has >1MM time series in >50k data sets pulled from 122
sites. The data sets are derived automatically from resources like excel spreadsheets, html
tables and plain csv and tsv files.
1k Views
Geoffrey Anderson, Former Data Processing Product Manager for InfoGroup &
General DBA that likes...
Updated Apr 15, 2011
There are several free providers on Microsoft's Azure Data Mart for the time being,
including several of those mentioned above. The single delivery platform and the Excel
plug-in should make the data easier to consume than your typical API / SOAP endpoint.
https://datamarket.azure.com/
1.3k Views View Upvotes
Written Jul 14
Open Data | UNCDF is an open data source with data sets for about 10 developing
countries. The export includes a detailed zip file with all 1000+ questions.
69 Views View Upvotes
Google Public Data Explorer is one good dataset. It's not large, but has valuable data
regarding economics and other factors of human development. For example this is
about income inequality in the United States.
Google Trends
If you're looking for public data you should definitely take a look at Knoema
(http://knoema.com ). Knoema is a one-stop shop for your data needs.
Here you will find 600+ public datasets on almost any topic, such as economics, healthcare,
demographics or energy. Knoema has gathered public data from many credible
international sources in a single place and provides convenient search/browsing tools.
881 Views View Upvotes
Depends on what you are looking for. Wikipedia is the best crowdsourced data set
available for generic use. Now, if you are looking for domain specific data sets (e.g., query
logs, annotations, entities, etc.), that's a different matter.
1.4k Views View Upvotes Answer requested by Martin Engwicht
Where can you find them? Stop looking and start building them yourself.
The internet is One Big Data set waiting to be made, and it's laughably easy to combine
data from many websites into a large table these days.
Any of the modern web scrapers will let even a 'non-programmer' put together a data set
very quickly and easily.
I know this because I work at http://import.io and our platform is being used to create
datasets with billions of data points every single day.
I suppose the main reason I suggest this is that you can stop depending on other people to
build big data sets for you and make your own, becoming more independent in the
process.
694 Views View Upvotes
Agastya Mishra
Written May 17, 2013
Bob Calder, Internet and Society, Science and Society Fort Lauderdale, FL
Written Jan 23, 2011
Tom Greif
Written Jul 22, 2013
Stack Exchange Data Dump - Anonymized data dump of all Creative Commons questions
and answers from the Stack Exchange family of websites at http://stackexchange.com/sites
I was doing this research a few days ago and found these:
http://www.delicious.com/pskomor...
http://www.datawrangling.com/som...
http://www.day-trading-stocks.or...
http://www.kdnuggets.com/datasets
http://data.worldbank.org/
http://setiquest.org/ -(You need to sign up)
http://www.grouplens.org/node/73
1k Views View Upvotes
Many people use the bible, as it is available in many languages and many different
versions. Another option is to find the proceedings of the UN, which is also published in
many different languages.
1k Views View Upvotes
on dataTau.
Frank Scurlock
Written Dec 11, 2014
I did some research on low impact fuel sources vs. coal in power plants larger than 50
megawatts. I found these to be helpful.
Department of Energy (DOE) OpenNet documents - OSTI
https://www.osti.gov/opennet
Department of Energy (DOE) declassified documents, part of DOE openness initiative. ...
The OpenNet database provides easy, timely access to over 485,000 ...
DOE Global Energy Storage Database
www.energystorageexchange.org/
The DOE Global Energy Storage Database provides free, up-to-date information on
grid-connected energy storage projects and relevant state and federal ...
Gasification Plant Databases
www.netl.doe.gov/research/coal/energy.../gasificationplantdatabases
Welcome to the U. S. Department of Energy, National Energy Technology Laboratory's
Gasification Plant Databases. Within these databases you will find current ...
854 Views View Upvotes
Google and the USPTO make bulk downloads of US patents and trademarks available in
zip archives:
USPTO Bulk Downloads
2.2k Views View Upvotes
Konstantinos Psychas
Written Apr 18, 2013
The following platform hosts open data to help in scientific analysis and computational
research.
Contribute to the Cure
Information about the platform, which is currently in beta, is here (Sage Bionetworks: Redefining. Challenging. Predicting)
1.6k Views View Upvotes
There are some great datasets relating to Bioinformatics out there. These are usually
databases of molecules of biological interest.
BLAST: http://blast.ncbi.nlm.nih.gov/Bl...
SCOP: http://scop.mrc-lmb.cam.ac.uk/sc...
There are many others - a huge amount of information is available in this field.
1k Views View Upvotes
The PubMed Central Open Access Subset contains about 350,000 full-text academic
articles in the biosciences across more than 2,000 journals. You can download the lot as
compressed XML files via FTP: http://www.ncbi.nlm.nih.gov/pmc/...
1.7k Views View Upvotes
Arya Asemanfar
Written Jan 10, 2011
caesar0301/awesome-public-datasets
Here you can find all types of public datasets. It's an awesome list of all types of dataset
resources.
55 Views View Upvotes
Joscelyn Upendran
Written Jan 10, 2011
There's themed linked open data being published under the Open Government Licence
(OGL) on the Hampshire Hub at:
http://data.hampshirehub.net/def/concept/folders/themes
2.1k Views View Upvotes
Sebastian Schelter, Committer and PMC member at Apache Mahout and Apache
Giraph
Written Sep 13, 2011
Shunsuke Mikami
Written Jun 6, 2011
The Internet Traffic Archive (http://ita.ee.lbl.gov/) publishes some Web access logs.
For example, http://ita.ee.lbl.gov/html/contr... holds the access logs from the 1998 World Cup
Web site between April 30, 1998 and July 26, 1998. During this period of time the site
received 1,352,804,107 requests.
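To get a feel for the scale of that log, here is a back-of-the-envelope average request rate computed from the figures above. Counting both the first and last day as full days is my assumption, and real traffic would have been far burstier than an average suggests.

```python
from datetime import date

requests = 1_352_804_107
start, end = date(1998, 4, 30), date(1998, 7, 26)

# Assumption: both endpoint dates count as full days of logging.
days = (end - start).days + 1
seconds = days * 24 * 60 * 60

avg_per_second = requests / seconds
avg_per_day = requests / days

print(f"{days} days, about {avg_per_second:.0f} requests/second on average")
print(f"about {avg_per_day / 1e6:.1f} million requests/day on average")
```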
946 Views View Upvotes
Rob Jensen, always learning, doing more doing. interested in data science,
minimalism and...
Written Jan 10, 2011
If you haven't already, subscribe to the Guardian's DataBlog. They have great articles and always
link out to the data so you can play with it.
http://www.guardian.co.uk/news/d...
984 Views View Upvotes
Raymond Lam
Written May 5
Data
This has a large list of data collected from around the world and not limited to one
organisation. You have the ability to view the data sets, download the data as a .xlsx file or
visualise the data in browser.
159 Views View Upvotes
You can find some free Twitter datasets (about 200,000 tweets per dataset) in the Datasets
section (Datasets Archive - Followthehashtag // Free twitter search analytics and business
intelligence tool ) of Followthehashtag (http://www.followthehashtag.com).
This section is brand new (2016 / 04) and we are adding about 2 or 3 new datasets per
week. Hope you enjoy it.
If you need custom datasets (paid), this URL shows pricing for datasets from 2000
to 200,000 tweets (Followthehashtag // Twitter keyword search analytics, influence, geo
content analysis tool, and much more )
77 Views
Ian Mercer, Prolific Entrepreneur, Inventor, Guinness World Record Holder and
creator of ...
Written Jan 10, 2011
caesar0301/awesome-public-datasets
1.6k Views View Upvotes
Andrew Semenyak
Written Nov 11, 2013
Here are two sample datasets with companies data available for free:
Jonas Mattias
Written Apr 5, 2011
Bill Sobel
Written Jan 11, 2011
Ganesh Raja
Written Feb 16, 2015
Amazon Web Services have public data sets that you can use freely for your big data
projects. You can also contribute to the list.
Please find more information here aws.amazon.com/public-data-sets/
346 Views
If you're looking for US economic data or time series, try FRED. It's free, comprehensive,
and regularly updated. Provided by the St. Louis Fed.
research.stlouisfed.org/fred2
1.9k Views View Upvotes
OpenStreetMap
482 Views
Ankush Chopra
Written Mar 27, 2014
Annie Pettit, Self serve sample, surveys, polling plus charts and statistics. I am
the Chie...
Written Oct 9, 2014
DataFerrett (U.S. Census Bureau) is a great option for US census data. Lots of data you
can plug directly into any statistics program.
1.4k Views View Upvotes
Guilherme Defreitas
Written Jun 5, 2015
Anonymous
Written Jul 16
Big data analytics helps companies make more informed business decisions by
enabling data scientists, predictive modelers and other analytics professionals to analyze
large volumes of transaction data, as well as other forms of data that may be untapped by
conventional business intelligence (BI) programs. That could include Web server logs and
Internet clickstream data, social media content and social network activity reports, text
from customer emails and survey responses, mobile-phone call detail records and machine
data captured by sensors connected to the Internet of Things. Some people exclusively
associate big data with semi-structured and unstructured data of that sort, but consulting
firms like Gartner Inc. and Forrester Research Inc. also consider transactions and other
structured data to be valid components of big data analytics applications.
Big Data, Data Science - Combo Course Training Classes Online | Big Data, Data Science -
Combo Course Courses Online
Big data can be analyzed with the software tools commonly used as part of advanced
analytics disciplines such as predictive analysis, data mining, text analytics and statistical
methods. Mainstream BI software and visualization tools can also play a role in the analysis
process. But semi-structured and unstructured data may not fit well in a traditional data
warehouse based on relational databases. Furthermore, data warehouses may not be able
to handle the processing demands posed by sets of big data that need to be updated
frequently or even continually -- for example, real-time data on the performance of mobile
applications or of oil and gas pipelines. As a result, many organizations looking to collect,
process and analyze big data have turned to a newer class of technologies that includes
Hadoop and related tools such as YARN, Sqoop, Spark, and Pig, as well as NoSQL databases.
Those technologies form the core of an open source software framework that supports the
processing of large and diverse data sets across clustered systems.
In some cases, Hadoop clusters and NoSQL systems are being used as landing pads and
staging areas for data before it gets loaded into a data warehouse for analysis, often in a
summarized form that is more conducive to relational structures. Increasingly though, big
data vendors are pushing the concept of a Hadoop data lake that serves as the central
repository for an organization's incoming streams of raw data. In such architectures,
subsets of the data can then be filtered for analysis in data warehouses and analytics
databases, or it can be analyzed directly in Hadoop using batch query tools, stream
processing software and SQL-on-Hadoop technologies that run interactive, ad hoc queries
written in SQL.
Potential pitfalls that can trip up organizations on big data analytics
initiatives include a lack of internal analytics skills and the high cost of hiring experienced
analytics professionals. The amount of information that's typically involved, and its
variety, can also cause data management headaches, including data quality and
consistency issues. In addition, integrating Hadoop systems and data warehouses can be a
challenge, although various vendors now offer software connectors between Hadoop and
relational databases, as well as other data integration tools with big data capabilities.
Businesses are using the power of insights provided by big data to instantaneously
establish who did what, when and where. The biggest value created by these timely,
meaningful insights from large data sets is often the effective enterprise decision-making
that the insights enable.
Extrapolating valuable insights from very large amounts of structured and unstructured
data from disparate sources in different formats requires the proper structure and the
proper tools. To obtain the maximum business impact, this process also requires a precise
combination of people
208 Views
Simon Tse, Trying to learn something new every day that I find refreshing
Written Mar 16
For machine learning purposes, a lot of data sets are available on the UCI Machine
Learning Repository
230 Views
Nikita Zhiltsov, Computer science researcher at Kazan University; Textocat, cofounder & CTO
Written Apr 5, 2011
http://getthedata.org
http://datamarket.com/
opened at O'Reilly Strata Conference
2.6k Views View Upvotes
Global Biodiversity Information Facility has the largest biodiversity dataset, with 600M +
records currently: Free and Open Access to Biodiversity Data
272 Views
Philippe Beaudoin, I've written my share of C++, working on many projects in the
video game indu...
Written Apr 6, 2011
Niall McCarthy
Written Jan 24, 2013
You can find a huge selection of free statistics, data and infographics at Statista .
880 Views View Upvotes
Colin Kegler
Written May 11, 2013
Michael Munsey
Written Mar 19, 2014
Iain Chalmers, Web Strategist. Motorcycle Rider. Music Lover. Coffee Tragic.
Written Apr 5, 2011
Evan Thomas, World traveler, surfer, internet marketer, UCSB alumnus from
Manhattan Beach, CA
Written Dec 5, 2011
Findthedata.org
11.8k Views View Upvotes
US Department of Energy has weather data available for free for over 2000 global
locations:
http://apps1.eere.energy.gov/bui...
1.8k Views View Upvotes
Evan Schuss
Written Jun 15, 2011
Junar.com is a great source for data and statistics pertaining to populations of people,
business, sports, geography and also other types of data. This site is a collaboration of data
from around the web and is continually expanding its entries.
847 Views View Upvotes
Common Crawl makes ~250 TB of web page data from 2008-2012 available for free:
CommonCrawl
1.4k Views View Upvotes
Andrey Fedorov
Written May 2, 2013
http://www.cancerimagingarchive....
http://cancergenome.nih.gov/
934 Views View Upvotes
How about the Center for Responsive Politics and its site Opensecrets.org?
1k Views
Mark Hahnel, PhD in Stem Cells at Imperial College London, Founder of figshare
Written Apr 8, 2011
and DataMarketplace.com
MovieLens
Ideal site for trying out movie recommendations
384 Views
archive.ics.uci.edu/ml/
454 Views
Athlan Lathan
Written May 30, 2015
Opendatanetwork.com
Some large datasets, some small, all public.
782 Views View Upvotes
You can get large datasets from the sources mentioned in Where can I find large datasets
open to the public?
372 Views
On the India Water Portal we have a 100 year dataset of the meteorological data for all the
districts of India:
http://www.indiawaterportal.org/...
842 Views View Upvotes
Pete Warden summarizes some of the options here that he covers in "Data Source
Handbook" from O'Reilly:
http://petewarden.typepad.com/se...
Here are 18 data-related links that Warden points to in addition to what's covered in the
book--for those wanting to learn more:
http://petewarden.typepad.com/se...
2.2k Views View Upvotes
Olya Romanova
Updated Sep 26, 2013
Check Knoema via http://knoema.com - the largest open and public data repository with
100 M+ time series and 3000+ datasets
1.4k Views View Upvotes
Milstein Munakami
Written Feb 10, 2015
Milstein/awesome-public-datasets
672 Views View Upvotes
Nazmul Hasan
Written Jul 26, 2013
John Wong
Written Oct 5, 2014
Enigma.io is a product that aggregates thousands of publicly available data sets. Over 80
billion rows of data in 100,000 tables. Also available via an API.
828 Views View Upvotes
Dan Bair
Written Feb 2, 2011
Here is another link that lists some publicly available data sets.
Link: http://highscalability.com/blog/...
246 Views
Vincent van Haa, software engineer, data viz expert, ux designer, hacker, vj,
musician, cyclis...
Written Apr 6, 2011
http://data.vancouver.ca/
640 Views
I have developed a charting platform which also allows users to download data after
registering for a free membership.
https://chartist.deltaspace.com.sg
55 Views
Margaret Warren
Written Apr 5, 2011
Robert Loftin
http://USGovXML.com
4.5k Views View Upvotes
Here are 2 good resources with Open Data from the EU: http://publicdata.eu/
http://lod2.okfn.org/eu-data-cat...
684 Views
Wikipedia has been mentioned but I didn't see a link. This is for current articles.
http://en.wikipedia.org/wiki/Wik...
722 Views View Upvotes
Igor Kiselev
Written Jun 6, 2015
Biologists have a huge amount of public data at NCBI: ftp://ftp.ncbi.nih.gov/ . The total
size may be close to 1PB.
984 Views View Upvotes
Richard Pauli
Written Apr 5, 2011
Robert Prescott
Written May 17, 2013
Page on Sciencebase
Survey
919 Views View Upvotes
Ben Toth
http://www.google.com/publicdata...
http://www.ic.nhs.uk/statistics-...
and
Teng Qiu
Written May 11, 2014
wikidata.org
and freebase.com
The best that aggregates all OPEN government data, as an API, is:
http://www.pediacities.com
805 Views View Upvotes
Another one for a long list: The Guardian lets you search for open government data from
around the world at
http://www.guardian.co.uk/world-...
587 Views View Upvotes
Misha Denil
Written Feb 2, 2011
Peter Skomoroch has a delicious page with links to many data sets.
http://www.delicious.com/pskomor...
196 Views
Please check this question : Where can I find large datasets open to the public?
269 Views View Upvotes
Jim Shi
Written Mar 30, 2013
Owen Stephens
Written Dec 9, 2011
Daniel McNamara
Written Jul 19, 2011
www.kaggle.com has datasets freely available and data analysis competitions with
prize money attached
206 Views
Martin Kelly
Written Apr 10, 2011
http://www-958.ibm.com/software/...
Enrique Cusba
Written May 21, 2011
Anonymous
Updated Sep 19, 2011