
LESSON 3:
Data Collection Part I:
Unstructured Data

Linking to plan is
critical for efficient
data collection

Unstructured and
structured data
differ in key ways

Unstructured
data can have neat
headings, rows

Primary ways to
collect unstructured
data on the Web

Tools make the
entire WWW a
viable data source

Documenting your plan at this stage is critical to success

Business Objective: Grow Loyalty

Key Question: How has consumer interest in our brand trended over time?
Data > Source(s)*:
Search Volume > Google Trends
Customer Inquiries > CSR Database

Key Question: What consumer group is our strongest advocate?
Data > Source(s)*:
Consumer Groups > Segmentation Study
Twitter Volume > Twitter API

Key Question: Which marketing programs have grown advocacy?
Data > Source(s)*:
Marketing Events > Company Intranet Site
Hashtag Volume > Topsy

Note: * Here "source" is used synonymously with "tool"
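One way to keep such a plan documented alongside the collection work is to record it as a small data structure. The following is a minimal sketch in Python, not part of the original lesson; the class and field names (`CollectionPlan`, `KeyQuestion`, `data_sources`) are hypothetical, and only the objective, questions, and data > source pairs come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class KeyQuestion:
    text: str
    # Each entry pairs a data item with the source (tool) that supplies it.
    data_sources: list = field(default_factory=list)


@dataclass
class CollectionPlan:
    objective: str
    questions: list = field(default_factory=list)

    def sources(self):
        """Return every distinct source named in the plan, in first-seen order."""
        seen = []
        for question in self.questions:
            for _data, source in question.data_sources:
                if source not in seen:
                    seen.append(source)
        return seen


# Content taken from the slide; the structure itself is an illustrative assumption.
plan = CollectionPlan(
    objective="Grow Loyalty",
    questions=[
        KeyQuestion(
            "How has consumer interest in our brand trended over time?",
            [("Search Volume", "Google Trends"),
             ("Customer Inquiries", "CSR Database")],
        ),
        KeyQuestion(
            "What consumer group is our strongest advocate?",
            [("Consumer Groups", "Segmentation Study"),
             ("Twitter Volume", "Twitter API")],
        ),
        KeyQuestion(
            "Which marketing programs have grown advocacy?",
            [("Marketing Events", "Company Intranet Site"),
             ("Hashtag Volume", "Topsy")],
        ),
    ],
)

# Print the plan in the slide's own "Data > Source" notation.
print(f"Business Objective: {plan.objective}")
for question in plan.questions:
    print(question.text)
    for data, source in question.data_sources:
        print(f"  {data} > {source}")
```

Calling `plan.sources()` lists every tool the plan depends on, which can serve as a checklist before collection starts.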

Data collected will be in one of two forms

Unstructured Data

Information that does not have a pre-defined data model

Typically text-heavy, but may contain data such as dates, numbers, and facts as well

Might account for more than 70%-80% of all data in organizations

Frequently requires the use of a data mining tool, such as R, to collect

Structured Data

Information that includes a data model

Typically well-defined and organized data with an expected format as determined by the data model

Generally, but not always, easier to collect than unstructured data

Frequently can be imported directly into a data management system

Source: Wikipedia, IDC Digital Universe Study (2011)
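The contrast between the two forms can be illustrated with a short sketch. This is a Python illustration under stated assumptions, not the lesson's own method (the lesson names R as its tool): the CSV export and the free-text note below are invented sample data, and the column names `date` and `inquiries` are hypothetical.

```python
import csv
import io
import re

# Structured data: a hypothetical CSV export with a known format.
# The header row acts as the data model, so the values load directly.
structured = io.StringIO(
    "date,inquiries\n"
    "2015-01-05,42\n"
    "2015-01-12,57\n"
)
rows = list(csv.DictReader(structured))
total_inquiries = sum(int(row["inquiries"]) for row in rows)

# Unstructured data: hypothetical free text with no pre-defined model.
# Dates and hashtags are present, but they must be mined out of the text.
unstructured = (
    "Called on 2015-01-05 about a late order. Loves the brand! "
    "Followed up 2015-01-12, still waiting; mentioned #loyalty twice."
)
dates = re.findall(r"\d{4}-\d{2}-\d{2}", unstructured)
hashtags = re.findall(r"#\w+", unstructured)

print("Structured rows:", rows)
print("Total inquiries:", total_inquiries)
print("Dates found in text:", dates)
print("Hashtags found in text:", hashtags)
```

The structured half needs no pattern matching because the format is known in advance; the unstructured half only recovers what the patterns anticipate, which is why a mining tool is frequently needed.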

Technology growth has led to new online access points

Bulk Downloads

APIs

Web Scraping
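Of the three access points, web scraping is the least self-explanatory, so a minimal sketch may help. The example below uses Python's standard-library `html.parser` on an inline, invented HTML snippet so that it runs offline; the class name `HeadlineScraper`, the sample page, and the choice of `<h2>` as the target element are all assumptions for illustration. A real page would first have to be downloaded, and the lesson's own tooling is R rather than Python.

```python
from html.parser import HTMLParser


class HeadlineScraper(HTMLParser):
    """Collect the text content of every <h2> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.headlines.append("")  # start a new headline

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        # Only keep text that appears inside an open <h2> element.
        if self._in_h2:
            self.headlines[-1] += data


# Hypothetical page content standing in for a downloaded web page.
PAGE = """
<html><body>
  <h2>Brand launches loyalty program</h2>
  <p>Some article text.</p>
  <h2>Customers react on social media</h2>
</body></html>
"""

scraper = HeadlineScraper()
scraper.feed(PAGE)
headlines = [h.strip() for h in scraper.headlines]
print(headlines)
```

Bulk downloads and APIs usually return data in a documented format, whereas scraping depends on the page's markup, so a scraper like this one needs updating whenever the page layout changes.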

Where do I find raw data and metrics?


U.S. Census Bureau (http://www.census.gov/)

The main website for census data in the U.S. Large amounts of downloadable
data on population, demographics, and other indicators

Bureau of Economic Analysis (http://www.bea.gov/)

The BEA provides data and information for regional, national, and international
levels as well as by industries

Bureau of Labor Statistics (http://www.bls.gov)

Homepage for Bureau of Labor Statistics provides access not only to data
and tables but also to publications and up-to-the-minute factoids

Note: This list is just a sampling of available raw data resources and is not intended to be exhaustive

Where do I find raw data and metrics? (continued)


DATA.GOV (http://www.data.gov)

The home of the U.S. Government's open data. Here you will find data, tools,
and resources to conduct research, develop web and mobile applications,
design data visualizations, and more

CDC&P Statistics (http://www.cdc.gov/DataStatistics/)

Data warehouse for all government-related health and medical statistics and
surveys as well as links to other agencies outside the U.S. Government

UNdata (http://data.un.org)

Many UN statistical databases via a single entry point. Users can now search
and download a variety of statistical resources of the UN system

Free data sources are everywhere to be found on the Web

www.cia.gov/library
www.clickz.com
www.comscoredatamine.com
www.crunchbase.com
fedstats.sites.usa.gov
www.gallup.com
www.google.com/finance
www.google.com/publicdata
ngrams.googlelabs.com
www.grabstats.com
www.infousa.com
www.jdpower.com
http://jmc.ou.edu/FredBeard/FredBeardHome.html
www.marketresearch.com
www.melissadata.com
www.mint.com/blog/trends
www.nationmaster.com
www.neoformix.com
www.oecd.org
blog.okcupid.com
people-press.org

Free data sources are everywhere to be found on the Web (continued)

www.pewinternet.org
www.quantcast.com
www.realtimestatistics.org
research.stlouisfed.org
statehealthstats.americashealthrankings.org
trendwatching.com
unstats.un.org
viralvideochart.unrulymedia.com
www.visualeconomics.com
www.warc.com
datacatalog.worldbank.org
www.youtube.com/trendsdashboard
zipwho.com

R-Project Logo, (C) R Foundation, from http://www.r-project.org

R is an accessible, flexible data mining tool

Free

Open source with practitioners all over the world

Available for Mac, Windows, and Linux platforms

Powerful, yet easy to learn and use

R is more user-friendly with the addition of the RStudio GUI

R (www.r-project.org) + RStudio (www.rstudio.com)

RStudio and Shiny are trademarks of RStudio, Inc, from http://www.rstudio.com/about/trademark/


Supplemental reading for this lesson

Installing R on a Mac:
https://www.youtube.com/watch?v=xokJUwn0mis

Installing R on a Windows machine:
https://www.youtube.com/watch?v=LII6of-5Odw

Download site for R:
http://cran.r-project.org/

Download site for RStudio:
http://www.rstudio.com/products/rstudio/download/

References
1. Wikipedia contributors, "Unstructured data," Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Unstructured_data&oldid=649684563

2. R Foundation. 2012. Logo for R. http://www.r-project.org/

3. RStudio and Shiny are trademarks of RStudio, Inc., from http://www.rstudio.com/about/trademark/
