data collection

Published June 2016

collecting data for analysis


LESSON 3: “Data Collection Part I: Unstructured Data”

‣ Linking to the plan is critical for efficient data collection
‣ Unstructured and structured data differ in key ways
‣ Unstructured data can have neat headings, rows
‣ 3 primary ways to collect unstructured data on the Web
‣ Tools make the entire WWW a viable data source

Documenting your plan at this stage is critical to success

Business Objective: Grow Loyalty

Key questions, with Data → Source(s)*

How has consumer interest in our brand trended over time?
‣ Search Volume → Google Trends
‣ Customer Inquiries → CSR Database

What consumer group is our strongest advocate?
‣ Consumer Groups → Segmentation Study
‣ Twitter Volume → Twitter API

Which marketing programs have grown advocacy?
‣ Marketing Events → Company Intranet Site
‣ Hashtag Volume → Topsy

Note: * Here “source” is used in a way synonymous with “tool”
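A plan like this can also be kept in a machine-readable form so collection work can be tracked against it. The Python sketch below is only one way to do that: the `PlanItem` class and the `sources_needed` helper are hypothetical names introduced here, and the entries are copied from the plan above.

```python
from dataclasses import dataclass, field


@dataclass
class PlanItem:
    """One key question plus the data -> source pairs needed to answer it (hypothetical structure)."""
    question: str
    data_to_source: dict[str, str] = field(default_factory=dict)


# Business Objective: Grow Loyalty (entries copied from the plan above)
PLAN = [
    PlanItem(
        "How has consumer interest in our brand trended over time?",
        {"Search Volume": "Google Trends", "Customer Inquiries": "CSR Database"},
    ),
    PlanItem(
        "What consumer group is our strongest advocate?",
        {"Consumer Groups": "Segmentation Study", "Twitter Volume": "Twitter API"},
    ),
    PlanItem(
        "Which marketing programs have grown advocacy?",
        {"Marketing Events": "Company Intranet Site", "Hashtag Volume": "Topsy"},
    ),
]


def sources_needed(plan: list[PlanItem]) -> list[str]:
    """Return each distinct source once, in the order it first appears in the plan."""
    seen: list[str] = []
    for item in plan:
        for source in item.data_to_source.values():
            if source not in seen:
                seen.append(source)
    return seen


if __name__ == "__main__":
    for source in sources_needed(PLAN):
        print(source)
```

Calling `sources_needed(PLAN)` returns the six sources in plan order, which can serve as a checklist before collection begins.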


Data collected will be in one of two forms

Unstructured Data
‣ Information that does not have a pre-defined data model
‣ Typically text-heavy, but may contain data such as dates, numbers, and facts as well
‣ Might account for more than 70%–80% of all data in organizations
‣ Frequently requires the use of a data mining tool, such as R, to collect

Structured Data
‣ Information that includes a data model
‣ Typically well-defined and organized data with an expected format as determined by the data model
‣ Generally, but not always, easier to collect than unstructured data
‣ Frequently can be imported directly into a data management system

Source: Wikipedia, IDC “Digital Universe Study” (2011)
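The difference shows up as soon as the data is read programmatically. In this minimal Python sketch both samples are invented for illustration: the structured sample is read field by field from its header row, while the same facts buried in a sentence have to be mined out with regular expressions (a simple stand-in for a fuller text-mining tool).

```python
import csv
import io
import re

# Structured: every record follows the format defined by the header row.
STRUCTURED = """date,inquiries
2016-01-04,12
2016-01-05,9
"""

# Unstructured: free text that happens to contain dates and numbers.
UNSTRUCTURED = (
    "Call log: on 2016-01-04 we logged 12 inquiries about the loyalty program; "
    "the next day, 2016-01-05, there were 9."
)

DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"


def read_structured(text: str) -> list[dict[str, str]]:
    """Field names come straight from the header row; no guessing is needed."""
    return list(csv.DictReader(io.StringIO(text)))


def mine_unstructured(text: str) -> dict[str, list[str]]:
    """Pull dates and standalone numbers out of free text with regular expressions."""
    dates = re.findall(DATE_PATTERN, text)
    # Blank out the dates first so their digits are not counted as separate numbers.
    without_dates = re.sub(DATE_PATTERN, " ", text)
    numbers = re.findall(r"\b\d+\b", without_dates)
    return {"dates": dates, "numbers": numbers}
```

The regular expressions recover the dates and counts from this one sentence, but they cannot tell which count belongs to which date, and a differently worded sentence could break them; that extra effort is typical of unstructured data.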

Technology growth has led to new online access points

‣ Bulk Downloads
‣ APIs
‣ Web Scraping
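Of the three access points, web scraping usually involves the most code. A minimal sketch, assuming the page has already been downloaded and lists its data in an HTML table: it uses Python's standard-library `html.parser`, and the `PAGE` snippet and `TableCellScraper` class are hypothetical, so a real page's markup would need its own handling.

```python
from html.parser import HTMLParser


class TableCellScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # start a new row
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")  # start a new cell in the current row

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Ignore whitespace and text outside <td> cells.
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


# Hypothetical page content; a real scraper would fetch this HTML first.
PAGE = """
<table>
  <tr><th>Event</th><th>Date</th></tr>
  <tr><td>Spring launch</td><td>2016-03-01</td></tr>
  <tr><td>Summer promo</td><td>2016-06-15</td></tr>
</table>
"""


def scrape_rows(html: str) -> list[list[str]]:
    parser = TableCellScraper()
    parser.feed(html)
    parser.close()
    # Header rows hold only <th> cells, so they come out empty; drop them.
    return [row for row in parser.rows if row]
```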


Where do I find raw data and metrics?
The U.S. Census Bureau (http://www.census.gov/)

The main website for U.S. census data, with large amounts of downloadable data on population, demographics, and other indicators

Bureau of Economic Analysis (http://www.bea.gov/)

The BEA provides data and information at the regional, national, and international levels, as well as by industry

Bureau of Labor Statistics (http://www.bls.gov)

The Bureau of Labor Statistics homepage provides access not only to data and tables but also to publications and up-to-the-minute factoids

Note: This list is just a sampling of available raw data resources and is not intended to be exhaustive
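Downloads from portals like these often arrive as compressed CSV files. This sketch assumes a ZIP archive containing one CSV; the member name `population.csv`, its columns, and its rows are invented, and the archive is built in memory so the example runs without a network connection.

```python
import csv
import io
import zipfile


def read_csv_from_zip(archive: bytes, member: str) -> list[dict[str, str]]:
    """Return the rows of one CSV file stored inside a downloaded ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        with zf.open(member) as raw:
            # ZipFile.open yields bytes; wrap it so the csv module can read text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


def make_demo_archive() -> bytes:
    """Build a small in-memory ZIP standing in for a real bulk download (hypothetical data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("population.csv", "state,population\nAlpha,100\nBeta,250\n")
    return buffer.getvalue()
```

With a real download, the bytes of the saved file (for example `open(path, "rb").read()`) would be passed in place of `make_demo_archive()`.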

Where do I find raw data and metrics? (continued)
DATA.GOV (http://www.data.gov)

The home of the U.S. Government’s open data. Here you will find data, tools,
and resources to conduct research, develop web and mobile applications,
design data visualizations, and more


CDC&P Statistics (http://www.cdc.gov/DataStatistics/)

Data warehouse for government-related health and medical statistics and surveys, as well as links to agencies outside the U.S. Government


UNdata (http://data.un.org)

Access to many UN statistical databases through a single entry point. Users can search and download a variety of statistical resources from the UN system
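Where a portal offers an API, a request is typically a URL with query parameters, and the response is structured data, often JSON. The sketch below is hypothetical throughout: `api.example.org`, the parameter names, and the response shape are assumptions rather than any real portal's API, and the network call itself (for example with `urllib.request.urlopen`) is left out so the code runs offline.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; real portals document their own base URLs and parameters.
BASE_URL = "https://api.example.org/v1/observations"


def build_query(series: str, start_year: int, end_year: int) -> str:
    """Compose a GET request URL with query-string parameters."""
    params = {"series": series, "start": start_year, "end": end_year, "format": "json"}
    return f"{BASE_URL}?{urlencode(params)}"


def parse_observations(payload: str) -> dict[int, float]:
    """Turn a JSON response body into a {year: value} mapping."""
    body = json.loads(payload)
    return {int(item["year"]): float(item["value"]) for item in body["observations"]}


# A made-up response body in the shape assumed above.
SAMPLE_RESPONSE = '{"observations": [{"year": 2014, "value": 1.5}, {"year": 2015, "value": 2.0}]}'
```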

Free data sources are everywhere to be found on the Web

‣ www.cia.gov/library
‣ www.jdpower.com
‣ www.clickz.com
‣ http://jmc.ou.edu/FredBeard/FredBeardHome.html
‣ www.comscoredatamine.com
‣ www.crunchbase.com
‣ fedstats.sites.usa.gov
‣ www.gallup.com
‣ www.google.com/finance
‣ www.google.com/publicdata
‣ ngrams.googlelabs.com
‣ www.grabstats.com
‣ www.infousa.com
‣ www.marketresearch.com
‣ www.melissadata.com
‣ www.mint.com/blog/trends
‣ www.nationmaster.com
‣ www.neoformix.com
‣ www.oecd.org
‣ blog.okcupid.com
‣ people-press.org


Free data sources are everywhere to be found on the Web (continued)

‣ www.pewinternet.org
‣ unstats.un.org
‣ www.quantcast.com
‣ www.visualeconomics.com
‣ www.realtimestatistics.org
‣ www.warc.com
‣ research.stlouisfed.org
‣ datacatalog.worldbank.org
‣ statehealthstats.americashealthrankings.org
‣ www.youtube.com/trendsdashboard
‣ trendwatching.com
‣ zipwho.com
‣ viralvideochart.unrulymedia.com



R-Project Logo, (C) R Foundation, from http://www.r-project.org

R is an accessible, flexible data mining tool

‣ Free
‣ Open source with practitioners all over the world
‣ Available for Mac, Windows, and Linux platforms
‣ Powerful, yet easy to learn and use

R is more user-friendly with the addition of the RStudio GUI

R (www.r-project.org) + RStudio (www.rstudio.com)

RStudio and Shiny are trademarks of RStudio, Inc, from http://www.rstudio.com/about/trademark/



Supplemental resources for this lesson

‣ Installing R on a Mac: https://www.youtube.com/watch?v=xokJUwn0mis
‣ Installing R on a Windows machine: https://www.youtube.com/watch?v=LII6of-5Odw
‣ Download site for R: http://cran.r-project.org/
‣ Download site for RStudio: http://www.rstudio.com/products/rstudio/download/

References
1. Wikipedia contributors. “Unstructured data.” Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/w/index.php?title=Unstructured_data&oldid=649684563
2. R Foundation. 2012. “Logo for R.” http://www.r-project.org/
3. RStudio, Inc. “RStudio and Shiny are trademarks of RStudio, Inc.” http://www.rstudio.com/about/trademark/
