G-141/3.36.06

Scanning Data Entry
Solutions for
ARC/INFO GIS

An ESRI White Paper

Contents

Executive Summary

Evaluating Scanning Data Entry

ESRI Scanning Data Entry Solutions—A GIS Focus

Glossary

Environmental Systems Research Institute, Inc.
380 New York Street, Redlands, CA 92373

(909) 793-2853
Fax (909) 793-5953
Telex 910 332 1317

Copyright © 1995 Environmental Systems Research Institute, Inc.
All rights reserved.
Printed in the United States of America.
The information contained in this document is the exclusive property of Environmental Systems Research
Institute, Inc. This work is protected under United States copyright law and other international copyright
treaties and conventions. ESRI grants the recipient of the ESRI information contained herein the right to
freely reproduce, redistribute, rebroadcast, and/or retransmit this information for personal, noncommercial
purposes including, teaching, classroom use, scholarship, and/or research, subject to the fair use rights
enumerated in Section 107 and 108 of the Copyright Act (Title 17 of the United States Code). No part of this
work may be reproduced or transmitted for commercial purposes in any form or by any means, electronic or
mechanical, including photocopying and recording, or by any information storage or retrieval system, except
as expressly permitted in writing by Environmental Systems Research Institute, Inc. All requests should be
sent to Environmental Systems Research Institute, Inc., 380 New York Street, Redlands, CA 92373 USA,
Attention: Contracts Manager.
The information contained in this document is subject to change without notice.
RESTRICTED RIGHTS LEGEND
Use, duplication, and disclosure by the government are subject to restrictions as set forth in FAR §52.227-14
Alternate III (g)(3) (JUN 1987), FAR §52.227-19 (JUN 1987), or DFARS §252.227-7013 (c)(1)(ii) (OCT
1988), as applicable. Contractor/Manufacturer is Environmental Systems Research Institute, Inc., 380 New
York Street, Redlands, CA 92373 USA.
ESRI, ARC/INFO, PC ARC/INFO, ArcView, and ArcCAD are registered trademarks; ARC COGO, ARC
NETWORK, ARC TIN, ARC GRID, ARC/INFO LIBRARIAN, ARCPLOT, ARCEDIT, TABLES,
Application Development Framework (ADF), ARC Macro Language (AML), Avenue, FormEdit, ArcSdl,
ArcBrowser, ArcDoc, ARCLine, ARCSHELL, IMAGE INTEGRATOR, DATABASE INTEGRATOR, DBI
Kit, WorkStation ARC/INFO, ArcTools, ArcStorm, ArcScan, ArcExpress, ArcPress, Mapplets, SPATIAL
DATABASE ENGINE (SDE), PC ARCEDIT, PC ARCPLOT, PC DATA CONVERSION, PC NETWORK,
PC OVERLAY, PC STARTER KIT, PC ARCSHELL, Simple Macro Language (SML), ArcUSA, ArcWorld,
ArcScene, ArcCensus, ArcCity, the ESRI corporate logo, the ESRI globe logo, the ARC/INFO logo, the ARC
COGO logo, the ARC NETWORK logo, the ARC TIN logo, the ARC GRID logo, the ARCPLOT logo, the
ARCEDIT logo, the Avenue logo, the ArcTools logo, the ArcStorm logo, the ArcScan logo, the ArcExpress
logo, the PC ARC/INFO logo, the ArcView logo, the ArcCAD logo, the ArcData logo, ARCware, ARC News,
ArcSchool, ESRI—Team GIS, ESRI—The GIS People, GIS by ESRI, ARC/INFO—The World's GIS,
Geographic User Interface (GUI), Geographic User System (GUS), Your Personal Geographic Information
System, and Geographic Table of Contents (GTC) are trademarks; and the ArcData Publishing Program,
ARCMAIL, ArcQuest, ArcWeb, and Rent-a-Tech are service marks of Environmental Systems Research
Institute, Inc.
The names of other companies and products herein are trademarks or registered trademarks of their respective
trademark owners.

Executive Summary
Geographic information systems (GISs) require accurate
digital geographic data. Traditional methods of
automating vector spatial data include three alternatives:
contracting with a service bureau, in-house table
digitizing, or COGO data entry. Users now have an
alternative to these methods for creating vector
databases: scanning data entry. This new choice in
data automation is affordable; uses mature, reliable
technology; can be
implemented with existing staff; is integrated with
existing GIS software and databases; and offers an array
of capabilities adaptable to a wide range of user
requirements.
Scanning data entry uses document scanning technology to create
raster data sets, or "digital pictures" of the documents. Raster data
have utility in GIS, and scanning data entry software tools can manage
and edit raster data. In addition to raster data tools, scanning data
entry software can efficiently convert raster data to vector data—and
vector format data is a requirement for many GIS applications. Thus,
scanning data entry provides a means of automating vector data as
well as creating low-cost raster databases.
GIS organizations planning their database automation strategy can
now consider in-house scanning data entry as the solution of choice.
Existing GIS organizations can evaluate scanning data entry as an
enhancement to their current data automation methods.

Figure: Overview of Scanning Data Entry Pathways. Air photos and maps
are input through a scanner to a raster database (georeference, raster
edit, tiling, raster-to-raster conversion). Raster data are vectorized
by raster-to-vector conversion or by heads-up digitizing, can be merged
with existing vector data, COGO data, or CAD data, and are stored in an
ARC/INFO coverage database.

March 1994

Costs and Benefits of Scanned Data Entry

Cost analysis of GIS projects shows that database automation often
accounts for more than 75 percent of the total project expense.
Scanning data entry is a viable cost-reduction alternative for this most
expensive GIS component—data automation.
The cost of scanning data entry has decreased dramatically since 1990.
Even agencies with limited budgets and relatively small data automation projects are discovering that a scanning data entry strategy makes
good economic sense. Database automation strategies based on
scanning are cost-competitive with, and in some cases can be
significantly less expensive than, other methods.
Scanning technology is no longer the data capture solution of the
distant future, but is quickly becoming the preferred method of data
capture for GIS. Data automation methods within ESRI's Database
Development Group have shifted dramatically in recent years to
scanning data entry as the preferred solution.

Is Scanning Data Entry Appropriate for Your Project?

Scanning data entry provides many advantages to GIS users.
Scanning is generally faster and a great deal more accurate than table
digitizing. Scanned data that are subsequently vectorized have more
consistent coordinate placement than data entered through manual
digitizing. Scanning data entry methods can be easily learned and
used by existing staff. Scanning data entry can be used to generate
application-specific data that are not commercially available. Recent
technical advances in hardware and software have tailored scanning
data entry capabilities specifically for GIS requirements.
GIS users should carefully evaluate scanning data entry in the context
of project requirements, which can vary greatly. Factors to consider
include data sources and their availability, map quality, update
frequency, data volume, accuracy requirements, and system capacity.
As you consider various data automation options, the specific requirements of your project will guide your analysis of the alternatives. For
example, if your GIS application uses street centerline data with
address ranges, you may find standard "off-the-shelf" data from a
commercial data vendor a good solution. But commercial data may
not exist for all of your organization's needs. For example, a data
layer such as property boundaries (i.e., land parcels) may not be
commercially available.
When you prefer to do data automation in-house, scanning data entry
takes no more time than table digitizing and offers better coordinate
accuracy and consistency without the random errors often associated
with manual methods. If you have an existing digital database and
need to perform only a low volume of intermittent updates, table
digitizing these updates may be appropriate. Even so, incremental
updates from scanned documents are a feasible alternative.
The available data sources will also influence the feasibility of
scanning data entry. The scanning data entry alternative is most
appropriate when the data do not already exist in digital form but do
exist in document form. Scanning data entry does require a source
document of some kind. If these documents are of poor quality,
scanning data entry can be effective but will require more operator data
cleanup. Scanning data entry is most useful and cost-effective when a
high-quality data source (e.g., maps or air photos) is available.
Feature layers on separate documents reduce processing requirements.
Some applications may be able to use raster data obtained without
document scanning—for example, satellite or airborne scanner data
provided on digital media. Scanning data entry technology offers
software tools that take advantage of commercially available raster
data.
When planning scanning data entry projects, it is very important to
spend time assessing your needs before implementing a solution.
Accuracy requirements and the characteristics of your source
documents will determine the most appropriate hardware and software
package. For example, the need for raster integration may require
additional disk storage or a more powerful CPU. If you work mainly
with air photos and do not have stringent accuracy constraints, you
should consider heads-up digitizing as a scanning data entry option.
If you have good digital geodetic control and a complete series of
plats, raster-to-vector conversion using that control can be an effective
strategy. Used for appropriate applications and implemented
correctly, scanning can be the most cost-effective and efficient method
for capturing your data.

ESRI Solutions for Scanning Data Entry

ESRI has used scanning data entry technology for many years. The
ESRI Database Automation Group has adopted scanning data entry as
a preferred methodology. ESRI® software has supported raster data
sets for many years, and a new ESRI software product called
ArcScan™ focuses specifically on providing scanning data entry
software tools. ArcScan is closely integrated with the rest of
ARC/INFO in a single software environment. Thus, all the
functionality of ARC/INFO can be combined with specialized
software for scanning data entry. ESRI can provide turnkey scanning
data entry systems through reseller agreements with industry-leading
hardware vendors. Many other companies have joined ESRI's open
systems approach and offer complementary capabilities that work with
the ARC/INFO® data structures and user interface. ESRI scanning
data entry solutions are affordable, easy to use, and integrated with
ESRI's advanced GIS data management technology. ESRI scanning
data entry solutions provide a clear and effective alternative for data
automation.

About This White Paper

Scanning data entry technology offers a variety of tools for
manipulating raster data and for converting raster data to vector data.
A thorough evaluation of project needs and available data sources will
determine the tool or tools that work best for you. The next section
presents information to help you evaluate scanning data entry. ESRI's
rich toolset for scanning data entry and integrated data editing/data
management technology is described in the last section. A glossary
provides definitions for many of the specialized terms used in
scanning data entry.

Evaluating Scanning Data Entry

This section is designed to help you evaluate scanning data
entry. Careful evaluation is a key factor in ensuring your
success. If you take the time to think it through and
evaluate the options and trade-offs carefully, your project
will benefit greatly. Evaluation considerations should
include the data available, the hardware and software, and
the methods or procedures.

Evaluating Data Sources for Scanning Data Entry

Understanding data sources is crucial to understanding how scanning
data entry can be used in your GIS. Scanning solutions are as diverse
as the data used in the GIS applications they support, because
scanning and vectorization requirements are determined by the data.
Two main categories of data are used with scanned data entry projects.
The first is paper or Mylar maps, containing line art, that are scanned
into bi-tonal (black-and-white) raster data sets, or images. Scanned
maps in raster format are usually converted to vectors using
raster-to-vector conversion programs. The second category of scanned
documents in wide use is aerial photographs, typically black and
white, that are scanned into a grayscale image. Scanned photos are
not conducive to automated raster-to-vector conversion techniques and
are often used in raster format. Scanned photos are often used in
vector conversions as visual background for heads-up digitizing.
While both types of documents lend themselves to scanning data
entry, each has different processing requirements. Scanned data entry
can build a GIS database using any or all of the following data
sources:

Line Work Maps

Low-quality 36 x 44 (E format) maps. Ranging from
antique linen maps, to CAD plotter output, to blue lines, to as-builts,
the overwhelming majority of hard-copy maps are low
quality. Quality, in the scanning sense, refers to the quality of the
media itself and the problems the media presents to the scanning
and vectorization process, not to the informational quality of the
data on the document. This is the most common data source
category for scanned data entry applications.
High-quality E format maps. Typically, these are Mylar
maps with multiple data layers. They are more frequently found in
larger, rather than smaller, organizations and public agencies.
Media quality is high—for example, the lines on the media are
clear and crisp and the media will not have extraneous marks or
"noise." This type of data can have clutter, in the form of
annotation or unwanted data layers, which can complicate the
vectorization process.
High-quality, single-layer E format maps. Mylar
separates, such as topographic contours, are often available in this
form. Soil maps, separates from a production map series, and
specially prepared Mylars can fall in this category. Cadastral maps
on Mylar fit in this category if they have only one data layer (i.e.,
only parcels). This type of bi-tonal, line art map data source is the
most amenable to scanning data entry because its high quality
requires the least pre- or post-processing. Hallmarks of
high-quality documents for scanning are a single data layer, absence of
clutter, presence of registration tics (georeference marks), and
clean, high-contrast media.
11 x 17 (B format) plat maps. Plat maps in this format are
found throughout the United States. Map quality will vary.

Line work can contain gaps. In
this case, the gaps are caused by the
line symbology used to represent
intermittent streams. Gaps can
also be caused by low-quality data,
resulting in noisy scanned output.
The ArcScan tracing tool can
"jump" gaps to create continuous
vector output. Gap jumping
parameters can be modified to suit
data requirements.
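
The idea of a gap-jumping parameter can be illustrated with a deliberately simplified, one-dimensional sketch. This is not ArcScan's tracing algorithm, which works on two-dimensional raster data; the function name `jump_gaps` and the parameter `max_gap` are hypothetical names chosen for this example. The sketch fills short runs of background pixels that fall between foreground pixels and leaves longer breaks alone.

```python
def jump_gaps(row, max_gap):
    """Fill background runs (0) of at most max_gap pixels that lie
    between two foreground pixels (1). Longer runs are left as real breaks.

    Toy 1-D illustration only; not the ArcScan tracing algorithm.
    """
    filled = list(row)
    last_on = None  # index of the most recent foreground pixel seen
    for i, value in enumerate(row):
        if value:
            if last_on is not None:
                gap = i - last_on - 1
                if 0 < gap <= max_gap:
                    for j in range(last_on + 1, i):
                        filled[j] = 1
            last_on = i
    return filled


# A dashed line symbol: a short 2-pixel gap, then a long 5-pixel break.
dashed = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
bridged = jump_gaps(dashed, max_gap=2)
```

With `max_gap=2` the short gap in the dashed symbol is bridged while the five-pixel break is preserved. Raising the parameter would bridge wider gaps at the risk of joining features that are genuinely separate.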

Photographs and Digital Imagery

9 x 9 aerial photos. This source data category is widely used
with scanning data entry. Aerial photos are most often black and
white, with some in color when budget allows. The most common
use of scanned photography is to serve as a visual backdrop or
"reality check" to other data. Heads-up digitizing uses scanned
photographic images to guide operator coordinate capture by
"tracing" features from a display screen. Air photos are often the
most up-to-date data source. You can usually scan air photos at a
much lower resolution than line art. This can help compensate for
the greater storage requirements of grayscale images. Visual
display has less stringent resolution requirements than
raster-to-vector conversion.
Much photography being scanned today is uncorrected.
Uncorrected photography, while relatively inexpensive and useful in
many applications, should be used with care. Vector data derived
from uncorrected photography will overlay properly on its source
image, but it is accurate only relative to that source. Vector data
converted from uncorrected photo images are
likely to misregister with data from other sources. In addition,
uncorrected photographic images may not merge well with adjacent
images, and measurements made on uncorrected images will be
incorrect.
Large format photography. Large format photos (e.g.,
24 x 30 or 30 x 30) are available in a variety of scales. 1:24,000
orthophoto quads, usually black and white, have been produced
for large areas (e.g., statewide coverage). Orthocorrection can be
applied photographically or digitally. That is, orthophotos can be
scanned directly and used without need of coordinate correction,
or uncorrected stereophotos can be scanned and then
orthocorrected in digital form. The output of digital orthophotos
can be at any scale appropriate for the resolution of the image.
Other image data from airborne scanners or satellites.
LANDSAT and SPOT images are commercially available (e.g.,
through the ArcData℠ program) and typically offer ten- to thirty-meter
resolution. Airborne scanners linked with GPS receivers can
produce images with much greater resolution (pixel resolution of
less than one foot, for example).

Digital Data Concepts

This figure illustrates how a
polygon, line, and point feature
would be stored in an x,y
coordinate system (vector) and a
row, column system (raster).

The raster data format is a cellular data format well suited for storing
images or maps. A raster data set is like a carpet of cells overlaying
the map where each cell has a value representing the corresponding
value beneath it in the map. For example, a raster data set of a
scanned map will have pixel values that correspond to the brightness
of the light reflected from the map.

Figure labels: Vector (x-axis, y-axis) and Raster (rows, columns)
panels, each showing a polygon, a line, and a point.

A raster data set can be bi-tonal, grayscale, or color (often satellite
images are displayed as false color images). Raster data can be
directly useful in a GIS. For example, a scanned air photo can be
used as a backdrop to other infrastructure data, such as roads or
sewers, using ARC/INFO IMAGE INTEGRATOR™ capabilities.
The ARC/INFO GRID™ extension uses the raster data format for
complex spatial analysis. Raster data sets tend to be large—in the
multi-megabyte range—and have special data processing needs.

Figure: Raster data file layout. A header record is followed by the
pixel data, which continue to the end of the file.

Raster data can be organized in a number of ways depending upon the
particular raster format. Typically, the raster data file contains a
header record that stores information about the data such as the
number of rows and columns, the number of bits per pixel, the color
requirements, and the georeferencing information. Following the
raster header is the actual pixel data for the image. The internal
organization of the raster data is dependent upon the raster format.
Some formats contain only a single band of data, while others contain
multiple bands.
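
The header-then-pixels layout described above can be sketched in a few lines of Python. The layout here is hypothetical and is not any real raster format: it assumes a 16-byte header of four little-endian unsigned integers (rows, columns, bits per pixel, bands) and omits the color and georeferencing fields a real header would carry. The names `RasterHeader`, `HEADER_FORMAT`, and `pixel_bytes` are invented for the example.

```python
import struct
from dataclasses import dataclass

# Hypothetical header layout (an assumption, not a real format):
# four unsigned 32-bit little-endian integers.
HEADER_FORMAT = "<4I"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


@dataclass
class RasterHeader:
    rows: int
    columns: int
    bits_per_pixel: int
    bands: int

    def pack(self) -> bytes:
        """Serialize the header fields in a fixed order."""
        return struct.pack(HEADER_FORMAT, self.rows, self.columns,
                           self.bits_per_pixel, self.bands)

    @classmethod
    def unpack(cls, data: bytes) -> "RasterHeader":
        """Read the header back from the first HEADER_SIZE bytes."""
        return cls(*struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE]))

    def pixel_bytes(self) -> int:
        """Uncompressed pixel data size, ignoring any per-row padding."""
        bits = self.rows * self.columns * self.bits_per_pixel * self.bands
        return (bits + 7) // 8


# Build a tiny bi-tonal raster file in memory: header, then blank pixels.
header = RasterHeader(rows=4, columns=8, bits_per_pixel=1, bands=1)
raster_file = header.pack() + bytes(header.pixel_bytes())

# A reader uses the header to interpret everything that follows it.
parsed = RasterHeader.unpack(raster_file)
pixel_data = raster_file[HEADER_SIZE:]
```

The point of the sketch is that the pixel data cannot be interpreted without the header: the same bytes mean different images depending on the row count, column count, and bits per pixel.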
When planning your scanning data entry project, you should give
special attention to input resolution. Input resolution is the number of
pixels per inch in both x and y dimensions of the digital snapshot.
Most scanners allow some control over input resolution. In general,
you should try to reduce data storage requirements by choosing the
lowest resolution that will cleanly capture your data. Some
experimentation will be required as you "fine tune" your scanning data
entry methods.
The input resolutions shown in Table 1 are typical, but not absolute
resolutions. Resolution is expressed in dots per inch (dpi). Doubling
resolution (e.g., from 400 dpi to 800 dpi) can have the effect of
quadrupling data set size (compressed formats will show less increase
in size). Scanning data at a resolution greater than that required by the
source document will only increase data storage requirements with no
appreciable improvement in data quality. Unneeded input resolution
can even create processing problems by exaggerating errors in
poor-quality data (e.g., additional white noise).
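
The effect of input resolution on uncompressed data set size can be checked with simple arithmetic. The sketch below assumes an uncompressed raster with no header or padding, and assumes 8 bits per pixel for a grayscale photo; the function name `uncompressed_bytes` is hypothetical.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate uncompressed raster size for a scanned document.

    Assumes square pixels, no header, and no row padding.
    """
    columns = round(width_in * dpi)
    rows = round(height_in * dpi)
    return rows * columns * bits_per_pixel // 8


# 9 x 9 inch photo, 8-bit grayscale (bit depth is an assumption).
photo_200 = uncompressed_bytes(9, 9, 200, 8)   # 1800 x 1800 pixels
photo_400 = uncompressed_bytes(9, 9, 400, 8)   # 3600 x 3600 pixels

# 30 x 30 inch large format photo at 200 dpi, 8-bit grayscale.
large_200 = uncompressed_bytes(30, 30, 200, 8)  # 6000 x 6000 pixels
```

Doubling the resolution doubles both the row count and the column count, so `photo_400` is four times `photo_200`. Under the same assumptions the 30 x 30 inch photo comes to 36,000,000 bytes.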

TABLE 1. Raster Data Sources

| Type of Source Data | Typical Data Example | Typical Use in Scanning Data Entry | Typical Scanner (Input) Resolution | Raster Data Format | Typical Raster Data Size |
| --- | --- | --- | --- | --- | --- |
| Low quality 36 x 44 inch (E format) map | As-builts, blue lines | Raster-to-vector conversion using interactive techniques such as raster cleanup and line following | 400 dpi | RLC or GRID bi-tonal (compressed) | 6 megabytes |
| High quality E format map | Contours on Mylar separates | Raster-to-vector conversion using interactive (multiple data layers or clutter) or batch (single data layer) | 400 to 800 dpi (depending on data type and quality) | RLC or GRID bi-tonal (compressed) | 6 to 30 megabytes |
| 11 x 17 inch (B format) map | Plat map | Raster-to-vector conversion | 400 to 500 dpi (resolution varies by source data quality) | RLC or GRID bi-tonal (compressed) | 1 to 4 megabytes |
| 9 x 9 inch aerial photo (black and white) | Standard air photo | Visual backdrop for other data entry methods such as COGO or heads-up digitizing. Orthophoto production | 200 dpi | TIFF (uncompressed) | 4 megabytes |
| 9 x 9 inch aerial photo (color) | Standard air photo (color) | Visual backdrop for other data entry methods such as COGO or heads-up digitizing. Orthophoto production | 200 dpi | TIFF (uncompressed) | 4 megabytes (8 bit color), 12 megabytes (24 bit color) |
| Large format aerial photo (30 x 30 inch) | Orthophoto | Visual backdrop for other data entry methods such as COGO or heads-up digitizing. Orthophoto production | 200 dpi | TIFF (uncompressed) | 36 megabytes |
| Three band satellite image | EOSAT or SPOT Image data | Visual backdrop for heads-up digitizing of major roads | Purchased in raster format | Band interleaved, etc. (uncompressed) | Varies with type of data and area of coverage, 2 to 600 megabytes |

Raster data can be compressed. That is, you can use a data storage
scheme to reduce the amount of disk space required to store the data.
Bi-tonal raster data can be compressed to a greater degree than
grayscale or color data because the cell values of bi-tonal data can be
represented with a single bit—either black or white, on or off, data or
no data. Grayscale and color raster data can also be compressed, but
with lesser compression ratios and at a higher processing cost to
support decompression. When raster data are compressed for storage,
they must be decompressed for display and other operations.
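
Why bi-tonal data compress so well can be seen in a minimal run-length sketch. This is a generic run-length scheme written for illustration; it is not the RLC file format or any scanner vendor's encoding, and the function names are hypothetical. A scan line is stored as its first pixel value followed by the lengths of the alternating runs.

```python
def rle_encode(row):
    """Encode a bi-tonal row as [first_value, run_length, run_length, ...]."""
    if not row:
        return []
    runs = [row[0]]
    count = 1
    for prev, cur in zip(row, row[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)  # close the finished run
            count = 1
    runs.append(count)          # close the final run
    return runs


def rle_decode(runs):
    """Rebuild the original row; values alternate between 0 and 1."""
    if not runs:
        return []
    value = runs[0]
    row = []
    for length in runs[1:]:
        row.extend([value] * length)
        value = 1 - value
    return row


# One scan line crossing a single 3-pixel-wide map line.
scan_line = [0] * 12 + [1] * 3 + [0] * 9
encoded = rle_encode(scan_line)
```

Here 24 pixels reduce to four numbers because line art is mostly long runs of background. A grayscale photo rarely has long runs of identical values, which is one reason it compresses less with a scheme like this. The decode step is the processing cost paid each time the data are displayed.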
Vector data are in a format that represents map features with the x,y
coordinates of the features. Where a raster data set would represent a
feature by tagging all the cells that overlay the feature, a vector data set
would represent the feature by listing the coordinates of points along
it. Many GIS applications, such as parcel maintenance, demographic
analysis, or vehicle routing, require data in a vector format.
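
The difference between the two representations can be made concrete with a small sketch. It assumes a grid of unit-sized cells whose row index increases with y, and it uses a simple point-sampling rasterizer written for this example (the function name `rasterize_line` is hypothetical, and this is not how ARC/INFO stores or converts data).

```python
# Vector: a line feature is a list of x,y coordinates along the feature.
line_vector = [(0.5, 0.5), (4.5, 0.5), (4.5, 3.5)]


def rasterize_line(vertices, rows, columns, samples_per_segment=100):
    """Tag every unit cell that the line passes through.

    Samples points along each segment; adequate for a small
    illustration, not a production rasterizer.
    """
    grid = [[0] * columns for _ in range(rows)]
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        for step in range(samples_per_segment + 1):
            t = step / samples_per_segment
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            col, row = int(x), int(y)
            if 0 <= row < rows and 0 <= col < columns:
                grid[row][col] = 1
    return grid


# Raster: the same feature is the set of tagged cells in a 4 x 6 grid.
line_raster = rasterize_line(line_vector, rows=4, columns=6)
tagged_cells = sum(sum(r) for r in line_raster)
```

The vector form needs three coordinate pairs regardless of cell size, while the raster form tags eight cells in this grid and would tag more at a finer resolution.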
Raster data are unsuitable for these applications because, although the
raster and vector data may look the same displayed on a screen, raster
data have very different characteristics. Scanned raster data are simply
a "digital snapshot" of the source document—a scanned map has
pictorial information and limited connectivity to other data. ARC/INFO
georelational vector data, on the other hand, maintain the internal
spatial relationships of the features they represent and have far more
information than a simple picture. In addition, georelational vector data
have strong connections to other related data, such as tables stored in a
relational database management system (RDBMS).

The Georelational Model
Raster data can also be used to yield a vector representation of the
same map data—this process is called raster-to-vector conversion and
is a main topic of this white paper. Only recently, however, has
raster-to-vector conversion technology become affordable, reliable,
and widely available. These advances in both hardware and software
have made scanning data entry a feasible alternative.

Evaluating Computer Hardware for Scanning Data Entry

Scanning data entry has special hardware requirements. These
hardware capabilities support raster data characteristics such as large
data set size and cellular format. A scanning data entry system can
include several hardware components:

The Scanner

Scanners to input hard copy paper maps or photos. Scanners take a
"digital snapshot" of the source material and store this raster data on
disk. Advanced scanners offer on-screen graphical user interface
(GUI) control software to enhance ease of use. Scanners are available
at a variety of output resolutions, support a variety of media sizes, and
can output black-and-white, grayscale, and color images. Scanners
can output scanned images directly to workstation secondary storage
via a high bandwidth interface (e.g., SCSI). Scanners output data in
standard raster data formats such as RLC (for bi-tonal data) or TIFF
(for grayscale data).

A scanner is a device with a mechanical document feed that is set
above a row of cameras. The document feed can either be continuous,
as in a drum scanner, or direct feed, where the document feeds in the
front and comes out the back of the device. A light source and a glass
window are between the cameras and the document. The light source
is angled in such a way as to shine through the window and reflect off
the document. There is usually a white background behind the
document in the event that the media is transparent. The reflected light
enters the cameras that focus the image onto a charge-coupled device
(CCD). The CCD is a ceramic board with an imbedded array that
translates the presence or absence of light into digital form. The
output from the CCD is a value (usually from 0 to 255) that indicates a
level of gray. Usually, a grayscale value of zero indicates total
absence of light, while a value of 255 indicates complete light
saturation.
Scanners sense reflected light
values and store the reflectance
values as a digital image.

Figure: Scanner Basics. Labeled components: white background, media
feed mechanism, glass window, map or photo, camera, two light sources,
CCD, media movement, and digital image output.

Scanner resolution is important. Optical resolution is the ability of
cameras in the scanner to discern data. As a rule of thumb, the optical
resolution of a scanner is expressed in this formula: optical resolution
in dpi = (number of cameras + 1) * 100. Thus, a scanner with three
cameras can offer an optical resolution of 400 dpi. Scanners can also
offer interpolated resolution, in which the data from the CCD are
resampled into smaller pixels. Thus, a scanner with optical resolution
of 400 dpi can also offer interpolated resolution of 800 dpi. This
method can produce an output image with higher resolution, but not
necessarily with greater accuracy. In general, you should evaluate
scanners for GIS applications using optical resolution because GIS

Environmental Systems Research Institute, Inc.
380 New York Street, Redlands, CA 92373

(909) 793-2853
Fax (909) 793-5953
Telex 910 332 1317

Evaluating Scanning Data Entry
16

G-141/3.36.06

data are too complicated to be adequately captured with interpolated
resolution data.
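
The rule of thumb above, and the reason interpolated resolution adds pixels without adding information, can be sketched as follows. The resampling shown is nearest-neighbor duplication, chosen as the simplest case; actual scanners may interpolate differently, and both function names are hypothetical.

```python
def optical_resolution_dpi(cameras):
    """Rule of thumb from the text: (number of cameras + 1) * 100."""
    return (cameras + 1) * 100


def interpolate_row(row, factor):
    """Resample a row of CCD values into smaller pixels by duplication.

    Nearest-neighbor resampling is an assumption made for illustration.
    """
    return [value for value in row for _ in range(factor)]


optical = optical_resolution_dpi(3)

# Four grayscale samples as sensed at optical resolution.
sampled = [0, 255, 0, 0]

# The same row at twice the resolution: more pixels, same measurements.
interpolated = interpolate_row(sampled, 2)
```

The interpolated row is twice as long, yet every value in it was already present in the sensed row. A thin line that fell between two samples at optical resolution is still missing after interpolation.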
Positional accuracy is also an important consideration in GIS
applications that require accurate spatial data placement. Positional
inaccuracy can result from media slippage in the scanner document
feed mechanism or from miscalibrated cameras.
Decide on the minimum resolution and accuracy required by your
application. Then find the least expensive scanner that will meet those
needs. A less expensive scanner that provides 200 dpi optical
resolution is adequate for scanning photographs used as visual
backdrops. On the other hand, if your application must scan closely
drawn "tight" contours, you will need a scanner capable of at least 800
dpi optical resolution. And, as when buying any piece of equipment,
you should consider reliability, repair costs, maintenance agreements,
user support, and so on.

The Computer

High-powered computer workstations capable of displaying and
manipulating raster data. Raster data require a bitmapped graphics
monitor for display. Even with bi-tonal data, a color monitor is useful
to support color overlay of vector data. If you are working with
grayscale data, you need at least an 8-bit color display in order to
support a color space that includes both the grayscale image and the
rest of the graphical user interface.
The workstation needs to have the CPU power required to manipulate
and display large amounts of data rapidly—this consideration should
not be underestimated, as slow response in handling large data sets
can adversely affect project productivity. One workstation can be
dedicated as a scanner and/or plotter server if warranted by high
usage, or the workstation can perform other functions.

A Plotter

Plotters capable of creating hard-copy output of raster data. Plotter
technologies that support output of raster data include electrostatic, ink
jet, and thermal transfer. Plotters can be color or black and white
(black-and-white plotters usually support grayscale output). Pen
plotters are not appropriate for raster data output. For optimum utility,
plotter software should be capable of combining raster and vector
data.

Data Storage

Secondary storage devices, such as high-speed magnetic disk drives,
or high-volume optical disk drives capable of storing large raster data
files. If you envision an on-line raster database, you should
assess data storage needs carefully. To estimate your raster data
storage requirements, multiply the number of documents you need to
have available on your system at any one time by the raster data set
size for that document type in Table 1. Many applications need to
have only a limited amount of raster data on-line; other applications
want to develop a library of raster data for ad hoc access. Estimation
of your total storage requirements should include the storage needs of
system software, application software, and other types of data (e.g.,
vector data and editing copies of raster data).
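
The storage estimate described above can be sketched in a few lines. This is a rough illustration only: the per-document sizes of Table 1 are not reproduced here, so the sketch computes an uncompressed size from sheet dimensions, scan resolution, and bits per pixel instead. The function names, the sheet size, and the document count are hypothetical.

```python
def raster_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned sheet."""
    cols = round(width_in * dpi)
    rows = round(height_in * dpi)
    # Assume each row is padded to a whole number of bytes.
    row_bytes = (cols * bits_per_pixel + 7) // 8
    return rows * row_bytes


def online_storage_bytes(document_count, bytes_per_document):
    """Documents kept on-line at one time multiplied by the size of each."""
    return document_count * bytes_per_document


# Hypothetical project: 200 bi-tonal sheets, 24 x 36 inches, scanned at 400 dpi.
per_sheet = raster_bytes(24, 36, 400, 1)
total = online_storage_bytes(200, per_sheet)
print(per_sheet, total)
```

The same sheet scanned as 8-bit grayscale needs eight times the space, which is one reason the choice of bi-tonal or grayscale scanning drives storage planning.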
A variety of technologies are available to ease the data storage burden.
First, tape archival can be used to simply take unneeded data off the
system. Some sites use a two-tiered approach to data storage in which
more frequently accessed raster data are kept on high-speed magnetic
disks, and less frequently used data are migrated to optical media.
Optical systems can be purchased with software to perform data
migration automatically at off-peak times. Optical systems and
magnetic systems should be transparently usable as mountable file
systems, usually accessible through an open network access standard
such as a Network File System (NFS).

Networks

High-speed local area networks (LANs) capable of transferring large
amounts of data at high rates of data throughput. LAN configurations
for distributed processing support scanning data entry by connecting
specialized computing machinery on a high-speed data pathway.
Today's LAN configurations can isolate high bandwidth raster data
traffic from other functions by creating subnets using network
bridges. LAN-based systems are modular and scalable.

[Figure: Example Scanning Data Entry Configurations. Simpler configuration: a scanner, a UNIX workstation, and a magnetic disk, with SCSI connections. More complex configuration: a scanner with a dedicated UNIX workstation, a UNIX server, a multipurpose UNIX workstation, a raster plotter, and an optical storage device ("jukebox") on a local area network.]

Evaluating Software
for Scanning
Data Entry

Scanning data entry has special software requirements. Software
tools are used for managing, manipulating, and displaying raster data;
converting raster data to vector data; and providing a graphical user
interface to make scanning data entry easy to do. Some of these tools
operate in batch mode or "behind the scenes" with little user
interaction. Their benefit is savings of people's time through
increased automation. Others require a higher level of user
interaction; their benefit is the intelligent combination of machine and
human capabilities to attain higher productivity. Software functions
needed to support scanned data entry projects include the following:
Raster data management. The data produced by scanning
data entry must be organized in an orderly way. Georeferenced
data should be organized geographically. Data management
software can optimize storage and retrieval of raster data even
when data volumes are large. For integration with other systems
and scanners, the software should provide raster-to-raster data
conversion. Raster data management software can extract, edit,
and merge a raster data set from a raster database.
Software data compression. Software data compression can
minimize raster data storage requirements. A variety of
industry-standard data compression formats are available,
including RLC and CCITT Group 3 and Group 4. The CCITT
compression standards are implemented in the TIFF raster data
standard. An eight- to tenfold reduction in data size can be
achieved with bi-tonal data. The amount of actual data reduction
will depend on the compression algorithm used and the complexity
of the data. Typically, denser, more complex data cannot be
compressed to the extent that sparse data can. Scanned bi-tonal
maps are more amenable to compression than scanned grayscale
photos. TIFF grayscale data files can be compressed.
Data that are compressed must usually be decompressed to be
used, requiring processing power and disk space. Some
applications find the overhead imposed by decompression to be
undesirable and choose to keep grayscale data uncompressed for
rapid access. To accommodate this, archived grayscale
data can be compressed, while data being actively used can be kept
in uncompressed format.
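
To illustrate why sparse bi-tonal data compress better than dense data, the following sketch applies a generic run-length scheme to one row of pixels. It is not the RLC file format or a CCITT coder; it assumes a row is simply a list of 0 (white) and 1 (black) values, and the function names are hypothetical.

```python
def rle_encode(row):
    """Encode a row of 0/1 pixels as [value, run_length] pairs."""
    runs = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([pixel, 1])   # start a new run
    return runs


def rle_decode(runs):
    """Rebuild the original row from its runs."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row


# A sparse row (one thin line on a blank sheet) and a dense, busy row.
sparse = [0] * 40 + [1] * 3 + [0] * 57
dense = [0, 1] * 50

print(len(rle_encode(sparse)), "runs for the sparse row")
print(len(rle_encode(dense)), "runs for the dense row")
```

Both rows hold 100 pixels, but the sparse row collapses to a handful of runs while the dense row gains nothing, which matches the behavior described above.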
Georeference of raster and vector data. This is a basic
requirement for raster data integration. All GIS databases are
ultimately stored and managed in real-world coordinates.
Software should be able to bring raster and vector data into the
same coordinate system by either fitting the raster data to vector
data or vice versa. Georeferencing of data is a requirement for
heads-up digitizing, many editing functions, and any integrated
use of data (e.g., overlay). Raster data must be georeferenced to
be stored as a seamless database of adjacent images.
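
As a minimal sketch of fitting raster data to real-world coordinates, the code below derives an affine transformation from three control points (for example, tics whose pixel and map positions are both known) and applies it to other pixels. It assumes exactly three non-collinear points and a purely affine fit; production georeferencing generally uses more points and may use other transformations. All names and coordinate values are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for k in range(3):
        mk = [list(row) for row in m]
        for r in range(3):
            mk[r][k] = rhs[r]         # replace column k with the right-hand side
        solution.append(det3(mk) / d)
    return solution


def fit_affine(pixel_pts, world_pts):
    """Return ((a, b, c), (d, e, f)) with x = a*col + b*row + c, y = d*col + e*row + f."""
    m = [[col, row, 1.0] for col, row in pixel_pts]
    xs = solve3(m, [x for x, _ in world_pts])
    ys = solve3(m, [y for _, y in world_pts])
    return xs, ys


def pixel_to_world(transform, col, row):
    (a, b, c), (d, e, f) = transform
    return a * col + b * row + c, d * col + e * row + f


# Hypothetical tics: pixel (col, row) positions and their map coordinates.
pixel_tics = [(0, 0), (1000, 0), (0, 1000)]
world_tics = [(500000.0, 4200000.0), (500500.0, 4200000.0), (500000.0, 4199500.0)]

transform = fit_affine(pixel_tics, world_tics)
print(pixel_to_world(transform, 500, 500))
```

In this example the row coefficient for y is negative because image rows increase downward while map northings increase upward.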

[Figure: ArcScan georeferencing menu supports multiwindow visual interaction.]

Raster data display. Raster data can be displayed with full
control over display symbology and graphic overlay of vector
data. Background values in bi-tonal raster data can be displayed
transparently, thus allowing concurrent display of multiple raster
data sets. Software can alter the display characteristics of
grayscale and color images to suit the needs of the application.
The software provides direct output of raster data to raster-capable
plotters, thus enhancing output speed and reducing hard-copy
processing requirements. The software can merge raster and
vector data in the same hard-copy plot.

[Figure: ArcScan raster editing menu.]

Raster data editing. The software provides capability to clean
up raster data with tools that work directly on the raster data
format. Raster editing is a common pre-processing step to
raster-to-vector conversion. For example, raster editing software can
remove speckling from raster data, and cleaner raster data are
converted to vector data with less post-processing.
Raster-to-vector data conversion. Software can provide an
array of tools for converting raster data to vector data. The tools
offer the flexibility to adapt to a wide variety of raster data. For
high-quality media, batch vector conversion software may be a
good choice. When source documents are lower quality or have
much clutter, interactive line-following software is often
preferable. When photos are scanned, heads-up digitizing can be
appropriate. Maps that are scanned to capture coordinate
information often have a wealth of feature attribute information as
well. This attribute data can be interactively captured during
scanning data entry procedures if the scanning software tools are
well integrated with other editing functions.
Raster-to-vector conversion requires special algorithms to overcome
the problems posed by noise and clutter because the raster data
format is simply a pictorial representation of the data. That is,
when the conversion software examines the raster data, it can see
only pixels of black and white; it cannot see what the raster
"picture" is supposed to represent. Thus, conversion
software will attempt to make vector lines out of all data it
encounters, even if the "lines" are really the pen strokes that make
up annotation. Conversion software deals with this kind of
problem in many different ways, but human intervention is often
necessary.
GUI and software integration. Ease of use is
greatly enhanced through a graphical user interface (GUI) that
provides point-and-click control of software functionality. For
highest efficiency, the conversion software should be integrated
with other raster and vector editing functions in a common
software environment. By integrating scanning data entry
technology in a common software environment, more software
functions are available and access to more types of data is possible
at each processing step. It is easier to learn an integrated software
system that has a consistent and attractive "look and feel."

Evaluating Scanning
Data Entry
Methodologies

Scanning data entry methods vary as widely as the applications in
which they are used. Some of these methods are unique to scanning
data entry; others take advantage of data and software integration to
bring additional functionality to scanning data entry projects. Noise
removal and clutter removal are pre-processing methods.
Pre-processing prepares the raster data for the raster-to-vector
conversion step.
Noise removal. Scanned maps will have a certain amount of
noise. The lower the map quality, the higher the noise content.
Noise is data that do not have informational content. For example,
a common type of noise is tiny spots called speckles that are an
artifact of the scanning process. The speckles are unwanted in the
final vector output. Various methods can be used to remove noise
from the image so that vectorization can proceed. These methods
are collectively called noise removal.
Clutter removal. Even a high-quality map may have unwanted
data on it, such as annotation. Often maps show more than one
data type or layer. For example, a map might show parcel
boundaries, easements, and street names. When only the parcels
are to be vectorized, the easements and street names are clutter.
Scanning data entry has developed methods for clutter removal,
such as raster editing tools that "white out" indicated areas.
Software filters that remove data below a given threshold size can
be useful for removing the relatively small lines that make up
lettering (annotation).
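
The size-threshold filtering described above can be sketched as follows. This is an illustration, not a description of any particular product's filter: it assumes a bi-tonal image held as a list of rows of 0 (white) and 1 (black), treats black pixels as connected when they touch along an edge or a corner, and erases every connected group smaller than a hypothetical `min_pixels` threshold.

```python
from collections import deque


def remove_small_components(grid, min_pixels):
    """Return a copy of a 0/1 grid with small black components erased."""
    rows, cols = len(grid), len(grid[0])
    out = [list(row) for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Collect the whole connected component with a breadth-first search.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                cr, cc = queue.popleft()
                component.append((cr, cc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and grid[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            if len(component) < min_pixels:
                for pr, pc in component:
                    out[pr][pc] = 0
    return out


# One six-pixel line plus two isolated speckles.
image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
cleaned = remove_small_components(image, 3)
for row in cleaned:
    print(row)
```

A threshold that is too large will also erase small legitimate features, such as the dots of a dotted line, so the value has to be chosen by testing on the data at hand.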
Post-processing. Once the raster-to-vector conversion has
been performed, the output vector data can be post-processed.
Post-processing is usually accomplished with vector editing
software to correct output vector data. The more closely the
post-processing software is integrated with the vectorization software,
the more efficient the entire vectorization process becomes.

Proper scanning procedures, such as correct choice of resolution,
can greatly reduce post-processing. Naturally, the less noise and
clutter in the source documents, the less pre- and post-processing
required.
Most data will require some amount of pre-processing,
post-processing, or both. For this reason, most scanning data entry
projects will require human judgment and intervention in the
conversion process. This does not make scanning data entry hard
to use, but it does reduce the amount of unattended automation and
hence the potential time savings.

[Figure: The ArcScan Tracing Tool is controlled by the mouse buttons. Labeled steps: select starting point with mouse; trace along indicated direction (tracing arrow); change tracing direction.]

Line-following interactive raster-to-vector data
conversion. Line following is a vectorization method that is
well suited to low-quality or cluttered data. Line following is
easily adaptable to a wide variety of different types of maps (as
opposed to a single map series). Line-following software
automatically follows lines in the raster data set and outputs
vector data as it goes. When the line-following software is
uncertain which way to go, such as at a line intersection, it pauses
and waits for the operator to indicate the proper line to follow.
Line-following software can create vectors from troublesome data
such as lines intersected by annotation. When digitizing polygon
data (e.g., closed loops), ESRI line-following software keeps track
of previously vectorized paths and removes them from
consideration, thus making the whole process more efficient.
Line following is highly interactive and requires a human operator.
It is faster than table digitizing but its primary benefit is the high
quality and consistency of the data it outputs. When line
following is closely integrated with other vector editing software,
post-processing functions can be made immediately available to
the operator, providing a total vectorization environment in which
data entry can be completely finished.
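
The pause-at-junction behavior of line following can be sketched with a very small tracer. This is a simplified illustration under stated assumptions, not ESRI's algorithm: the raster is a list of rows of 0 and 1 values, lines are one pixel wide, and the tracer moves only between pixels that share an edge. The function name and the shared `visited` set are hypothetical; the set stands in for the memory of previously vectorized paths.

```python
def trace_line(grid, start, visited=None):
    """Follow a one-pixel-wide line from start until it ends or branches.

    Returns (path, choices). An empty choices list means the line ended;
    two or more choices mean the tracer has reached a junction and is
    waiting for the operator to pick a direction.
    """
    rows, cols = len(grid), len(grid[0])
    visited = set() if visited is None else visited
    path = []
    current = start
    while True:
        visited.add(current)
        path.append(current)
        r, c = current
        # Candidate moves: up, right, down, left, skipping traced pixels.
        choices = [
            (nr, nc)
            for nr, nc in ((r - 1, c), (r, c + 1), (r + 1, c), (r, c - 1))
            if 0 <= nr < rows and 0 <= nc < cols
            and grid[nr][nc] == 1 and (nr, nc) not in visited
        ]
        if len(choices) != 1:
            return path, choices
        current = choices[0]


# A horizontal line with a branch joining it from above (a T junction).
grid = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
]
visited = set()
path, choices = trace_line(grid, (2, 0), visited)
print("paused at", path[-1], "with choices", choices)

# The operator indicates the pixel to the right; tracing resumes from there.
rest, remaining = trace_line(grid, (2, 4), visited)
print("continued through", rest)
```

Passing the same `visited` set to each call keeps already traced pixels out of consideration, so the tracer does not offer the path it arrived on as a choice.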
Batch raster-to-vector data conversion. Batch
vectorization proceeds without operator intervention. Batch
vectorization will vectorize all the data in the input raster data set.
Batch vectorization provides better results with high-quality,
uncluttered data. Batch vectorization will require post-processing.
Since batch vectorization proceeds without operator intervention
and vectorizes all data input, low-quality, cluttered data can reduce
its efficiency by requiring high levels of post-processing.
Typically, the user tests a variety of vectorization parameters to
find the combination of parameters best suited to the data, and then
initiates batch vectorization using those parameters. The major
advantage of batch vectorization is that once parameters are set,
vectorization can proceed unattended and produce an output vector
data set in much less time than other methods. Since all the maps
of a single map series tend to be alike and can use the same
parameters, batch vectorization is well suited to projects that scan a
large number of similar maps.
Both interactive vectorizers and batch vectorizers can deal with a
variety of cartographic problems such as dotted line symbolism.
The choice of which vectorizer to use is mostly data dependent.
Having both methods available broadens the array of available
tools. Vectorizers of both kinds work only with bi-tonal raster
data—they cannot presently be used with grayscale or color raster
data.
Feature attribute capture. Line followers and batch
vectorizers both output lines in vector format. These methods are
good for automating the line symbology on maps. Maps can also
contain other data such as point symbols and annotation. The
symbols and annotation are often attribute data associated with a
point, line, or area feature on the map and may be of value to the
GIS database. The informational content of the point symbology
and annotation can be captured by methods that combine heads-up
digitizing with data editing capabilities. Advanced scanning data
entry software will allow the user to point at symbology or
annotation and attach its information to a feature. In a single
integrated software environment, the user can take advantage of
imaging capabilities, vector editing capabilities, and forms entry
capabilities, all within an application tailored to capturing data
from a specific map series. Operator intervention and some key
entry is required, but this method can be an efficient way to
capture attribute information along with coordinate information in
one handling of the source document.
Heads-up digitizing. Here, the user captures coordinate
information by tracing features directly from data displayed on the
screen. A digitizing table is not used at all, hence the term
"heads-up," an allusion to the heads-up instrument display
technology developed for aircraft pilots.
Heads-up digitizing is a scanning data entry method that is
commonly used when coordinate accuracy need not be at
engineering levels. As the operator digitizes from the screen,
output accuracy is determined by the accuracy of the source
document, the resolution at which it was scanned, the resolution at
which it is displayed, the resolution of the screen itself, and the
skill of the operator. Usually greater accuracy can be attained by
other methods. Heads-up digitizing is usually performed on
georeferenced raster data sets.
Heads-up digitizing is most commonly performed using scanned
photographic images, such as air photos, and thus can be a way to
quickly capture the most current information. Vectorizing methods
are the most common way to capture information from scanned maps,
although heads-up digitizing can also be used with scanned maps.
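
Two of the accuracy factors listed above, scan resolution and display resolution, can be put into numbers with a short calculation. The sketch assumes that the source scale and scan resolution are known and ignores source document error and operator skill; the function names and example values are hypothetical.

```python
METERS_PER_INCH = 0.0254


def ground_pixel_size_m(scale_denominator, scan_dpi):
    """Ground distance covered by one scanned pixel, in meters."""
    return scale_denominator * METERS_PER_INCH / scan_dpi


def ground_per_screen_pixel_m(pixel_size_m, image_pixels_per_screen_pixel):
    """Ground distance covered by one screen pixel at a given zoom level."""
    return pixel_size_m * image_pixels_per_screen_pixel


# Hypothetical source at 1:24,000 scanned at 400 dpi.
pixel_m = ground_pixel_size_m(24000, 400)

# Zoomed out so that each screen pixel covers four image pixels.
zoomed_out_m = ground_per_screen_pixel_m(pixel_m, 4)

print(round(pixel_m, 3), round(zoomed_out_m, 3))
```

An operator working zoomed out cannot place a point more precisely than one screen pixel, so the effective digitizing precision is set by the coarser of the two figures.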
Orthophoto production. Scanned stereopair images can be
digitally orthocorrected to produce a digital orthophoto. Optically
corrected orthophotos can be scanned directly into digital
orthophotos. A digital orthophoto can be plotted at virtually any
scale. For example, a series of 30 x 30 orthophotos can be
scanned, merged into a common database, and reproduced on a
plotter without reference to the original size, coverage, and format
of the hard-copy photos. While output scale can be changed
freely, care should be taken not to "blow up" the image too much
or the output image will appear blocky. Orthophotos can also be
plotted with overlays of vector data.
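
The caution about enlarging an image too much can be checked before plotting. The sketch below assumes the photo scale, scan resolution, and intended plot scale are known, and it reports how many scanned pixels will fall on each plotted inch. The function names and values are hypothetical, and the point at which output looks blocky depends on the plotter and the viewer.

```python
def enlargement_factor(source_scale_denominator, output_scale_denominator):
    """How many times larger the plot is than the scanned original."""
    return source_scale_denominator / output_scale_denominator


def pixels_per_plotted_inch(scan_dpi, source_scale_denominator,
                            output_scale_denominator):
    """Scanned pixels available for each inch of plotted output."""
    factor = enlargement_factor(source_scale_denominator,
                                output_scale_denominator)
    return scan_dpi / factor


# Hypothetical: a 1:24,000 orthophoto scanned at 400 dpi, plotted at 1:6,000.
print(enlargement_factor(24000, 6000))
print(pixels_per_plotted_inch(400, 24000, 6000))
```

Here a fourfold enlargement leaves one quarter of the scanned pixel density on the plot, so a scan resolution that looks fine at the original scale may not hold up at the output scale.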
Combining scanned data with other data. The data
produced through scanning data entry may not be the only data of
interest. For example, scanned and vectorized data can be fit into
a geodetic control network created by coordinate geometry
(COGO) data entry. This technique can be used to localize, or
bound, error in scanned plats. Vectorized data can be combined
with purchased or table-digitized vector data. Accurate
georeferencing is very important when using scanned data with
other data.
Scanned data also have attributes. Adding attributes to scanned
data is usually necessary. For example, scanned contours need to
be tagged with their elevation values for use in digital terrain
projects. If scanning data entry software is integrated with other
GIS software tools, any or all of those tools can be made part of
the scanning data entry and data automation process.
Incremental data automation. Some organizations have
adopted an incremental approach to database creation. Incremental
methods include in-house staff digitizing on a time-available basis,
addition of digital data from other sources, such as CAD data from
land developers, or conversion of other digital data such as legal
descriptions. Incremental database generation approaches can
work, but because any GIS must have data in order to be effective,
it can be wise to input some data right away to demonstrate
immediate GIS benefits.
Scanning data entry supports incremental database development.
With scanning data entry, a raster database can be produced
quickly by scanning maps or air photos and georeferencing the
scanned images to real-world coordinates. The georeferenced
raster data can provide immediate benefit as a visual backdrop to
other data and as a data source for vector conversion. The raster
database can provide complete seamless coverage for an agency's
entire area of responsibility (e.g., a city or a county), and vector
conversion can proceed incrementally, on an as-needed or
highest-need basis.

ESRI Scanning Data
Entry Procedures

ESRI's experience as a user of scanning data entry technology enables
us to share a very practical viewpoint. The ESRI Database Development Group has used scanning data entry for many projects. The
Digital Chart of the World project, performed by ESRI as the prime
contractor to the Defense Mapping Agency, used scanning data entry
to develop a 2.8 gigabyte vector database from over 2,800 source
documents. The ESRI Database Development Group has wide
experience with scanning data entry—and much of this experience is
reflected in this chapter.
The group adapts scanning data entry technology and methods for
each project, as determined by the project goals and available source
documents. Even so, a standard processing sequence has evolved.
This processing sequence is shown in the following flowchart. Note
that some sort of pre- and/or post-processing is an assumed
requirement. Scanning data entry can reduce data automation time
requirements, but it will not eliminate them.

[Figure: Scanning Data Entry Procedures flowchart. Boxes and notes in processing order:]
Evaluate project goals
Evaluate source documents
• Determine scan resolution
• Determine proposed processing flow
Test and set scanner parameters
• Find or draft georeference marks (tics) on source documents
• Hand prep source documents
Scan documents
Pre-process (raster edit) raster data
Vectorize (maps) or heads-up digitize (air photos)
Post-process (vector cleanup) coverage data

ESRI Scanning Data
Entry Solutions—
A GIS Focus
ESRI's experience as a manufacturer and user of GIS
technology enables us to view scanning data entry from
the special perspective of GIS needs and requirements.
GIS requirements are different from CAD and engineering requirements—ESRI solutions are based on GIS
requirements. For example, GIS data typically depict
irregular features from the real world, not the right
angles and straight lines of CAD drawings.
ESRI provides a wide range of products and services that can support
GIS scanning data entry projects. ESRI brings its experience in GIS
and scanning data entry together in a software package called
ArcScan. ArcScan is specifically designed to support GIS scanning
data entry projects. ArcScan is a fully integrated extension to
ARC/INFO, and takes advantage of ARC/INFO's complete GIS
functionality.
ARC/INFO itself provides much ancillary capability to scanning data
entry projects, including vector data editing and management.
ARC/INFO provides additional raster data support through its bundled
IMAGE INTEGRATOR capabilities. ARC/INFO software extensions
for surface modeling, raster data modeling, network modeling, and
coordinate geometry can all play a part in scanning data entry projects
because these capabilities are available in the common ARC/INFO
software environment.
ArcScan, ARC/INFO, ArcView®, ArcCAD®, third-party software
integrated with ESRI software, the ArcData program, ESRI services,
and the ESRI hardware reseller program can be flexibly matched to the
exact requirements of your scanning data entry project and your GIS
implementation.

ArcScan

ArcScan Capabilities

ArcScan is a set of software tools that support data automation using
scan digitizing. These tools permit GIS applications to automate
vector databases using scanned raster data sets as input. ArcScan is
an extension to ARC/INFO and is fully integrated with the ARC/INFO
software environment. The ArcTools™ ArcScan menu system
provides an easy-to-use interface for raster editing and interactive
vectorization. ArcScan includes a Users Guide and Command
Reference that describe ArcScan capabilities to users.
ArcScan users can create ARC/INFO coverages by extracting line
features from scanned monochrome document images. This is done
using interactive, automated line-following software within
ARCEDIT. Examples of linear features that can be extracted and
added to a coverage include street lines, utility lines, contour lines,
parcel boundaries, and soil polygon boundaries. This technique for
digitizing features is simpler, more accurate, and often faster than
traditional manual and heads-up digitizing.
ArcScan provides tools for editing monochrome, grayscale, and
pseudo-color single-band imagery within ARCEDIT.
ArcScan provides a powerful and efficient set of tools in ARC/INFO
for importing, correcting, editing, plotting, and exporting scanned
raster images. ArcScan supports industry-standard raster data formats
and can accept data from many types of scanners.

ArcScan Components

Raster Database
Construction Tools

ArcScan functional components include raster database construction
tools, raster pre-processing tools, integrated raster-to-vector editing
tools, and an interactive raster-to-vector conversion tool. ArcScan
soft-copy and hard-copy raster display is provided using standard
ARC/INFO display functionality. ArcScan tools work with the highly
efficient ARC/INFO grid raster data format.
Scanned raster images in a variety of standard formats (e.g., TIFF,
RLC, SunRASTER) and compressions (e.g., Run Length
Compressed, CCITT Group III, CCITT Group IV) can be converted
to and from compressed grids using the IMAGEGRID and
GRIDIMAGE conversion tools. The GRIDMERGE tool can be used
to build a raster database from georeferenced input grid raster data
sets. The ArcScan conversion tools transfer runs of data directly,
rapidly converting large scanned monochrome documents.

Geometric Correction
and Noise
Removal Tools

The ArcScan raster pre-processing tools prepare raster data sets for
additional processing. A set of geometric correction tools can be used
to correct orientation errors introduced during scanning and distortions
in the source document, and to georeference the data. These tools
perform the following operations:
Rotate a grid by a multiple of 90 degrees. Commonly used when
documents are scanned sideways.
Flip the contents of a grid from top to bottom. Commonly used
when documents are scanned upside down.
Mirror the contents of a grid from left to right. Commonly used
with translucent documents that are scanned wrong side up.
Correct a skewed document by converting a user-specified
parallelogram on the input document to a rectangle on the output
document. A common distortion encountered in scan digitizing is
a skew caused by paper feed that is not perfectly aligned.
Apply a warping transformation to georeference a grid to
real-world coordinates.
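
The first three operations in this list are simple rearrangements of grid cells. The following sketch shows them on a small grid held as a list of rows; it illustrates the operations themselves, not the ArcScan tools, and the function names are hypothetical.

```python
def rotate_90_clockwise(grid):
    """Rotate a grid a quarter turn clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_top_to_bottom(grid):
    """Reverse the order of the rows."""
    return [list(row) for row in grid[::-1]]


def mirror_left_to_right(grid):
    """Reverse the order of the cells within each row."""
    return [list(row[::-1]) for row in grid]


grid = [
    [1, 2, 3],
    [4, 5, 6],
]
print(rotate_90_clockwise(grid))
print(flip_top_to_bottom(grid))
print(mirror_left_to_right(grid))
```

A document scanned upside down is rotated 180 degrees, which is the same as applying the flip and the mirror one after the other (or two quarter turns).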
The following noise cleanup tools can be applied to either the entire
image or a selected image area:
Remove specks of black noise from a scanned image. Most
scanned documents will show speckling to varying degrees.
Apply a majority rule filter to a scanned image. Commonly used
to correct dropout in noisy scanned lines.
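
A majority rule filter can be sketched as follows. The 3 x 3 window, the clipping of the window at the image edge, and the rule that ties go to white are assumptions made for this illustration; the source does not state the parameters of the ArcScan filter.

```python
def majority_filter(grid):
    """Set each pixel to the majority value of its 3 x 3 neighborhood."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Gather the neighborhood, clipped at the image edges.
            window = [
                grid[nr][nc]
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
            ]
            # Black only when more than half of the window is black.
            out[r][c] = 1 if 2 * sum(window) > len(window) else 0
    return out


# A three-pixel-thick line with one pixel of dropout in the middle.
line = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
for row in majority_filter(line):
    print(row)
```

The filter fills the dropout because eight of the nine pixels around it are black. The same rule erodes lines that are only one pixel wide, so it suits heavy line work better than fine detail.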

[Figure: ArcScan raster data management menu.]

Integrated Raster–
Vector Editing Tools

Grids can be edited in conjunction with coverages. The ARC/INFO
software environment supports editing monochrome, grayscale, and
pseudo-color raster data. Users can edit multiple grids during an
ARCEDIT session. ARCEDIT supports full pan and zoom display in
map coordinate space of both raster and vector data. A multilevel
undo capability provides the user with a "safety net" during raster
editing. The user can concurrently edit both raster and vector data.
The editing tools include
Filling Tools: Fill the interiors of user-defined boxes, circles,
and polygons. Fill a connected entity of pixels by pointing to any
pixel in the entity.
Drawing Tools: Rasterize the boundaries of user-defined
boxes, circles, and polygons.
Pixel Editing Tools: Set and query the value of individual
pixels by pointing to the screen.
Brush Tool: Change the value of pixels in the grid by dragging
a brush on the screen.
Rasterization Tools: Rasterize the selected set of arcs into the
edit grid.
Selection Tools: Select all cells within a box, circle, or
polygon.
Geometric Operations: Move, rotate, flip, and mirror the
selected region.
Filtering Operations: Despeckle, smooth, or enhance the
selected region.
Georeferencing Tools: These tools allow a user to
interactively position and rescale the edit grid in map coordinate
space, deskew the edit grid, and warp the edit grid using a link
coverage created in ARCEDIT. Georeferencing is accomplished
by identifying common points in the raster data set and in
real-world coordinates.

[Figure: The mouse is used to register the image to real-world coordinates.]

Interactive
Vectorization Tools

With ArcScan you can extract the centerlines of linear features from a
raster document with optimized user intervention. Interactive
raster-to-vector conversion using an automated line-following, or
line-tracing, tool is especially useful for selective raster-to-vector
conversion from a raster data set with multiple data layers. The
ArcScan line tracer, because of its high degree of user control, can
also vectorize complex and difficult data. With the line tracer tool you
can efficiently produce high-quality vector output. The trace tool
performs automatic intersection straightening and automatic line
generalization based on user parameters.
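
Line generalization driven by a user parameter can be illustrated with the Douglas-Peucker method, a widely used technique that drops vertices lying within a tolerance of a simpler line. The source does not say which algorithm ArcScan uses, so this sketch is offered only as an example of the idea; the function names and the tolerance values are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))         # clamp to the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def generalize(points, tolerance):
    """Douglas-Peucker simplification of a list of (x, y) vertices."""
    if len(points) < 3:
        return list(points)
    distances = [point_segment_distance(p, points[0], points[-1])
                 for p in points[1:-1]]
    worst = max(range(len(distances)), key=distances.__getitem__)
    if distances[worst] <= tolerance:
        return [points[0], points[-1]]
    split = worst + 1                 # index of the farthest vertex
    left = generalize(points[:split + 1], tolerance)
    right = generalize(points[split:], tolerance)
    return left[:-1] + right          # the split vertex appears once


# Pixel-by-pixel output along a right-angle corner, and along a staircase.
corner = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
staircase = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3)]

print(generalize(corner, 0.5))
print(generalize(staircase, 1.0))
```

The corner keeps its turning point because that vertex lies well outside the tolerance, while the staircase, typical of a diagonal line traced through square pixels, reduces to its two end points.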

[Figure: The ArcScan Tracing Tool works in the ARCEDIT environment.]

The trace tool snaps to the center of a raster line. Tracing begins at
that point, stopping at junctions to obtain user input. The user
interacts with the tracer using the mouse or keyboard, and controls the
direction taken by the tracer at the junction. Tracer features include the
ability to jump gaps, the ability to snap to the center of a heavy raster
line, and smart retrace. Built-in junction memory prevents retrace of
explored paths, offering improved line-tracing efficiency. Line
following can be done with user interaction or in fully automatic
mode, in which all connected line work within a defined area is
vectorized without operator intervention. The line tracer can work
from bi-tonal ARC/INFO grid or RLC raster data.
[Figure: ArcScan provides automatic cleanup of line intersections guided by user-set parameters.]

Because the line tracer is built into ARCEDIT, you can interleave
manual digitizing with line following, enabling more productive
processing of noisy data. You can use a menu interface for heads-up
digitizing to tag features with attribute data shown in the raster
document. The line trace tool works with the multiple windowing
capability of ARCEDIT. ArcScan automatically moves a close-up
view of the tracing activity, following the tracing activity even as it
moves out of view.

ARC/INFO

ARC/INFO is a full-feature GIS capable of meeting the complex
requirements of a wide variety of GIS applications. All the
capabilities of ARC/INFO can be applied to scanning data entry
projects. One of the most important features of ARC/INFO for
scanning data entry projects is ARC/INFO software's user interface
environment, ArcTools. ArcTools organizes ARC/INFO software's
thousands of GIS tools and provides a single look and feel to
ARC/INFO functionality, including the ArcScan extension.
ARC/INFO capabilities can be fully customized using the ARC Macro
Language—processing procedures for scanning data entry can be
tailored to the specific needs of the GIS application.

The integrated ARC/INFO software environment makes all
ARC/INFO functionality immediately available. This lets a
scanning data entry project take advantage of ARCEDIT
vector and attribute editing capability. This ARC/INFO data
automation functionality can be added to the techniques specific to
scanning data entry. The final result of scanning data entry is a
topologically correct ARC/INFO georelational database that can
support GIS analysis and advanced display.

An important feature of ARC/INFO software's integrated environment
is that all editing and data entry functions can use the ArcStorm
database manager. ArcStorm (ARC/INFO STORage Manager)
provides feature-level access to seamless spatial databases. ArcStorm
supports production-level data entry by multiple users.

Because scanning data entry is implemented in the ARCEDIT
environment, advanced spatial editing features can be utilized. These
advanced features include editing of complex and user-defined
features and interactive topology creation. In short, once the edit
session is over, no further processing is required.

The IMAGE INTEGRATOR functionality included with ARC/INFO
provides image conversion, management, and display capabilities for
a wide variety of image data (see Table 2), including industry-standard
formats used by most scanner vendors. IMAGE INTEGRATOR
brings image handling capability to scanning data entry projects and
provides such benefits as concurrent raster and vector display for
heads-up digitizing and raster plotter support for hard-copy
production. These images can be easily converted into the
user-selected format of preference, georeferenced, and kept in an
ARC/INFO Image Catalog as a seamless raster database.

TABLE 2: Summary of Image File Formats Supported by IMAGE INTEGRATOR

File Format                Typical Data Type            Bits (per pixel      Compression                Image Naming
                                                        per band)                                       Conventions
-------------------------  ---------------------------  -------------------  -------------------------  ----------------------
TIFF (Tag Image File       scanned image data,          1 to 8, 16, 24, 32   uncompressed or            no suffix is required
Format)                    graphic image data,          (8 only for          compressed (CCITT
                           single and multiband         multiband)           Group 3 and Group 4,
                                                                             LZW, and PackBits)

Sun Rasterfile             scanned image data,          1, 8, 24, 32         compressed or              no suffix is required
(a bitmap format)          graphic image data                                uncompressed

Run-Length Compressed      scanned image data           1                    compressed                 image.rlc
(RLC)

ERDAS                      categorical and              4, 8, 16             uncompressed               image.gis
                           continuous map data

ERDAS                      multiband image data         4, 8, 16             uncompressed               image.lan

IMAGINE                                                 4, 8, 16             uncompressed               image.img

Band Interleaved by Line   categorical and              1, 4, 8, 16, 32      uncompressed               image.bil
(BIL) with an ASCII        continuous map data,
header file                multiband image data

Band Interleaved by Pixel  categorical and              1, 4, 8, 16, 32      uncompressed               image.bip
(BIP) with an ASCII        continuous map data,
header file                multiband image data

Band Sequential Raster     categorical and              1, 4, 8, 16, 32      uncompressed               image.bsq
(BSQ) with an ASCII        continuous map data,
header file                multiband image data

GRASS (3.0 formats)        categorical and              32                   compressed or              use GRASS naming
                           continuous map data,                              uncompressed               conventions (e.g.,
                           multiband image data                                                         location:mapset:layer)

GRASS (4.0 formats,        categorical and              32                   compressed or              use GRASS naming
read only)                 continuous map data,                              uncompressed               conventions (e.g.,
                           multiband image data                                                         location:mapset:layer)

GRID (an ESRI format)      categorical and              32                   compressed                 no suffix, grids are
                           continuous map data                                                          directories

Arc Digitized Raster       scanned image                8, 3 bands           uncompressed               files organized by
Graphics (ADRG),                                                                                        directory
read only
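
The naming conventions in Table 2 lend themselves to a simple lookup.
The following Python sketch is illustrative only and is not part of
IMAGE INTEGRATOR. The function name guess_image_format and the
dictionary SUFFIX_TO_FORMAT are hypothetical; the mapping merely
transcribes the suffixes listed in the table.

```python
from pathlib import Path
from typing import Optional

# Suffix-to-format mapping transcribed from the "Image Naming
# Conventions" column of Table 2. Formats that need no suffix (TIFF,
# Sun Rasterfile, GRID) or that rely on directory or GRASS naming
# cannot be recognized from a suffix and are deliberately left out.
SUFFIX_TO_FORMAT = {
    ".rlc": "Run-Length Compressed (RLC)",
    ".gis": "ERDAS (categorical and continuous map data)",
    ".lan": "ERDAS (multiband image data)",
    ".img": "IMAGINE",
    ".bil": "Band Interleaved by Line (BIL)",
    ".bip": "Band Interleaved by Pixel (BIP)",
    ".bsq": "Band Sequential Raster (BSQ)",
}


def guess_image_format(filename: str) -> Optional[str]:
    """Return the Table 2 format implied by the suffix, or None."""
    return SUFFIX_TO_FORMAT.get(Path(filename).suffix.lower())


print(guess_image_format("parcels.rlc"))
print(guess_image_format("landsat.BIL"))
print(guess_image_format("scan.tif"))
```

A result of None does not mean the file is unsupported; it only means
the suffix alone does not identify the format.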

Importantly, the IMAGE INTEGRATOR can also display images that
do not have the inherent geographic component that maps and satellite
images do. Such images can be scanned from input
sources such as photographs, textual documents, and video input.
They cannot be georeferenced and are commonly used as
pictorial attributes of coverage features. A DBMS capable of storing
Binary Large Objects (BLOBs) can be used to manage this type of
image. ARC/INFO can access BLOB data in a DBMS.
Video image attribute, stored as a
BLOB in an external DBMS
table, can be displayed with the
IMAGE INTEGRATOR
command IMAGEVIEW.

Scanned document information
can also be stored as a BLOB and
displayed using the
IMAGEVIEW capability.

ARC/INFO Extensions

As required by your GIS application, the ARC/INFO software
extensions for surface modeling, raster data modeling, network
modeling, and coordinate geometry can be used with scanning data
entry. For example, scanned plats can be georeferenced to ground
control obtained through the ARC/INFO coordinate geometry
extension, COGO. Scanned topographic contours can be used as
input to surface modeling applications performed with the TIN
extension. Scanned data can be draped on three-dimensional views of
topography. Draping scanned topographic maps, aerial photographs,
or other types of raster files or images over a terrain model is an
extremely powerful and useful method of visualizing a surface. The
GRID extension can use scanned data in raster data modeling and can
provide map projection capability for raster data. The NETWORK
extension, used for analysis and modeling of geographic networks,
can use scanned maps as a raster backdrop. The vector data produced
through scanned data entry can be used by NETWORK for pavement
management, water and sewer infrastructure management, electrical
utilities management, and many other GIS applications. The
ARC/INFO scanning data entry solution feeds data directly into the
end user applications that make GIS pay off. Because ARC/INFO
offers an integrated software environment, the GIS functionality
offered by the ARC/INFO extensions can be applied to scanning data
entry projects.
Software packages from other companies support the ARC/INFO
raster and vector data formats. These software packages can extend
the ARC/INFO scanning data entry capability. For example, image
processing and batch vectorization are available.

ArcCAD

ArcCAD is ESRI's GIS solution for the AutoCAD environment.
ArcCAD joins GIS and CAD functionality and integrates CAD and
GIS vector data. Raster data can also play a role in combined CAD–
GIS projects. Several software packages are available that allow the
display of scanned paper drawings and other types of raster images as
backdrops to AutoCAD files and/or ArcCAD coverages. A raster
image can be used either as a visual reference for the simultaneously
displayed vector data or as a convenient method to perform heads-up
digitizing. Video images captured with a wide variety of frame
grabbers can also be displayed during an ArcCAD session. Scanning,
raster-to-vector conversion, and raster editing are also available.

ArcView

ArcView, ESRI's desktop GIS display and query software, is capable
of viewing all the image formats supported by ARC/INFO. ArcView
can play a role in a scanning data entry project as a "quick look" tool
to view and verify scanned data. ArcView can display raster and
vector data simultaneously and can provide quick output of raster
graphics in industry-standard graphic formats such as PostScript.

ESRI Services

ESRI provides a full range of services including training, database
automation, application development, and on-site technology transfer.
Since 1969, ESRI has supported hundreds of organizations
throughout the world in the design, development, and implementation
of GIS. ESRI's services support the complete GIS life cycle,
including implementation planning, system integration, database
development, application development, and system operation. ESRI
is unique within the GIS industry in its ability to provide such
comprehensive services in combination with a complete set of leading
GIS software.
Working with the leading hardware vendors, ESRI has successfully
provided turnkey geographic information systems to hundreds of
clients.
ESRI offers new ArcScan users an on-site ArcScan Start-up Support
Package. This package is two days of consulting and technical
training support by an ESRI technical analyst to help the new ArcScan
user implement this technology. The goal of the support package is to
transfer the skills necessary to begin using the ArcScan software in a
production environment. The subjects covered are document
preparation, system initialization, ArcScan software usage, quality
assurance techniques, and data structure considerations for scanning.
The ArcScan Start-up Support Package is provided at your site using
your equipment.

ArcData

ArcData is ESRI's program for providing published spatial and
geographically related digital data in ARC/INFO supported formats.
Existing off-the-shelf vector and raster databases can complement
scanning projects. The ArcData program includes satellite and other
imagery. Through the ArcData program, leading data vendors such as
EOSAT, Spot Image, and Hughes STX can provide imagery for
locations worldwide that can be used in scanning data entry projects.

Supported Devices

As mentioned previously, ARC/INFO supports industry-standard
raster data formats. For a scanner to be usable with ARC/INFO it
must output raster data in one of these standard formats, preferably
TIFF or RLC. Other factors, such as direct output to the UNIX file
system and UNIX-based controller software, also bear on scanner
ease-of-use. ArcScan has been tested with scanners from leading
manufacturers. The section on evaluating scanning data entry
provides more information on scanner features that are important.

ESRI's machine-independent philosophy allows ARC/INFO users to
take advantage of new hardware developments as they occur.

ESRI defines level of support for peripheral devices, such as scanners
and plotters, using a numerical classification system. The ARC/INFO
Users Guide, Supported Devices, UNIX Workstations (Rev. 7.0)
provides classifications for specific scanners, plotters, and other
devices. The classification categories are outlined below.

Classes of Supported Devices

There are six classes of support for devices:

Class 1: Fully Supported, In-House at ESRI

Class 1 devices have been tested at ESRI, run successfully with
ARC/INFO, and have an interface (driver, interface file, etc.)
provided with ARC/INFO Rev. 6.1.2. Any problems that occur with
these devices can be tested on-site because the device must remain on
ESRI premises for it to remain a Class 1 supported device. This is the
highest level of support.

Class 2: Conditionally Supported, Not In-House at ESRI

Class 2 devices have been tested, run successfully with ARC/INFO,
and have an interface provided with ARC/INFO Rev. 6.1.2. However,
they are conditionally supported since they are not kept on ESRI
premises. Therefore, any testing or troubleshooting of problems
encountered cannot be conclusive. ESRI does not guarantee the
interface unless the device can be present at ESRI for troubleshooting
and software repair. Some Class 2 devices are no longer
manufactured or available for testing.

Class 3: Limited Support, Either Has Not Been Tested at ESRI, or Has
Serious Limitations

Class 3 devices have limited support for one of the following reasons:

• They have not been tested at ESRI but are assumed to work with
  ARC/INFO software.
• They have been tested and found to have serious limitations or
  restrictions.

Any known limitations or restrictions and their work-arounds (if
they exist) are noted for each device.

An interface is provided for these devices. Users are expected to
know how to connect these devices to their computer, as well as
configure the device and its computer connections. ESRI does not
guarantee the interface of these devices.

Class 4: Not Supported, But Interface Is Possible

No interface is provided or available for the device. However, ESRI
believes that it may be possible to build an interface. ESRI takes no
responsibility for connecting the device to the computer, or for
configuring the device and its computer connections. ESRI also takes
no responsibility for developing the interface (writing of interface
files, etc.).
One important subgroup of Class 4 devices includes devices that
emulate supported devices. In many cases, these devices work
extremely well. However, since ARC/INFO is designed for use with
the supported device, there may be some operations that an emulating
device does not perform well. Any known restrictions or limitations
and their work-arounds (if they exist) are noted for each device.

Class 5: Unknown

The status or ability to interface this device is unknown at the time of
this publishing. New devices as well as devices not included in this
document that have not yet been tested fall into this category.

Class 6: Not Supported

These devices will not work with ARC/INFO and therefore cannot be
supported. In most cases, unsuccessful attempts have been made to
interface these devices.

Citations

Litton, Adrien L. "Automated Data Capture: The Scanning
Solution," Proceedings, 1993 ARC/INFO User Conference

ARC/INFO: GIS Today and Tomorrow, ESRI White Paper
Series, September 1992

ArcCAD, The Integration of CAD and GIS, ESRI White Paper
Series, April 1993

Supported Devices Guide, UNIX Workstations
(ARC/INFO Rev. 7.0)

WorkStation ARC/INFO Technical Guide to Hardware Options

ARC/INFO Users Guide, Cell-based Modeling with GRID

A Wide Range of High-Quality Support for All Your GIS Needs,
ESRI Brochure

Glossary
Definitions of key terms that will help
you to understand the concepts discussed
in this document.
ARCEDIT

ARCEDIT is the ARC/INFO environment for editing coverage
coordinate data and descriptive data. Its sophisticated graphic and
editing capabilities provide the tools necessary for accurate data entry
and manipulation. These capabilities are important for creating and
maintaining geographic databases.

ARCPLOT

ARCPLOT is the ARC/INFO environment for providing cartographic
tools for all of your ARC/INFO mapping needs, including full
cartographic design, display, and production capabilities.

ArcScan

ArcScan is the ARC/INFO extension that provides capabilities to
support scanning data entry. ArcScan is closely integrated with other
ARC/INFO functionality, particularly IMAGE INTEGRATOR and
ARCEDIT.

ArcTools

ArcTools is a graphical user interface (GUI) environment provided
with ARC/INFO. ArcTools provides a consistent menu interface to
ARC/INFO and its subsystems and extensions.

attribute

An attribute is a characteristic of a map feature described by numbers
or characters, typically stored in tabular format, and linked to the
feature by a user-assigned identifier. For example, attributes of a
well, represented by a point, might include depth, pump type, and
owner. Feature attribute information is often present in source
documents as symbology or annotation. For example, a plat map may
show Parcel Identification Numbers (PINs).

attribute table

Attribute tables are tabular, flat, or relational files directly associated with
the spatial data and form the "relational" half of the georelational data
structure. ARC/INFO functions maintain the integration of spatial and
attribute data in feature attribute tables. Additional attribute
information may be kept in external attribute tables, perhaps
maintained as tables in an RDBMS.

bandwidth

Bandwidth is a way of expressing how much data can be transmitted
across a communications medium at any one time. The higher the
bandwidth, the more activity the communications channel can support.
For example, local area networks have higher bandwidth than serial
communications links. Bandwidth is often measured in megabytes
per second.

bit

The smallest unit of information that can be stored and processed in a
computer. A bit has two possible values, 0 or 1, which can be
interpreted as BLACK/WHITE or ON/OFF. Bi-tonal data can be
compressed into images that represent cell values with a single bit.

bi-tonal

Bi-tonal, as applied to raster data sets, means the raster data have only
two possible values. Tonal quality is the brightness value, therefore
bi-tonal data have only two values, black and white.

black noise

Black noise is pixels with black values where the original information
content had white values. This often has the appearance of speckling,
tiny black spots on a white background. Noise is data in a
communication channel that is random or has no informational content.
Noise is usually caused by low data quality and is unwanted because
extra pre- or post-processing can be required to remove it. Black
noise is the addition of data where none should exist. See white
noise.

Black noise, or addition of data.
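
As an illustration of the post-processing that black noise can
require, the Python sketch below removes isolated black pixels
(speckles) from a small bi-tonal grid. It is a generic technique under
stated assumptions, not ArcScan's cleanup method: the function name
remove_speckles is hypothetical, black is assumed to be 1 and white 0,
and a speckle is assumed to be a black pixel with no black pixel among
its eight neighbors.

```python
def remove_speckles(grid):
    """Return a copy of a bi-tonal grid (1 = black, 0 = white) in which
    every black pixel with no black 8-neighbor is set to white."""
    rows, cols = len(grid), len(grid[0])
    cleaned = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            # Look at the surrounding pixels, clipped at the grid edges.
            has_black_neighbor = any(
                grid[rr][cc] == 1
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            )
            if not has_black_neighbor:
                cleaned[r][c] = 0
    return cleaned


scan = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # isolated speck (black noise)
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],  # line work, which must be preserved
]
for row in remove_speckles(scan):
    print(row)
```

A filter this simple would also delete legitimate one-pixel features,
such as a dot in dotted line work, which is one reason cleanup
parameters are normally left to the operator.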

BLOB

Binary Large Object. A term often used with database management
systems. Any large data set handled as binary data; often BLOBs are
raster image data.

categorical data

Categorical data consist of values representing discrete categories,
such as soil or vegetation type. Also referred to as nominal data.

CCD

Charge-coupled device. A CCD is the electronic instrument used in
scanners to sense brightness values. CCDs are usually capable of
distinguishing and outputting grayscale data that have a maximum of
256 levels of gray.

cell

The basic element of spatial information in a grid data set. Cells are
always square. A group of cells forms a grid.

[Figure: a grid of cells showing rows, columns, cell size, the upper
left corner, and the X- and Y-axes with origin (0,0), together with a
table of cell values:]

Value   Count   Cover-Type
1       8       W Pine
2       11      D Fir
3       4       Mixed
4       12      Grass
5       5       Water
6       30      Paved
7       10      Agriculture

cell based

See raster.

clutter

Unwanted data on a scanned map. Clutter, unlike noise, may have
informational content, but not the information sought by the data
entry process. Clutter, like noise, can require extra pre- or
post-processing in order to remove it. Line-following raster-to-vector
converters are efficient at dealing with clutter because they utilize
human capabilities to discern clutter from desired data. Annotation
that overlays line work is a common type of clutter.

In this example, the annotation
"129" is clutter. It overlays the
line and will interfere with
vectorizing.

COGO

1. Coordinate geometry. Software that uses legal descriptions and
survey information to create spatial vector data.
2. An ARC/INFO software extension.

continuous data

Continuous data consist of values representing samples from a
continuous surface, such as elevation values. Also referred to as
ordinal or ratio data.

control point

A control point is a location on the image or map having known
real-world coordinates. Control points are also called registration
marks, or tics.

coordinates

An expression of location in space by the provision of pairs of
numbers that indicate offset from a known starting point. X,y
coordinates are an expression of position in Cartesian space. A
common coordinate, or georeferencing system, is a requirement for
the concurrent use of different types of data.

corrected photo

See orthophoto.

coverage

A digital analog of a single map sheet forming the basic unit of vector
data storage in ARC/INFO software. In a coverage, map features are
stored as primary features, such as arcs, nodes, polygons, and label
points; and secondary features, such as tics, extent, links, and
annotation. Map feature attributes are described and stored
independently in feature attribute tables.

data automation

The process of converting analog data, such as maps, to a digital
representation of the same information.

data model

A data model is a formal method for arranging data to represent the
behavior of real-world entities. Fully developed data models describe
data types, integrity rules for the data types, and operations on the data
types. ARC/INFO software uses a georelational data model, a hybrid
data model that combines spatial data (in coverages and grids) and
attribute data (in tables). ARC/INFO's integrated data model allows
easy conversion between, and concurrent use of, raster and vector
data.

data quality

In the context of scanning data entry, data quality refers to the quality
of the source document, that is, the media itself. Data quality does not
refer to the informational veracity, accuracy, or precision of the data on
the media. Thus, a well-used, folded, wrinkled, and stained
third-generation blue-line map has lower data quality than a new Mylar
overlay map having crisp, high-contrast line work.

DBMS

Database management system; often a relational database management
system. A DBMS is the collection of software required for using and
manipulating a tabular database, and presenting multiple, different
views of the data. DBMS can also manage Binary Large Objects. See
BLOB.

dpi

Dots per inch. Dpi is a common measure of resolution in scanners.
The more dots per inch (sampling rate) a scanner has, the greater the
resolution.
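
To relate dpi to ground detail, the size of one scanned pixel on the
ground can be estimated from the scanning resolution and the map
scale. The Python sketch below is a worked calculation under the
assumption of an undistorted source map at a known scale; the function
name ground_pixel_size_m is hypothetical.

```python
METERS_PER_INCH = 0.0254  # exact by definition


def ground_pixel_size_m(dpi, scale_denominator):
    """Ground distance in meters covered by one scanned pixel.

    One pixel spans 1/dpi inch on the map sheet; multiplying by the
    scale denominator (24000 for a 1:24,000 map) gives the distance on
    the ground.
    """
    return scale_denominator / dpi * METERS_PER_INCH


# A 1:24,000 map scanned at 400 dpi: each pixel covers 60 inches
# (about 1.5 meters) on the ground.
print(round(ground_pixel_size_m(400, 24000), 3))
```

Doubling the dpi halves the ground size of a pixel, but it also
quadruples the number of pixels for the same sheet.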

dropout

Dropout is an artifact of the scanning process that results in the loss of
data where they should exist, such as pixel thinning in line work. See
white noise.

georeference

To georeference is to establish the relationship between an image
(row, column) coordinate system and a map (x,y) coordinate system.
Georeferencing is accomplished by establishing control points that can be
identified in both coordinate systems, then creating the displacement
vectors, or links, between the control points. For example, once a
raster data set is georeferenced to a vector coverage, the raster and
vector data should overlay or register.
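
The relationship between image (row, column) and map (x,y)
coordinates can be sketched with a small worked example. The Python
code below is an assumption-laden illustration, not the method used by
ARC/INFO: it assumes an affine transformation, uses exactly three
links, and the names fit_affine and image_to_world are hypothetical.
The control point values are invented for the example.

```python
def fit_affine(links):
    """Fit x = a*col + b*row + c and y = d*col + e*row + f.

    `links` holds exactly three pairs ((col, row), (x, y)), one per
    control point. Returns the tuples (a, b, c) and (d, e, f).
    """
    (c1, r1), (c2, r2), (c3, r3) = [image for image, _ in links]
    xs = [world[0] for _, world in links]
    ys = [world[1] for _, world in links]

    # Determinant of the 3x3 system; zero means collinear points.
    det = c1 * (r2 - r3) - r1 * (c2 - c3) + (c2 * r3 - c3 * r2)
    if det == 0:
        raise ValueError("control points must not be collinear")

    def solve(v1, v2, v3):
        # Cramer's rule for one output coordinate.
        a = (v1 * (r2 - r3) - r1 * (v2 - v3) + (v2 * r3 - v3 * r2)) / det
        b = (c1 * (v2 - v3) - v1 * (c2 - c3) + (c2 * v3 - c3 * v2)) / det
        c = (c1 * (r2 * v3 - r3 * v2) - r1 * (c2 * v3 - c3 * v2)
             + v1 * (c2 * r3 - c3 * r2)) / det
        return a, b, c

    return solve(*xs), solve(*ys)


def image_to_world(col, row, x_coeffs, y_coeffs):
    """Apply the fitted transformation to one image location."""
    a, b, c = x_coeffs
    d, e, f = y_coeffs
    return a * col + b * row + c, d * col + e * row + f


# Three invented links: image (col, row) -> map (x, y).
links = [
    ((0, 0), (500000.0, 4100000.0)),
    ((1000, 0), (502000.0, 4100000.0)),
    ((0, 1000), (500000.0, 4098000.0)),
]
x_coeffs, y_coeffs = fit_affine(links)
print(image_to_world(500, 250, x_coeffs, y_coeffs))
```

In practice more than three links are normally collected so that the
fit can be checked against residual errors; that least-squares case is
outside this sketch.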

georegister

See georeference.

georelational data model

A hybrid data model used to represent spatial features. The
georelational data model encompasses coordinate, topological (geo),
and feature attribute (relational) information.

GIS

A geographic information system (GIS) is an organized collection of
computer hardware, software, geographic data, personnel, and
procedures designed to efficiently capture, store, update, manipulate,
analyze, and display all forms of geographically referenced
information. Complex spatial analysis is possible with a GIS that
would be difficult, time-consuming, or impracticable otherwise.

GPS

Global positioning system. A system of orbiting satellites,
ground receivers, and associated software that provides an
electronically instrumented means of determining position on the
earth.

grid

1. A raster geographic data set for use with ARC/INFO software.
Each grid cell is referenced by its geographic x,y location. Cells
store values. ArcScan functionality operates on the ARC/INFO
grid data structure for most operations.
2. One of many data structures commonly used to represent map
features. A raster-based data structure composed of cells of equal
size arranged in columns and rows. The value of each cell, or
group of cells, represents the feature value. (Also called
"Raster.")
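
Because cells are of equal size and arranged in rows and columns, the
cell that holds a given x,y location can be found by arithmetic alone.
The Python sketch below illustrates the idea and is not the ARC/INFO
grid storage format: it assumes the grid is anchored at its upper left
corner with row 0 at the top, and the names cell_index, land_cover,
x_min, and y_max are hypothetical.

```python
import math


def cell_index(x, y, x_min, y_max, cell_size):
    """Return (row, col) of the cell containing map point (x, y).

    x_min and y_max are the map coordinates of the grid's upper left
    corner; rows count downward from it, columns count to the right.
    """
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y_max - y) / cell_size)
    return row, col


# A 3 x 3 grid of 10-unit cells whose upper left corner is at (0, 30).
land_cover = [
    [1, 1, 2],
    [1, 3, 2],
    [3, 3, 2],
]
row, col = cell_index(25, 4, x_min=0, y_max=30, cell_size=10)
print(row, col, land_cover[row][col])
```

No coordinates are stored per cell; position is implied by the row and
column, which is what keeps the structure compact.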

[Figure: point features, line features, and area features shown as
coordinates and as the equivalent grid cell values.]

GRID

An ARC/INFO software product that provides a fully integrated
raster- or cell-based geoprocessing system for use with ARC/INFO.
GRID supports a map-algebra spatial language allowing sophisticated
spatial modeling and analysis.

grid cell

A discretely uniform unit that represents a portion of the earth, such as
a square meter or square mile. Each grid cell has a value that
corresponds to the feature or characteristic at that site, such as a soil
type, census tract, or vegetation class. See pixel.

GUI

Graphical user interface. A highly visual and interactive method for
supporting human-computer interaction.

heads-up digitizing

The process of using a high-resolution, bit-mapped display and mouse
to automate vector data by tracing features shown as an image on the
screen.

image

A graphic representation or description of an object that is typically
produced by an optical or electronic device. Common examples
include remotely sensed data such as satellite data, scanned data, and
photographs. An image is stored as a raster data set of binary or
integer values representing the intensity of reflected light, heat, or
another range of values on the electromagnetic spectrum. See raster.

image catalog

An image catalog is an organized set of spatially referenced, possibly
overlapping, images that can be accessed as one logical image.
ARC/INFO IMAGE INTEGRATOR can use image catalogs for raster
data in formats such as TIFF or RLC. The ARC/INFO GRID data
structure does not use image catalogs.

IMAGE INTEGRATOR

A collection of image management and display tools in ARC/INFO
that allows vector and raster data to be displayed concurrently. IMAGE
INTEGRATOR commands are used to georeference and rectify images to
real-world coordinates, display images, and manage image catalogs.

image-to-world transformation

Image-to-world transformation is the transformation between image
locations and real-world or map coordinates.

interpolated resolution

A method employed by scanner vendors to increase output resolution
by use of software—that is, each input pixel is interpolated to produce
more output pixels. Interpolating pixel values will not improve the
informational content of the original scanned data and is usually not an
effective method for GIS applications. See optical resolution.

LAN

1. Local area network. Computer data communications technology
that connects computers at the same site. When computers are on
a LAN, they can share data and other computer resources, such as
printers and plotters. LANs are composed of cabling and special
data communications hardware and software.
2. An ERDAS image processing system file type.

link

A link is a displacement vector, used in the image georeference
process that links an image pixel location to map coordinates. A link
connects points in different data sets that are known to have the same
real-world coordinates.

map

A map is an abstract graphic representation of the earth's surface that
displays spatial relationships among the features, generalizes their
appearance to simplify them for the purpose of communication, and
applies symbols to aid in interpretation.

monochrome

Monochrome data are black and white. See bi-tonal.

Mylar

A transparent or translucent material used to provide a stable medium
for drafted maps. Mylar, unlike paper, does not shrink or expand due
to temperature or humidity.

network

When referring to computer hardware systems, a local area network or
a wide area network. See LAN.

NFS

Network File System. A standard for accessing data stored on media
attached to other computers over a network. The NFS interface
specifications are licensed by Sun Microsystems and have become a
de facto standard in the computer industry. NFS is based on the
TCP/IP network protocol.

optical resolution

A measure of sampling rate applied to scanners. As a rule of thumb,
optical resolution can be estimated by counting the number of cameras
in the scanner, adding one, and multiplying by 100. Thus, a
three-camera scanner can have up to 400 dpi optical resolution. A
resolution of 400 dpi means that the scanner is capable of discerning
(sampling) 400 different values in one inch of scanned media. See
interpolated resolution.
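
The rule of thumb above is simple arithmetic and can be written out as
a short Python sketch. The function names are hypothetical, and the
result is only the rough upper estimate the rule describes, not a
specification for any particular scanner.

```python
MM_PER_INCH = 25.4


def estimate_optical_dpi(camera_count):
    """Rule of thumb from the text: (cameras + 1) * 100 dpi."""
    return (camera_count + 1) * 100


def sample_spacing_mm(dpi):
    """Distance on the scanned media between adjacent samples."""
    return MM_PER_INCH / dpi


dpi = estimate_optical_dpi(3)  # the three-camera example from the text
print(dpi, round(sample_spacing_mm(dpi), 4))
```

At 400 dpi the samples are roughly 0.06 mm apart, which gives a sense
of the finest line work the scanner can be expected to resolve.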

orthophoto

Aerial photography that has been corrected in horizontal displacement
caused by optics or topography. Orthophotos are much more suitable
for measurement or vectorization than uncorrected photography.
Orthophotos can be created photographically or digitally. See rectified
photo.

pixel

Pixel is short for "picture element." It is the smallest resolvable
element in an image. A pixel has both a spatial location and a value
component. Pixels are analogous to grid cells.

raster

A cellular data structure composed of rows and columns, with the
value of each cell representing a feature value. Groups of cells are
used to represent each feature. The structure is commonly used to
store image data. See grid.

[Figure: raster data sets representing point features, line features,
and area features as cell values.]

raster-to-raster data conversion

The process of converting one raster data format to another. This
process can be used for a variety of reasons, including software
compatibility or data compression.

raster-to-vector data conversion

The process of converting the informational content in a raster data set
to an equivalent representation in vector format.

RDBMS

A relational database management system (RDBMS) is a database
management system with the ability to access data organized in tabular
files that may be related together by a common field. An RDBMS
can recombine and display the data items from different files, thus
providing powerful tools for data usage.

real-world coordinates

Real-world coordinates are an x,y coordinate system used to represent
geographic locations in terms of measurements on the earth's surface.
State Plane coordinates, latitude-longitude, and map projection
coordinates are all real-world coordinates. Raster data row, column,
or x,y in inches are not real-world coordinates.

rectification

Rectification is the process by which an image is converted from
image coordinates to real-world coordinates. Rectification is
performed using georeference information.

rectified photo

See orthophoto.

registration

See georeference.

resolution

A measure of the sampling rate used to create a raster data set. The
greater the resolution, the more data are preserved from the original
source. Resolution is often expressed in dots per inch for scanners,
and pixel size for satellite data.

RLC

Run Length Compression. An industry-standard format for bi-tonal
raster data that has high-compression characteristics. Many scanners
output data in RLC format. ARC/INFO and ArcScan support RLC
formatted data.
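
The idea behind run-length compression can be sketched in a few lines. This is a generic illustration of run-length encoding on one row of bi-tonal pixels; it is not the byte layout of the RLC file format itself, and the function names are assumptions made for the example.

```python
from itertools import groupby

def rle_encode(row):
    """Collapse a row of 0/1 pixels into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into a flat pixel row."""
    return [value for value, length in runs for _ in range(length)]

# One scan line: mostly white (0) with two short black (1) strokes.
row = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1]
runs = rle_encode(row)
print(runs)                      # [(0, 4), (1, 2), (0, 5), (1, 1)]
print(rle_decode(runs) == row)   # True
```

Scanned line work is mostly long runs of white, so storing run lengths instead of individual pixels is what gives this family of formats its high compression on bi-tonal data.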

scalable systems

A term used in the computer industry that indicates the ability of a
computer system to grow along with its scope of use. A scalable
system can use the same software without modification along a scale
of different-sized processors.

scanning

The process of data input in raster format with a device called a
scanner.

[Figure: scanning workflow. Labels: map manuscript; scanner; raster data created from scanning; ArcScan; coverage created by ArcScan.]

SCSI

Small Computer System Interface. A computer industry standard for
interfacing peripheral devices to processors. SCSI is a
high-bandwidth interface capable of transferring large amounts of
data rapidly. Many scanners and plotters use a SCSI interface, which
enables them to be used with any computer that also supports the
SCSI standard.

separates

Maps that have only one feature type, or theme, per map sheet. A
published map often consists of several separates merged in the
printing process.


software environment

A context for software usage whereby software functionality in one
part of the system is available to other parts. For example, since
ARC/INFO provides a single integrated software environment,
functionality in the ARCEDIT subsystem is available to the ArcScan
extension.

spatial data

Spatial data are information about the location, shape, and
relationships among geographic features, usually stored as coordinates
and topology. Georeferenced raster data are spatial data.

speckling

In scanned data, the appearance of small black spots on a white
background. See black noise.

threshold

As used in reference to scanners, the threshold is the cutoff within
the 256 grayscale values output by the CCD: scanner software
translates values above the threshold to white and values below the
threshold to black for bi-tonal output. The optimum threshold value
will vary from map to map and from map layer to map layer, although a
single threshold value will likely be appropriate within a single map
layer. Scanner controller software will provide tools for examining
threshold alternatives and setting an optimal threshold.
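
A minimal sketch of this kind of thresholding follows. It assumes grayscale values from 0 (black) to 255 (white), encodes black as 1 and white as 0, and treats a value exactly equal to the threshold as white; all three conventions, the function name, and the sample values are assumptions for illustration, not the behavior of any particular scanner software.

```python
def to_bitonal(gray_rows, threshold):
    """Convert rows of 0-255 grayscale values to bi-tonal pixels.

    Values below the threshold become black (1); all others become
    white (0). The 1/0 encoding is an assumption of this sketch.
    """
    return [[1 if value < threshold else 0 for value in row]
            for row in gray_rows]

# A tiny 3 x 3 scan: a dark diagonal line on a light background.
scan = [
    [250, 240,  90],
    [245,  60, 235],
    [ 30, 238, 251],
]
print(to_bitonal(scan, threshold=128))  # [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

Raising the threshold turns more mid-gray pixels black, which can recover faint line work but also introduces speckling; lowering it does the reverse.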

TIFF

Tagged Image File Format. An industry-standard raster data
format. ARC/INFO supports the TIFF format using a variety of
compression standards. TIFF is commonly used for grayscale data.

topology

Topology is the spatial relationships between connecting or adjacent
coverage features (e.g., arcs, nodes, polygons, and points). For
example, the topology of an arc includes its from- and to-nodes and its
left- and right-polygons. Topology enables many GIS functions such
as vehicle routing. Raster data do not have topology.
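
The arc topology described here can be pictured as a small record per arc. The sketch below is an illustration only: the class, field names, and sample identifiers are hypothetical, polygon 0 is assumed to stand for the area outside all polygons, and none of this reflects how ARC/INFO stores coverages internally.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Arc:
    arc_id: int
    from_node: int   # node where the arc starts
    to_node: int     # node where the arc ends
    left_poly: int   # polygon on the left when walking from -> to
    right_poly: int  # polygon on the right when walking from -> to

# Hypothetical sample data: polygon 10 is bounded by three arcs.
arcs = [
    Arc(1, from_node=1, to_node=2, left_poly=0, right_poly=10),
    Arc(2, from_node=2, to_node=3, left_poly=20, right_poly=10),
    Arc(3, from_node=3, to_node=1, left_poly=0, right_poly=10),
]

def neighbors(poly_id, arcs):
    """Polygons that share at least one arc with poly_id (adjacency)."""
    found = set()
    for arc in arcs:
        if arc.left_poly == poly_id:
            found.add(arc.right_poly)
        elif arc.right_poly == poly_id:
            found.add(arc.left_poly)
    return found

def arcs_at_node(node_id, arcs):
    """Arc ids that begin or end at node_id (connectivity)."""
    return [a.arc_id for a in arcs if node_id in (a.from_node, a.to_node)]

print(sorted(neighbors(10, arcs)))  # [0, 20]
print(arcs_at_node(2, arcs))        # [1, 2]
```

Adjacency and connectivity are answered here from the stored relationships alone, without comparing coordinates, which is the property that functions such as routing rely on.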


vector

A coordinate-based data structure commonly used to represent map
features. Each linear feature is represented as a list of ordered x,y
coordinates. Attributes are associated with the feature (as opposed to
a raster data structure, which associates attributes with a grid cell).

vectorization

The process of taking the informational content in raster data and
turning it into vector representation. See data automation.

white noise

In scanned data, the appearance of small white spots on a black
background as might occur in a scan of a mask separate (i.e., a scan
of a solid black area). Any white pixel that should be black (if it were
true to the original informational content) is white noise. In other
words, white noise is absence of data where data should exist. See
black noise.

[Figure: White noise, or absence of data.]


ESRI GIS SOLUTIONS

For over 25 years ESRI has been helping people manage and analyze
geographic information. ESRI offers a framework for implementing
GIS in any organization with a seamless link from personal GIS
on the desktop to enterprisewide GIS client/server and data
management systems. Our GIS solutions are flexible and can be
customized to meet the needs of our users. ESRI is a full-service
GIS company, ready to help you begin, grow, and build success
with GIS.

ArcView® software enables users to quickly select and display
different combinations of data and to creatively visualize
information from their desktops.

ARC/INFO® software is a high-end GIS with capabilities for the
automation, modification, management, analysis, and display of
geographic information.

PC ARC/INFO® software is a full-featured GIS for DOS and
Windows™-based PCs.

ArcCAD® software links ARC/INFO, the world’s leading GIS
software, to AutoCAD®, the world’s leading CAD software.

SPATIAL DATABASE ENGINE™ (SDE™) software is a
high-performance spatial database that employs a true client/server
architecture to perform fast, efficient spatial operations and
management of large, shared geographic sets.

For more information, call

1-800-447-9778 (1-800-GIS-XPRT)
Send E-mail inquiries to [email protected]
Visit ESRI on the World Wide Web at http://www.esri.com

CORPORATE OFFICE
ESRI
380 New York Street
Redlands, California
92373-8100 USA
Telephone: 909-793-2853
Fax: 909-793-5953
U.S. OFFICES
Alaska
Telephone: 907-344-6613
Boston
Telephone: 508-777-4543
California
Telephone: 909-793-2853, ext. 1906
Charlotte
Telephone: 704-541-9810
Denver
Telephone: 303-449-7779
Minneapolis
Telephone: 612-454-0600
Olympia
Telephone: 360-754-4727
Philadelphia
Telephone: 610-725-0901
San Antonio
Telephone: 210-340-5762
St. Louis
Telephone: 314-949-6620
Washington, D.C.
Telephone: 703-506-9515
INTERNATIONAL OFFICES
ESRI-Australia
Telephone: 61-9-242-1005
ESRI-Canada
Telephone: 416-441-6035
ESRI-France
Telephone: 33-1-450-78811
ESRI-Germany
Telephone: 49-8166-380
ESRI-Italia
Telephone: 39-6-406-961
ESRI-South Asia
Telephone: 65-735-8755
ESRI-Spain
Telephone: 34-1-559-4375
ESRI-Sweden
Telephone: 46-23-84094
ESRI-Thailand
Telephone: 66-2-678-0707
ESRI (UK)
Telephone: 44-1-923-210-450
For the location of an international
distributor, call 909-793-2853, ext. 1375

GIS by ESRI™
63970


Printed in USA
