Terminology
• Data = known facts that can be recorded
• Database (DB) = logically coherent collection of related data with some inherent meaning
  • Entities such as students, courses, sections
  • Relationships between entities, such as students taking courses and sections being part of courses
• Database management system (DBMS) = collection of programs that enables users to create and maintain a DB; a general-purpose software system that facilitates the process of defining, constructing, and manipulating DBs for various applications
What is a database?
A shared collection of logically related data and a description of this data, designed to meet the information needs of an organization
• Data repository (data resource)
• Designed independently of applications (i.e., data abstraction)
• Long-term information needs at the enterprise level
• Primarily designed for quick and efficient data retrieval
What Is the Purpose of Learning about Databases?
Paradigm shift: data-driven business environment
Production efficiencies
Knowledge and innovation (e.g., knowledge management, business intelligence)
Coordination of vendors (e.g., supply chain management)
Competitor and marketplace information
Customer information (e.g., database marketing, CRM)
History (1)
Early 60s
• Charles Bachman at General Electric (GE) introduced the first general-purpose DBMS, the Integrated Data Store (IDS) (Turing Award 1973)
• IDS formed the basis for the network data model
• The network data model was standardized by the Conference on Data Systems Languages (CODASYL)
Late 60s
• IBM developed the Information Management System (IMS), which formed the basis for the hierarchical data model
• SABRE system for making airline reservations, developed jointly by IBM and American Airlines (allowed several people to access the same data through a computer network)
70s
• Edgar Codd, at IBM, proposed the relational data model (Turing Award 1981)
• Use of DBMSs for managing corporate data became standard practice
History (2)
80s
• The relational data model became the dominant DBMS paradigm
• SQL query language for relational DBs, developed as part of IBM’s System R project, is now the standard query language
• Transaction management (concurrent execution of db programs) (James Gray, Turing Award 1998)
Now
• Object-oriented data model
• Data warehouse and data mining
• Accessing databases through the web/internet
• Multimedia data
• Text data (information retrieval)
• Structure of the data (XML)
Traditional File-Based System
Definition: "A collection of application programs that perform services for the end-users such as the production of reports. Each program defines and manages its own data."
Customer transactions → Program → Report
Operating expenses → Program → Report
Inventory → Program → Report
Vendors → Program → Report
Payroll → Program → Report
One file, one application
Data Redundancy
Customer Order File
• Invoice number
• Customer account number
• Customer name, address, city, state, zip code
• Order date
• Product code, product description, price, unit
Customer Account File
• Account number
• Customer name, mailing address, city, state, zip code
Customer Mailing List File
Customer name, mailing address, city, state, zip code
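The three files above all carry the customer's name and address. A minimal sketch in Python of how that redundancy can lead to inconsistent data: the records, field names, and the `change_address_in_accounts` helper are all hypothetical, chosen only to mirror the file layouts listed above.

```python
# Three "files", each owned by a different application program.
# All records and field names are made up for illustration.
order_file = [
    {"invoice_no": 1001, "account_no": 7, "name": "A. Smith",
     "address": "12 Elm St", "product_code": "P-1"},
]
account_file = [
    {"account_no": 7, "name": "A. Smith", "address": "12 Elm St"},
]
mailing_list_file = [
    {"name": "A. Smith", "address": "12 Elm St"},
]


def change_address_in_accounts(account_no, new_address):
    """The accounts program updates only the file it owns."""
    for record in account_file:
        if record["account_no"] == account_no:
            record["address"] = new_address


def addresses_on_record(name):
    """Collect every address stored for a customer across all three files."""
    found = set()
    for records in (order_file, account_file, mailing_list_file):
        for record in records:
            if record["name"] == name:
                found.add(record["address"])
    return found


change_address_in_accounts(7, "99 Oak Ave")
conflicting = addresses_on_record("A. Smith")
print(sorted(conflicting))  # two different addresses for one customer
```

After the update, the order file and the mailing list still hold the old address, so no program can tell which version is correct.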
File-Based Systems
Records contain logically related data
Limitations:
• Separation and isolation of data (one file, one program)
• Duplication of data
• Loss of data integrity: uncertainty about the correct version of the data, and no consistency
• Data dependence: the application program defines the data
• Incompatibility of file formats
• Fixed queries/proliferation of application programs: little flexibility in meeting changing information needs
Database
“A shared collection of logically related data (and a description of this data), designed to meet the information needs of an organization.”
[Figure: applications kept separate from the central repository of data and data definitions]
Data Abstraction
Separation between the data’s structure (definition) and the application programs
Application programs can be run on either the client or the server
[Figure: applications reach the central repository of data and data definitions through the DBMS]
Organizing Data
• Entity - distinct object (i.e., person, place, thing, concept or event)
• Attribute - describes some aspect of the entity (object)
[Figure: management queries, application programs, and other software (a multitude of applications) reach the central repository (an organizational resource holding Customer, Orders, Order Items, Products, Manufacturers) through the DBMS, a single access point that provides DDL, DML, and controlled access]
Advantages of the Database Approach
• Control of data redundancy
• Data consistency
• Efficient data access
• Greater informational gain: more information from the same amount of data
• Sharing data as an organizational resource (i.e., shared resource)
• Improved data integrity, validity and consistency
• Improved access and security
• Enforcement of standards
• Concurrent access and crash recovery
• Data administration
• Reduced application development time
Database Applications
• Traditional database applications (banks, library catalogs, inventory, airlines, universities)
• Multimedia databases (images)
• Geographic information systems
• Data warehouse and online analytical processing (OLAP)
• Real-time and active database technology (sensor systems, safety-critical systems)
• World Wide Web (e-commerce, internet banking)
DBMS Available
• ORACLE
• DB2 (by IBM)
• MS-SQL
• Teradata
• Sybase
• Informix
Data Model
• Collection of high-level data description constructs that hide many low-level storage details
• Semantic data model
  • More abstract, high-level data model (makes it easier to describe the data)
  • A widely used one is the ER model, which pictorially denotes entities and the relationships among them
Relational Model
• Relation: set of records
• Schema: a description of data in terms of a data model
• The schema for a relation specifies its name, the name of each field (or attribute or column), and the type of each field
• Example: each row in the relation is a record that describes a student
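To make the terms concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It borrows the Students (sid, name, gpa) schema that appears later in these notes; the two rows are invented, and SQLite's TEXT and REAL types stand in for "string" and "real".

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema names the relation and gives each field a name and a type.
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")

# Each row is one record describing one student (values are made up).
conn.executemany(
    "INSERT INTO Students (sid, name, gpa) VALUES (?, ?, ?)",
    [("s1", "Ann", 3.4), ("s2", "Bob", 3.1)],
)

# PRAGMA table_info returns one row per column: (cid, name, type, ...).
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Students)")]
records = conn.execute("SELECT sid, name, gpa FROM Students ORDER BY sid").fetchall()
print(schema)
print(records)
```

The schema describes the structure once; the relation itself is just the set of records that conform to it.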
Other Data Models
• Relational data model (dominant model)
• Hierarchical data model
• Network model
• Object-oriented model
• Object-relational model
Types of Database Models
[Figure: HIERARCHICAL and RELATIONAL database models; the relational model is labeled with TABLE, ROW, COLUMN, and VALUE]
Database Architecture / Levels of Data Abstraction
• External level (individual user views)
• Conceptual level (community user view)
• Internal level (storage view)
• Database
Conceptual Schema
Describes data in terms of the data model of the DBMS.
In an RDBMS, the conceptual schema describes all relations that are stored in the database.
E.g., university DB:
• Students (sid: string, name: string, gpa: real)
• Faculty (fid: string, fname: string, sal: real)
Physical schema
• Specifies additional storage details
• Summarizes how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and tapes
• Decides which file organizations to use to store the relations, and which indexes to create to speed up data retrieval operations
External Schema
Allows data access to be customized at the level of individual users or groups of users.
An Example of the Three Levels
External View1: SNo, FName, LName, Age, Salary
External View2: SNo, LName, BranchNo
Conceptual View: SNo, FName, LName, Age, Salary, BranchNo
Internal View:
    struct STAFF {
        int staffNo;
        int branchNo;
        char fName[15];
        char lName[15];
        struct date dateOfBirth;
        float salary;
        struct STAFF *next;  /* pointer to next Staff record */
    };
    index staffNo; index branchNo;  /* define indexes for staff */
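One way to see the three levels side by side is to model the Staff example in SQLite through Python's `sqlite3` module. This is only a sketch under assumptions: the table name `Staff`, the view names `ExternalView1` and `ExternalView2`, the index name, and the sample row are all invented here, and an SQL index is a loose stand-in for the internal-level details in the struct above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one Staff relation with all six fields.
conn.execute(
    "CREATE TABLE Staff (SNo INTEGER PRIMARY KEY, FName TEXT, LName TEXT, "
    "Age INTEGER, Salary REAL, BranchNo INTEGER)"
)
conn.execute("INSERT INTO Staff VALUES (1, 'Ann', 'Lee', 41, 30000.0, 3)")  # made-up row

# External level: each view exposes only the columns one user group needs.
conn.execute(
    "CREATE VIEW ExternalView1 AS SELECT SNo, FName, LName, Age, Salary FROM Staff"
)
conn.execute("CREATE VIEW ExternalView2 AS SELECT SNo, LName, BranchNo FROM Staff")

# Internal level: storage details such as indexes, invisible to both views.
conn.execute("CREATE INDEX idx_staff_branch ON Staff (BranchNo)")

view1_columns = [d[0] for d in conn.execute("SELECT * FROM ExternalView1").description]
view2_rows = conn.execute("SELECT * FROM ExternalView2").fetchall()
print(view1_columns)
print(view2_rows)
```

Users of the second view never see Age or Salary, and neither group needs to know that an index exists.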
Database Design Phases
DATA ANALYSIS: Entities - Attributes - Relationships - Integrity Rules
LOGICAL DESIGN: Tables - Columns - Primary Keys - Foreign Keys
PHYSICAL DESIGN: DDL for Tablespaces, Tables, Indexes
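The later phases can be sketched with a small example. It assumes two entities from the terminology slide, courses and sections, with hypothetical column names, and uses SQLite through Python's `sqlite3` module. SQLite has no tablespaces, so that part of physical design is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Logical design turned into DDL: tables, columns, primary and foreign keys.
conn.executescript(
    """
    CREATE TABLE Courses (
        course_id TEXT PRIMARY KEY,
        title     TEXT NOT NULL
    );
    CREATE TABLE Sections (
        section_id INTEGER PRIMARY KEY,
        course_id  TEXT NOT NULL REFERENCES Courses (course_id)
    );
    -- Physical design: an index to speed up lookups of sections by course.
    CREATE INDEX idx_sections_course ON Sections (course_id);
    """
)

conn.execute("INSERT INTO Courses VALUES ('DB101', 'Databases')")
conn.execute("INSERT INTO Sections VALUES (1, 'DB101')")

# Integrity rule in action: a section must belong to an existing course.
try:
    conn.execute("INSERT INTO Sections VALUES (2, 'NOPE')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

section_count = conn.execute("SELECT COUNT(*) FROM Sections").fetchone()[0]
print(orphan_rejected, section_count)
```

The foreign key carries the "sections are part of courses" relationship from data analysis into the tables, and the DBMS then enforces it.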
Data Independence
Ability to change one schema level without affecting the higher-level schemas
• Physical data independence: ability to change the physical (internal) schema without affecting the conceptual (logical) schema
• Logical data independence: ability to change the logical schema without affecting the external (view) schemas or the application programs
One important advantage of a DBMS is data independence
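Both kinds of independence can be illustrated with a short sketch, again in SQLite through Python's `sqlite3` module. The Faculty schema comes from the conceptual schema example earlier in these notes; the view name `FacultyNames`, the index name, the added `dept` column, and the rows are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Faculty (fid TEXT, fname TEXT, sal REAL)")
conn.executemany(
    "INSERT INTO Faculty VALUES (?, ?, ?)",
    [("f1", "Rao", 90.0), ("f2", "Kim", 80.0)],  # made-up rows
)
conn.execute("CREATE VIEW FacultyNames AS SELECT fid, fname FROM Faculty")

# The application only ever issues this query against its external schema.
app_query = "SELECT fid, fname FROM FacultyNames ORDER BY fid"
before = conn.execute(app_query).fetchall()

# Physical change: add an index. The conceptual schema is untouched.
conn.execute("CREATE INDEX idx_faculty_fid ON Faculty (fid)")
after_physical = conn.execute(app_query).fetchall()

# Logical change: the conceptual schema gains a column. The view still
# presents the same external schema to the application.
conn.execute("ALTER TABLE Faculty ADD COLUMN dept TEXT")
after_logical = conn.execute(app_query).fetchall()

print(before == after_physical == after_logical)
```

The application query is never edited, yet it keeps returning the same rows after each change at a lower level.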
Characteristics of the DB approach (1)
• Single repository of data, defined once, maintained and accessed by users
• Self-describing nature of DB
  • DB (stored in primary DB) + description of DB structures and constraints, i.e., metadata (stored in catalog)
  • DBMS software works with any number of DB applications
• Insulation between programs and data, and data abstraction
  • Program-data independence
  • Program-operation independence (OO DBMS)
  • Abstraction: conceptual representation of data, with no details of how data is stored or how operations are implemented
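The self-describing property can be observed directly in SQLite, whose catalog is an ordinary table named `sqlite_master`. In this sketch, which uses Python's `sqlite3` module, the view name `HonorRoll` and its GPA cutoff are invented for illustration; other DBMSs expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.execute(
    "CREATE VIEW HonorRoll AS SELECT sid, name FROM Students WHERE gpa >= 3.5"
)

# sqlite_master is SQLite's catalog: a table that describes the other objects.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# The catalog also keeps the definition text of each object.
students_definition = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'Students'"
).fetchone()[0]

print(catalog)
print(students_definition)
```

Because the description lives in the database itself, a general-purpose program can discover the structure at run time instead of hard-coding it.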
Characteristics of the DB approach (2)
• Data model
  • Relational data model
  • Object-oriented data model
  • Entity-relationship data model
• Support for multiple views of data
  • View = subset of the DB, or virtual data derived from the DB (not explicitly stored)
• Sharing data and multi-user transaction processing
  • Concurrency control
  • Online transaction processing (OLTP)
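As a rough illustration of transaction processing, the sketch below uses Python's `sqlite3` module to make a two-step update all-or-nothing. The `Accounts` table, the `transfer` function, and the balances are hypothetical. It shows only the atomicity side of a transaction with a single connection; it does not show multiple concurrent users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1, so it is rolled back
balances = conn.execute("SELECT id, balance FROM Accounts ORDER BY id").fetchall()
print(ok, failed, balances)
```

In the failed transfer the credit to account 2 had already been applied when the debit was rejected, and the rollback undoes it so the database never shows a half-finished transfer.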
Query Languages
• Query → question involving data stored in a DBMS
• Relational Algebra: formal query language based on a collection of operators for manipulating relations
• Relational Calculus: formal query language based on mathematical logic
• DDL: Data Definition Language
  • Defines the db structure
  • Commands are used for creating, altering, and querying data
• DML: Data Manipulation Language
  • For manipulating (inserting, deleting, updating) db contents
  • Procedural and non-procedural (declarative) DML
Types of DML
• Procedural DML
  • Must be embedded in a programming language
  • Searches for and retrieves individual db records, and uses looping and other constructs of the host programming language to retrieve multiple records
• Non-procedural or declarative DML
  • Can be used as a stand-alone query language or embedded in a programming language
  • Searches for and retrieves information from multiple related db records in a single command
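The contrast between the two styles can be sketched in Python with the built-in `sqlite3` module. The Students rows and the 3.5 cutoff are made up. SQLite offers no truly procedural DML, so the first half only imitates record-at-a-time access by looping over a cursor in the host language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [("s1", "Ann", 3.8), ("s2", "Bob", 3.1), ("s3", "Cy", 3.6)],  # made-up rows
)

# Procedural style: fetch records one at a time and use the host
# language's loop and condition to pick out the ones wanted.
procedural = []
cursor = conn.execute("SELECT sid, name, gpa FROM Students")
for sid, name, gpa in cursor:
    if gpa >= 3.5:
        procedural.append(name)
procedural.sort()

# Declarative style: state what is wanted in a single command and let
# the DBMS decide how to find it.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM Students WHERE gpa >= 3.5 ORDER BY name"
    )
]

print(procedural, declarative)
```

Both approaches return the same names; the declarative version leaves the search strategy to the DBMS, which is free to use an index if one exists.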
Components of a Database Environment
• Hardware
• Software: DBMS, application programs, and query software
• Data: organized in a schema, partitioned into subschemas
• Procedures: govern the design, access, and use of the database
• People: administrators (DA, DBA), designers (logical and physical), application developers, and users (novice and high-powered)
Database System
[Figure: DATABASE SYSTEM. Users → application programs/queries → DBMS software (software to process queries/programs, then software to access stored data) → stored data definition (meta-data) and stored database]
Users of the Database
Day-to-day use of the DB
• Database administrators (DBA)
• Database designers
• End-users
  • Casual end-users
  • Naïve or parametric users
  • Sophisticated end-users
  • Stand-alone users
• System analysts and application programmers (software engineering)
Implications of the DB approach
• Potential for enforcing standards
• Reduced application development time
• Flexibility
• Availability of up-to-date info
When not to use a DBMS
• Unnecessary overhead costs
  • Security, concurrency control, recovery and integrity
  • High initial investment in hardware, software, training
• DB and applications are simple, well defined, and not expected to change
• Real-time requirements cannot be met (due to overhead)
• Multi-user access is not required