DATA RECOVERY
Data recovery is the process of salvaging data from damaged, failed, corrupted, or inaccessible secondary storage media when it cannot be accessed normally. Often the data are salvaged from storage media such as internal or external hard disk drives, solid-state drives (SSDs), USB flash drives, storage tapes, CDs, DVDs, RAID arrays, and other electronics.
The most common "data recovery" scenario involves an operating system (OS) failure, in which case the goal is simply to copy all wanted files to another disk. This can be easily accomplished with a Live CD, most of which provide a means to mount the system drive and backup disks or removable media, and to move the files from the system disk to the backup media with a file manager or optical disc authoring software.
Another scenario involves a disk-level failure, such as a compromised file system or disk partition, or a hard disk failure. In any of these cases, the data cannot be easily read. Depending on the situation, solutions involve repairing the file system, partition table or master boot record, or hard disk recovery techniques ranging from software-based recovery of corrupted data to hardware replacement on a physically damaged disk. If hard disk recovery is necessary, the disk itself has typically failed permanently, and the focus is instead on a one-time recovery, salvaging whatever data can be read.
In a third scenario, files have been "deleted" from a storage medium. Typically, deleted files are not erased immediately; instead, references to them in the directory structure are removed, and the space they occupy is made available for later overwriting. In the meantime, the original file may be restored. Although there is some confusion over the term, "data recovery" may also be used in the context of forensic applications or espionage.
Recovery techniques
Recovering data from physically damaged hardware can involve multiple techniques. Some damage can be repaired by replacing parts in the hard disk. This alone may make the disk usable, but there may still be logical damage. A specialized disk-imaging procedure is used to recover every readable bit from the surface. Once this image is acquired and saved on a reliable medium, the image can be safely analysed for logical damage and will possibly allow for much of the original file system to be reconstructed.
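The sketch below illustrates the first pass of such an imaging procedure in Python: read the source device sector by sector, pad unreadable sectors with zeros, and log their offsets so a later pass can retry them. It is a minimal illustration of the idea, not a substitute for dedicated tools such as ddrescue; the device and file paths are placeholders.

```python
import os

SECTOR = 4096  # read granularity; smaller sizes salvage more data around defects

def image_disk(source, dest, logfile):
    # Copy every readable sector of `source` into the image file `dest`,
    # padding unreadable sectors with zeros and logging their offsets
    # so a later pass can retry them.
    src = os.open(source, os.O_RDONLY)
    size = os.lseek(src, 0, os.SEEK_END)
    with open(dest, "wb") as img, open(logfile, "w") as log:
        for offset in range(0, size, SECTOR):
            os.lseek(src, offset, os.SEEK_SET)
            try:
                block = os.read(src, SECTOR)
            except OSError:                  # bad sector: note it and keep going
                block = b"\x00" * SECTOR
                log.write(f"{offset}\n")
            img.write(block)
    os.close(src)

# image_disk("/dev/sdb", "rescue.img", "bad_sectors.log")  # paths are placeholders
```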
Hardware repair
Examples of physical recovery procedures include: removing a damaged PCB (printed circuit board) and replacing it with a matching PCB from a healthy drive; performing a live PCB swap (used when the System Area of the HDD is damaged on the target drive: the System Area is instead read from the donor drive, whose PCB is then disconnected while still under power and transferred to the target drive); replacing the read/write head assembly with matching parts from a healthy drive; removing the hard disk platters from the original damaged drive and installing them into a healthy drive; and often a combination of these procedures. Some data recovery companies have procedures that are highly technical in nature and are not recommended for untrained individuals. Many of these procedures will void the manufacturer's warranty.
Recovering from logical (non-hardware) damage
Overwritten data
When data have been physically overwritten on a hard disk, it is generally assumed that the previous data are no longer possible to recover. In 1996, Peter Gutmann, a computer scientist, presented a paper suggesting that overwritten data could be recovered through the use of magnetic force microscopy.[1] In 2001, he presented another paper on a similar topic.[2] Substantial criticism has followed, primarily concerning the lack of any concrete examples of significant amounts of overwritten data being recovered.[3] To guard against this type of data recovery, he and Colin Plumb designed the Gutmann method, which is used by several disk-scrubbing software packages. Although Gutmann's theory may be correct, there is no practical evidence that overwritten data can be recovered, and there are good reasons to think that it cannot.
Corrupt file systems
In some cases, data on a hard drive can be unreadable due to damage to the file system. In the majority of these cases, at least a portion of the original data can be recovered by repairing the damaged file system using specialized data recovery software. This type of data recovery can be performed by knowledgeable end-users as it requires no special physical equipment. However, more serious cases can still require expert intervention.
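When the file-system metadata itself is beyond repair, recovery software often falls back on signature-based "file carving": scanning the raw disk image for the magic bytes that mark the start and end of known file types. The following minimal Python sketch carves JPEG files (which begin with FF D8 FF and end with FF D9) out of a disk image; real carvers stream the data and handle fragmented files, and the file names here are illustrative.

```python
def carve_jpegs(image_path, out_prefix="carved"):
    # Scan a disk image for JPEG start/end signatures and write each
    # match out as its own file. Fine for small images; real tools stream.
    data = open(image_path, "rb").read()
    count, start = 0, data.find(b"\xff\xd8\xff")
    while start != -1:
        end = data.find(b"\xff\xd9", start)
        if end == -1:
            break
        with open(f"{out_prefix}_{count}.jpg", "wb") as f:
            f.write(data[start:end + 2])    # include the end-of-image marker
        count += 1
        start = data.find(b"\xff\xd8\xff", end)
    return count

# carve_jpegs("rescue.img")  # image path is illustrative
```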
Online Data Recovery
"Online" or "Remote" data recovery is yet another method to restore the lost or deleted data. It is same as performing the regular software based recoveries except that this kind of recovery is performed over the Internet without physically having the drive or computer in possession. The recovery technician sitting somewhere else gains access to user's computer and complete the recovery job online. In this scenario, the user doesn't have to travel or send the media to anywhere physically. Although online data recovery is convenient and useful in many cases, it still carries some points making it less popular than the classic data recovery methods. First of all, it requires a stable broadband Internet connection for it to be performed correctly, which
many third world countries still lack. Also, it cannot be performed in case of physical damage to media and for such cases, the traditional in-lab recovery has to take place.
A remote, online, or managed backup service is a service that provides users with a system for the backup and storage of computer files. Online backup systems are typically built around a client software program that runs on a schedule, typically once a day, and usually at night while computers aren't in use. This program typically collects, compresses, encrypts, and transfers the data to the remote backup service provider's servers or off-site hardware.
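A hedged sketch of that collect-compress-encrypt-transfer pipeline in Python, using the standard zlib module and Fernet from the third-party cryptography package; the upload step is a placeholder callback, since the transfer protocol is provider-specific. Note that compression must happen before encryption, because well-encrypted data is effectively incompressible.

```python
import zlib
from pathlib import Path
from cryptography.fernet import Fernet  # third-party: pip install cryptography

def backup_file(path, key, upload):
    # Collect, compress, encrypt, then hand the blob to an upload callback,
    # mirroring the pipeline a typical online-backup client runs each night.
    raw = Path(path).read_bytes()              # collect
    compressed = zlib.compress(raw, level=9)   # compress BEFORE encrypting
    blob = Fernet(key).encrypt(compressed)     # encrypt; key stays on the client
    upload(path, blob)                         # transfer to the provider

key = Fernet.generate_key()  # in practice derived from a user passphrase
# backup_file("documents/report.txt", key,    # path and callback are placeholders
#             lambda name, blob: print(f"would upload {len(blob)} bytes for {name}"))
```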
Advantages of remote backup
Remote backup has advantages over traditional backup methods:
• Perhaps the most important aspect of backing up is that backups are stored in a different location from the original data. Traditional backup requires manually taking the backup media offsite.
• Remote backup does not require user intervention. The user does not have to change tapes, label CDs or perform other manual steps.
• Unlimited data retention.
• Backups are automatic.
• The correct files are backed up. Ordinary backup software is often installed with a list of files to be backed up, which usually represents the state of the system when the software was installed and often misses critical files, such as files added later.
• Some remote backup services will work continuously, backing up files as they are changed.
• Most remote backup services will maintain a list of versions of your files.
• Most remote backup services will use 128- to 448-bit encryption to send data over unsecured links (i.e. the Internet).
• A few remote backup services can reduce backup size by transmitting only the binary data that has changed.
Disadvantages of remote backup
Remote backup has some disadvantages:
• Depending on the available network bandwidth, the restoration of data can be slow. Because data is stored offsite, the data must be recovered either via the Internet or via a disk shipped from the online backup service provider.
• Some backup service providers have no guarantee that stored data will be kept private (for example, from employees). As such, most recommend that files be encrypted.
• It is possible that a remote backup service provider could go out of business or be purchased, which may affect the accessibility of one's data or the cost of continuing to use the service.
• If the encryption password is lost, no further data recovery will be possible. However, with managed services this should not be a problem.
• Residential broadband services often have monthly limits that preclude large backups. They are also usually asymmetric; the user-to-network link regularly used to store backups is much slower than the network-to-user link used only when data is restored.
Typical features
• Encryption: Data should be encrypted before it is sent across the Internet, and it should be stored in its encrypted state. Encryption should be at least 256 bits, and the user should have the option of using their own encryption key, which should never be sent to the server.
• Network backup: A backup service supporting network backup can back up multiple computers, servers or Network Attached Storage appliances on a local area network from a single computer or device.
• Continuous backup (Continuous Data Protection): Allows the service to back up continuously or on a predefined schedule. Both methods have advantages and disadvantages. Most backup services are schedule-based and perform backups at a predetermined time. Some services provide continuous data backups, which are used by large financial institutions and large online retailers; however, there is typically a trade-off with performance and system resources.
• File-by-file restore: The ability for users to restore files themselves, without the assistance of a service provider, by selecting files by name and/or folder. Some services also allow users to select files by searching for filenames and folder names, by dates, by file type, by backup set, and by tags.
• Online access to files: Some services allow you to access backed-up files via a normal web browser. Many services do not provide this type of functionality.
• Data compression: Data will typically be compressed with a lossless compression algorithm to minimize the amount of bandwidth used.
• Differential data compression: A way to further minimize network traffic is to transfer only the binary data that has changed from one day to the next, similar to the open-source file transfer utility rsync. More advanced online backup services use this method rather than transferring entire files; a sketch of this approach follows this list.
• Bandwidth usage: A user-selectable option to use more or less bandwidth; it may be possible to set this to change at various times of day.
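As a sketch of differential transfer, the following Python fragment splits a file into fixed-size blocks and hashes each one, so that a later run can upload only the blocks whose hashes changed. This is deliberately simpler than rsync, which also uses a rolling weak checksum so that insertions do not shift every subsequent block; the block size and names are arbitrary choices.

```python
import hashlib

BLOCK = 64 * 1024  # 64 KiB blocks; the size is an arbitrary choice

def block_hashes(path):
    # Map block index -> SHA-256 digest of that block.
    hashes, i = {}, 0
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            hashes[i] = hashlib.sha256(chunk).hexdigest()
            i += 1
    return hashes

def changed_blocks(path, previous_hashes):
    # Yield (index, data) for blocks that differ from the previous backup,
    # so only those blocks need to be sent over the network.
    i = 0
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            if previous_hashes.get(i) != hashlib.sha256(chunk).hexdigest():
                yield i, chunk
            i += 1

# prev = block_hashes("archive.tar")           # recorded at the last backup
# for i, data in changed_blocks("archive.tar", prev):
#     upload_block(i, data)                    # upload_block is a placeholder
```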
Database administrator
A database administrator (DBA) is a person responsible for the design, implementation, maintenance and repair of an organization's database. The role is also known by the titles Database Coordinator or Database Programmer, and is closely related to the roles of Database Analyst, Database Modeller, Programmer Analyst, and Systems Manager. The role includes the development and design of database strategies, monitoring and improving database performance and capacity, and planning for future expansion requirements. DBAs may also plan, co-ordinate and implement security measures to safeguard the database.
Skills
• Strong organizational skills
• Strong logical and analytical thinking
• Ability to concentrate and pay close attention to detail
• Willingness to pursue education throughout one's career
Duties
A database administrator's activities include the following:
• Transferring data
• Replicating data
• Maintaining the database and ensuring its availability to users
• Controlling privileges and permissions for database users
• Monitoring database performance
• Database backup and recovery
• Database security
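Backup and recovery, listed above, is the duty most easily illustrated in code. The sketch below uses Python's built-in sqlite3 module, whose Connection.backup method performs an online, page-by-page copy of a live database; production DBAs would use their vendor's equivalent tooling, and the database path here is illustrative.

```python
import sqlite3
import datetime

def nightly_backup(db_path="app.db"):
    # Take a timestamped online backup of a live SQLite database.
    # Connection.backup copies pages without taking the database offline.
    stamp = datetime.date.today().isoformat()
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(f"{db_path}.{stamp}.bak")
    with dst:
        src.backup(dst)        # online, page-by-page copy
    src.close()
    dst.close()

# nightly_backup()  # database path is illustrative
```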
Network model (database)
The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice.
Example of a network model.
The network model's original inventor was Charles Bachman, and it was developed into a standard specification published in 1969 by the CODASYL Consortium.
ADVANTAGES
• Efficient retrieval: The network model provides very efficient, high-speed retrieval.
• Simplicity: The network model is conceptually simple and easy to design.
• Ability to handle more relationship types: The network model can handle both one-to-many and many-to-many relationships.
• Ease of data access: In network database terminology, a relationship is a set. Each set comprises two types of records, an owner record and a member record. An application can access an owner record and then all the member records within its set (see the sketch after this list).
• Data integrity: In a network model, no member can exist without an owner. A user must therefore first define the owner record and then the member record. This ensures integrity.
• Data independence: The network model draws a clear line of demarcation between programs and the complex physical storage details. Application programs work independently of the data, so changes made to the data characteristics do not affect the application programs.
DISADVANTAGES
• System complexity: In a network model, data are accessed one record at a time. This makes it essential for database designers, administrators, and programmers to be familiar with the internal data structures in order to access the data. A user-friendly database management system is therefore difficult to build on the network model.
• Lack of structural independence: Making structural modifications to the database is very difficult in the network model because the data access method is navigational. Any changes made to the database structure require the application programs to be modified before they can access data. Thus, although the network model achieves data independence, it fails to achieve structural independence.
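To make the owner/member terminology concrete, here is a toy Python illustration (not a real DBMS) of a network-model set and the navigational access pattern it supports; the record and set names are invented for the example.

```python
# A toy illustration of network-model "sets": each set links one owner
# record to many member records, and an application navigates from the
# owner to its members via stored references.

class Record:
    def __init__(self, **fields):
        self.fields = fields
        self.sets = {}            # set name -> list of member records

def connect(owner, set_name, member):
    # Insert `member` into the named set owned by `owner`. A member cannot
    # be in the set without an owner, mirroring the integrity rule above.
    owner.sets.setdefault(set_name, []).append(member)

supplier = Record(name="Acme")
part_a = Record(part_no="P-100")
part_b = Record(part_no="P-200")
connect(supplier, "SUPPLIES", part_a)
connect(supplier, "SUPPLIES", part_b)

# Navigational access: fetch the owner, then walk its member records.
for part in supplier.sets["SUPPLIES"]:
    print(supplier.fields["name"], "supplies", part.fields["part_no"])
```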
Object database
Example of an object-oriented model.[1]
An object database (also object-oriented database management system) is a database management system in which information is represented in the form of objects, as used in object-oriented programming. Object databases are a niche field within the broader database management system (DBMS) market, which is dominated by relational database management systems. Object databases have been developed since the early 1980s, but they have made little impact on mainstream commercial data processing, though there is some usage in specialized areas.
Potential advantages:
• Objects don't require assembly and disassembly, saving the coding time and execution time needed to assemble or disassemble objects.
• Reduced paging.
• Easier navigation.
• Better concurrency control: a hierarchy of objects may be locked.
• The data model is based on the real world.
• Works well for distributed architectures.
• Less code is required when applications are object-oriented.
Potential disadvantages:
• Lower efficiency when data and relationships are simple.
• Relational tables are simpler.
• Late binding may slow access speed.
• More user tools exist for RDBMSs.
• Standards for RDBMSs are more stable.
• Support for RDBMSs is more certain, and change is less likely to be required.
Adoption of object databases
Object databases based on persistent programming acquired a niche in application areas such as engineering and spatial databases, telecommunications, and scientific areas such as high energy physics and molecular biology. They have made little impact on mainstream commercial data processing, though there is some usage in specialized areas of financial services.[6] Software as a service vendor Workday, Inc. has built its technology on a proprietary memory-centric OODBMS, which some observers believe illustrates new OODBMS market potential. [7] Another group of object databases focuses on embedded use in devices, packaged software, and real-time systems.
Technical features
Most object databases also offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that "joining" is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined. A general characteristic, however, is that the programming language and the database schema use the same type definitions. Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the set of all its versions. Also, object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints, which are the basis of active databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could retrieve a user's account object and efficiently provide extensive information such as its transactions and account entries. The complexity of such lookups drops from O(n) to O(1), greatly increasing efficiency in these specific cases.
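The contrast between pointer-following and joining can be sketched in a few lines of Python. In the illustrative classes below, an Account holds direct references to its Transaction objects, so retrieving a customer's transactions means following stored references rather than joining two tables on a key; an OODBMS persists such references transparently.

```python
# Illustrative sketch: an Account holds direct references to its
# Transaction objects, so fetching them follows pointers instead of
# joining tables on an account-id key.

class Transaction:
    def __init__(self, tx_id, amount):
        self.tx_id, self.amount = tx_id, amount

class Account:
    def __init__(self, owner):
        self.owner = owner
        self.transactions = []   # direct object references

acct = Account("Jones")
acct.transactions.append(Transaction(12890, -87))
acct.transactions.append(Transaction(12904, -50))

# Navigational access: no join, no key lookup; just follow the references.
total = sum(t.amount for t in acct.transactions)
print(acct.owner, total)

# The relational equivalent would join an Accounts table to a Transactions
# table on an account-id column: a set-oriented rather than navigational query.
```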
Data security
Data security means ensuring that data is kept safe from corruption and that access to it is suitably controlled. Data security thus helps to ensure privacy and to protect personal data.
Data Security Technologies
Disk Encryption
Disk encryption refers to encryption technology that encrypts data on a hard disk drive. Disk encryption typically takes the form of either software (see disk encryption software) or hardware (see disk encryption hardware). Disk encryption is often referred to as on-the-fly encryption ("OTFE") or transparent encryption.
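Real on-the-fly encryption lives in the kernel, drive firmware, or a filesystem driver, but the underlying idea, encrypt on every write and decrypt on every read so callers only ever see plaintext, can be sketched at file granularity with Fernet from the third-party cryptography package. The class and file names below are illustrative.

```python
from cryptography.fernet import Fernet  # third-party: pip install cryptography

class EncryptedStore:
    # Toy "transparent encryption" wrapper: callers read and write
    # plaintext, while everything that touches disk is ciphertext.
    def __init__(self, key):
        self.f = Fernet(key)

    def write(self, path, plaintext: bytes):
        with open(path, "wb") as fh:
            fh.write(self.f.encrypt(plaintext))   # encrypt on the way down

    def read(self, path) -> bytes:
        with open(path, "rb") as fh:
            return self.f.decrypt(fh.read())      # decrypt on the way up

store = EncryptedStore(Fernet.generate_key())
store.write("secret.bin", b"account data")
print(store.read("secret.bin"))                   # b'account data'
```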
Hardware based Mechanisms for Protecting Data
Software-based security solutions encrypt the data to prevent it from being stolen. However, a malicious program or a hacker may corrupt the data in order to make it unrecoverable or unusable. Similarly, encrypted operating systems can be corrupted by a malicious program or a hacker, making the system unusable. Hardware-based security solutions can prevent read and write access to data, and hence offer very strong protection against tampering and unauthorized access.
Hardware-based or hardware-assisted computer security offers an alternative to software-only computer security. Security tokens such as those using PKCS#11 may be more secure because physical access is required in order to compromise them. Access is enabled only when the token is connected and the correct PIN is entered (see two-factor authentication). However, dongles can be used by anyone who can gain physical access to them. Newer hardware-based security technologies address this problem, aiming to make data access tamper-proof.
How hardware-based security works: a hardware device allows a user to log in, log out and set different privilege levels through manual actions. The device uses biometric technology to prevent malicious users from logging in, logging out, or changing privilege levels. The current state of the user is read by controllers in peripheral devices such as hard disks. Illegal access by a malicious user or a malicious program is interrupted by the hard disk and DVD controllers based on the user's current state, making illegal access to data impossible.
Hardware-based access control is more secure than protection provided by operating systems, as operating systems are vulnerable to malicious attacks by viruses and hackers. The data on hard disks can be corrupted after malicious access is obtained. With hardware-based protection, software cannot manipulate the user privilege levels, so a hacker or a malicious program cannot gain access to data protected by hardware or perform unauthorized privileged operations. The hardware protects the operating system image and file system privileges from being tampered with. Therefore, a highly secure system can be created using a combination of hardware-based security and sound system administration policies.
Backups
Backups are used to ensure that data which is lost can be recovered.
Data Masking
Data masking of structured data is the process of obscuring (masking) specific data within a database table or cell to ensure that data security is maintained and sensitive information is not exposed to unauthorized personnel. This may include masking the data from users (for example, so that banking customer representatives can only see the last four digits of a customer's national identity number), from developers (who need real production data to test new software releases but should not be able to see sensitive financial data), from outsourcing vendors, and so on.
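A minimal sketch of the masking rule mentioned above, exposing only the last four digits of an identity number while preserving separators; the format shown is illustrative, and production masking is usually applied in the database layer rather than in application code.

```python
def mask_identity_number(value: str, visible: int = 4) -> str:
    # Replace all but the last `visible` digits with 'X', preserving
    # separators, so support staff see e.g. XXX-XX-6789.
    digit_count = sum(c.isdigit() for c in value)
    keep_from = digit_count - visible
    out, i = [], 0
    for c in value:
        if c.isdigit():
            out.append(c if i >= keep_from else "X")
            i += 1
        else:
            out.append(c)
    return "".join(out)

print(mask_identity_number("123-45-6789"))  # XXX-XX-6789
```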
Data Erasure
Data erasure is a method of software-based overwriting that completely destroys all electronic data residing on a hard drive or other digital media to ensure that no sensitive data is leaked when an asset is retired or reused.
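A single-pass overwrite can be sketched in a few lines of Python, as below. This illustrates the principle only: real erasure products verify each pass and report completion, and on SSDs wear-levelling can leave remapped blocks untouched, so firmware secure-erase commands are preferred there. The file name is illustrative.

```python
import os

def overwrite_file(path, passes=1):
    # Overwrite a file in place with random bytes before deleting it.
    # A sketch only: real erasure tools also verify each pass, and
    # device-level remapping can defeat in-place overwriting.
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))
            f.flush()
            os.fsync(f.fileno())   # push this pass to the device
    os.remove(path)

# overwrite_file("old_customer_export.csv", passes=3)  # path illustrative
```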
Database normalization
In the design of a relational database management system (RDBMS), the process of organizing data to minimize redundancy is called normalization. The goal of database normalization is to decompose relations with anomalies in order to produce smaller, well-structured relations. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.
Objectives of normalization
A basic objective of the first normal form defined by Codd in 1970 was to permit data to be queried and manipulated using a "universal data sub-language" grounded in first-order logic.[8] (SQL is an example of such a data sub-language, albeit one that Codd regarded as seriously flawed.)[9]
The objectives of normalization beyond 1NF (First Normal Form) were stated as follows by Codd:
1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
2. To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.
—E.F. Codd, "Further Normalization of the Data Base Relational Model"[10]
The sections below give details of each of these objectives.
Free the database of modification anomalies
An update anomaly. Employee 519 is shown as having different addresses on different records.
An insertion anomaly. Until the new faculty member, Dr. Newsome, is assigned to teach at least one course, his details cannot be recorded.
A deletion anomaly. All information about Dr. Giddens is lost when he temporarily ceases to be assigned to any courses.
When an attempt is made to modify (update, insert into, or delete from) a table, undesired side-effects may follow. Not all tables can suffer from these side-effects; rather, the side-effects can only arise in tables that have not been sufficiently normalized. An insufficiently normalized table might have one or more of the following characteristics:
• The same information can be expressed on multiple rows; therefore updates to the table may result in logical inconsistencies. For example, each record in an "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of his skills). If the update is not carried through successfully (that is, the employee's address is updated on some records but not others), then the table is left in an inconsistent state: it provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly.
• There are circumstances in which certain facts cannot be recorded at all. For example, each record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code; thus we can record the details of any faculty member who teaches at least one course, but we cannot record the details of a newly hired faculty member who has not yet been assigned to teach any courses, except by setting the Course Code to null. This phenomenon is known as an insertion anomaly.
• There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. The "Faculty and Their Courses" table described in the previous example suffers from this type of anomaly: if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears, effectively also deleting the faculty member. This phenomenon is known as a deletion anomaly.
Minimize redesign when extending the database structure
When a fully normalized database structure is extended to allow it to accommodate new types of data, the pre-existing aspects of the database structure can remain largely or entirely unchanged. As a result, applications interacting with the database are minimally affected.
Make the data model more informative to users
Normalized tables, and the relationship between one normalized table and another, mirror real-world concepts and their interrelationships.
Avoid bias towards any particular pattern of querying
Normalized tables are suitable for general-purpose querying. This means any queries against these tables, including future queries whose details cannot be anticipated, are supported. In contrast, tables that are not normalized lend themselves to some types of queries, but not others. For example, consider an online bookseller whose customers maintain wishlists of books they'd like to have. For the obvious, anticipated query—what books does this customer want?—it's enough to store the customer's wishlist in the table as, say, a homogeneous string of authors and titles.
With this design, though, the database can answer only that one single query. It cannot by itself answer interesting but unanticipated queries: What is the most-wished-for book? Which customers are interested in WWII espionage? How does Lord Byron stack up against his contemporary poets? Answers to these questions must come from special adaptive tools completely separate from the database. One tool might be software written especially to handle such queries. This special adaptive software has just one single purpose: in effect to normalize the non-normalized field. Unforeseen queries can be answered trivially, and entirely within the database framework, with a normalized table.
Example
Querying and manipulating the data within an unnormalized data structure, such as the following non-1NF representation of customers' credit card transactions, involves more complexity than is really necessary:

Customer   Transactions
Jones      Tr. ID   Date          Amount
           12890    14-Oct-2003   −87
           12904    15-Oct-2003   −50
Wilkins    Tr. ID   Date          Amount
           12898    14-Oct-2003   −21
Stevens    Tr. ID   Date          Amount
           12907    15-Oct-2003   −18
           14920    20-Nov-2003   −70
           15003    27-Nov-2003   −60
To each customer there corresponds a repeating group of transactions. The automated evaluation of any query relating to customers' transactions would therefore broadly involve two stages:
1. Unpacking one or more customers' groups of transactions, allowing the individual transactions in a group to be examined, and
2. Deriving a query result based on the results of the first stage.
For example, in order to find out the monetary sum of all transactions that occurred in October 2003 for all customers, the system would have to know that it must first unpack the Transactions group of each customer, then sum the Amounts of all transactions thus obtained where the Date of the transaction falls in October 2003.
One of Codd's important insights was that this structural complexity could always be removed completely, leading to much greater power and flexibility in the way queries could be formulated (by users and applications) and evaluated (by the DBMS). The normalized equivalent of the structure above would look like this:

Customer   Tr. ID   Date          Amount
Jones      12890    14-Oct-2003   −87
Jones      12904    15-Oct-2003   −50
Wilkins    12898    14-Oct-2003   −21
Stevens    12907    15-Oct-2003   −18
Stevens    14920    20-Nov-2003   −70
Stevens    15003    27-Nov-2003   −60

Now each row represents an individual credit card transaction, and the DBMS can obtain the answer of interest simply by finding all rows with a Date falling in October and summing their Amounts. The data structure places all of the values on an equal footing, exposing each to the DBMS directly, so each can potentially participate directly in queries; whereas in the previous situation some values were embedded in lower-level structures that had to be handled specially. Accordingly, the normalized design lends itself to general-purpose query processing, whereas the unnormalized design does not.
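The point generalizes directly to a real DBMS. Using Python's built-in sqlite3 module, the normalized table above can be created and the October 2003 question answered with a single declarative query; dates are stored here in ISO format so that a simple range comparison works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE transactions (
    customer TEXT, tr_id INTEGER, tr_date TEXT, amount INTEGER)""")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [("Jones",   12890, "2003-10-14", -87),
     ("Jones",   12904, "2003-10-15", -50),
     ("Wilkins", 12898, "2003-10-14", -21),
     ("Stevens", 12907, "2003-10-15", -18),
     ("Stevens", 14920, "2003-11-20", -70),
     ("Stevens", 15003, "2003-11-27", -60)])

# With one fact per row, the October 2003 total is a single declarative
# query; no per-customer "unpacking" step is needed.
total, = conn.execute(
    "SELECT SUM(amount) FROM transactions "
    "WHERE tr_date BETWEEN '2003-10-01' AND '2003-10-31'").fetchone()
print(total)  # -176
```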
Relational database management system
A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model as introduced by E. F. Codd. Most popular commercial and open source databases currently in use are based on the relational database model. A short definition of an RDBMS is: a DBMS in which data is stored in tables and the relationships among the data are also stored in tables. The data can be accessed or reassembled in many different ways without having to change the table forms.
Hierarchical database model
A hierarchical data model is a data model in which the data is organized into a tree-like structure. The structure allows information to be represented using parent/child relationships: each parent can have many children, but each child has only one parent (also known as a 1:many ratio). All attributes of a specific record are listed under an entity type.
Example of a hierarchical model.
In a database, an entity type is the equivalent of a table; each individual record is represented as a row and an attribute as a column. Entity types are related to each other using 1:N mappings, also known as one-to-many relationships. This model is recognized as the first database model, created by IBM in the 1960s. The most recognized and used hierarchical databases are IMS, developed by IBM, and the Windows Registry by Microsoft.
Examples of hierarchical data represented as relational tables
An organization could store employee information in a table that contains attributes/columns such as employee number, first name, last name, and department number. The organization provides each employee with computer hardware as needed, but computer equipment may only be used by the employee to which it is assigned. The organization could store the computer hardware information in a separate table that includes each part's serial number, type, and the employee that uses it. The tables might look like this:

EmpNo  First Name  Last Name    Dept. Num
100    Sally       Baker        10-L
101    Jack        Douglas      10-L
102    Sarah       Schultz      20-B
103    David       Drachmeier   20-B

Serial Num   Type      User EmpNo
3009734-4    Computer  100
3-23-283742  Monitor   100
2-22-723423  Monitor   100
232342       Printer   100
In this model, the employee data table represents the "parent" part of the hierarchy, while the computer table represents the "child" part of the hierarchy. In contrast to tree structures usually found in computer software algorithms, in this model the children point to the parents. As shown, each employee may possess several pieces of computer equipment, but each individual piece of computer equipment may have only one employee owner. Consider the following structure:
EmpNo  Designation     ReportsTo
10     Director
20     Senior Manager  10
30     Typist          20
40     Programmer      20

In this example, the "child" is the same type as the "parent". The hierarchy stating that EmpNo 10 is the boss of 20, and that 30 and 40 each report to 20, is represented by the "ReportsTo" column. In relational database terms, the ReportsTo column is a foreign key referencing the EmpNo column. If the "child" data type were different, it would be in a different table, but there would still be a foreign key referencing the EmpNo column of the employees table. This simple model is commonly known as the adjacency list model, and was introduced by Dr. Edgar F. Codd after initial criticisms surfaced that the relational model could not model hierarchical data.
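One standard way to query such an adjacency list is a recursive common table expression, shown below with Python's built-in sqlite3 module; the query starts from the row with no ReportsTo value and repeatedly joins employees to the rows already found.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (empno INTEGER PRIMARY KEY, "
             "designation TEXT, reports_to INTEGER REFERENCES employees(empno))")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(10, "Director", None), (20, "Senior Manager", 10),
                  (30, "Typist", 20), (40, "Programmer", 20)])

# Walk the ReportsTo hierarchy downward from the Director with a recursive CTE.
rows = conn.execute("""
    WITH RECURSIVE chain(empno, designation, depth) AS (
        SELECT empno, designation, 0 FROM employees WHERE reports_to IS NULL
        UNION ALL
        SELECT e.empno, e.designation, c.depth + 1
        FROM employees e JOIN chain c ON e.reports_to = c.empno
    )
    SELECT empno, designation, depth FROM chain ORDER BY depth, empno""").fetchall()

for empno, designation, depth in rows:
    print("  " * depth + f"{empno} {designation}")
```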
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality (for example, the availability of rooms in hotels), in a way that supports processes requiring this information (for example, finding a hotel with vacancies). The term "database" refers both to the way its users view it, and to the logical and physical materialization of its data and content in files, computer memory, and computer data storage.
Database design
Database design is the process of producing a detailed data model of a database. This data model contains all the logical and physical design choices and physical storage parameters needed to generate a design in a Data Definition Language, which can then be used to create a database. A fully attributed data model contains detailed attributes for each entity. The term database design can be used to describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views. In an object database the entities and relationships map directly to object classes and named relationships. However, the term database design could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the database management system (DBMS).[1] The process of doing database design generally consists of a number of steps which will be carried out by the database designer. Usually, the designer must:
• Determine the relationships between the different data elements.
• Superimpose a logical structure upon the data on the basis of these relationships.
ER Diagram (Entity-relationship model)
Database designs also include ER (entity-relationship model) diagrams. An ER diagram helps to design databases in an efficient way. Attributes in ER diagrams are usually modeled as an oval with the name of the attribute, linked to the entity or relationship that contains the attribute. Within the relational model, the final step can generally be broken down into two further steps: determining the grouping of information within the system (generally determining what the basic objects are about which information is being stored), and then determining the relationships between these groups of information, or objects. This step is not necessary with an object database.[2]
The Design Process
The design process consists of the following steps[3]:
1. Determine the purpose of your database - This helps prepare you for the remaining steps.
2. Find and organize the information required - Gather all of the types of information you might want to record in the database, such as product name and order number.
3. Divide the information into tables - Divide your information items into major entities or subjects, such as Products or Orders. Each subject then becomes a table.
4. Turn information items into columns - Decide what information you want to store in each table. Each item becomes a field, and is displayed as a column in the table. For example, an Employees table might include fields such as Last Name and Hire Date.
5. Specify primary keys - Choose each table's primary key. The primary key is a column that is used to uniquely identify each row. An example might be Product ID or Order ID.
6. Set up the table relationships - Look at each table and decide how the data in one table is related to the data in other tables. Add fields to tables or create new tables to clarify the relationships, as necessary.
7. Refine your design - Analyze your design for errors. Create the tables and add a few records of sample data. See if you can get the results you want from your tables. Make adjustments to the design, as needed.
8. Apply the normalization rules - Apply the data normalization rules to see if your tables are structured correctly. Make adjustments to the tables as needed.
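Steps 3 through 7 can be sketched concretely with Python's built-in sqlite3 module: tables for two subjects, columns for their facts, declared primary keys, a foreign key for the relationship, and a little sample data to test the design. The table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Step 3: one table per major subject.
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,      -- Step 5: primary key
    name       TEXT NOT NULL,            -- Step 4: items become columns
    price      REAL NOT NULL
);
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,      -- Step 5
    order_date TEXT NOT NULL,
    -- Step 6: a foreign key expresses the relationship between subjects.
    product_id INTEGER NOT NULL REFERENCES products(product_id)
);
""")

# Step 7: add sample data and check that the design answers real questions.
conn.execute("INSERT INTO products VALUES (1, 'Widget', 9.99)")
conn.execute("INSERT INTO orders VALUES (100, '2024-01-15', 1)")
print(conn.execute("""SELECT o.order_id, p.name
                      FROM orders o JOIN products p USING (product_id)""").fetchall())
```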
Types of Database design
Conceptual schema
Once a database designer is aware of the data which is to be stored within the database, they must then determine where dependency exists within the data. Sometimes when data is changed, other data that is not visible may change as well. For example, in a list of names and addresses, assuming a situation where multiple people can have the same address but one person cannot have more than one address, the address is dependent upon the name: if the name is different, then the address associated with it may be different too. The other way around does not hold, since several names may share one address. (NOTE: A common misconception is that the relational model is so called because of the stating of relationships between data elements therein. This is not true. The relational model is so named because it is based upon the mathematical structures known as relations.)
Logically structuring data
Once the relationships and dependencies amongst the various pieces of information have been determined, it is possible to arrange the data into a logical structure which can then be mapped into the storage objects supported by the database management system. In the case of relational databases the storage objects are tables which store data in rows and columns. Each table may represent an implementation of either a logical object or a relationship joining one or more instances of one or more logical objects. Relationships between tables may then be stored as links connecting child tables with parents. Since complex logical relationships are themselves tables they will probably have links to more than one parent. In an Object database the storage objects correspond directly to the objects used by the Object-oriented programming language used to write the applications that will manage and access the data. The relationships may be defined as attributes of the object classes involved or as methods that operate on the object classes.
Physical database design
The physical design of the database specifies the physical configuration of the database on the storage media. This includes detailed specification of data elements, data types, indexing options and other parameters residing in the DBMS data dictionary. It is the detailed design of a system that includes the system's modules as well as the database's hardware and software specifications.