Data Administration at Penn State: Problems and Solutions

Copyright 1990 CAUSE. From _CAUSE/EFFECT_ Volume 13, Number 4, Winter 1990. Permission to copy or disseminate all or part of this material is granted provided that the copies are not made or distributed for commercial advantage, the CAUSE copyright and its date appear, and notice is given that copying is by permission of CAUSE, the association for managing and using information resources in higher education. To disseminate otherwise, or to republish, requires written permission. For further information, contact CAUSE, 4840 Pearl East Circle, Suite 302E, Boulder, CO 80301, 303-449-4430, e-mail [email protected]

DATA ADMINISTRATION AT PENN STATE: PROBLEMS AND SOLUTIONS
by Ronald G. Hoover

************************************************************************
Ronald G. Hoover, Manager of Data Administration at The Pennsylvania State University, has been associated with information management for over twenty-six years. He spent six years developing an information management system for the federal government before arriving at Penn State where he has been involved with database and data administration for the past twenty years.
************************************************************************

ABSTRACT: This article describes some of the problems encountered in the management of data at The Pennsylvania State University and the solutions that have been implemented to solve these problems. The technical aspects of these solutions may not be generalizable, but the techniques presented may stimulate ideas for solutions to similar problems at other institutions.

There is a growing realization among colleges and universities that information (data) is a valuable corporate resource that should be managed like other resources such as people and money.
Data administration is the function generally established to control and encourage the efficient utilization of data for operations and decision support within the institution. The data administrator develops and administers policies and procedures for the definition, organization, protection, and use of institutional data. These functions differ from those of database administration, which typically entails maintaining the database software and designing and controlling the physical aspects of the database. At The Pennsylvania State University, data administration and database administration are combined under one manager.

The implementation of data administration can vary widely from one institution to another, but many commonalities exist. This article describes the structure of data administration at Penn State, some of the problems we have encountered, and our solutions to those problems.

History of data administration

Data administration has existed at Penn State for many years, initially in the form of policies, procedures, and security measures that were necessary for the normal day-to-day operation of Management Services, our administrative data processing department. As systems grew larger and more numerous, a more formal method of controlling University data was needed. In the early 1970s, paper forms were developed to collect and relate information about files, records, and data elements. Forms were good for the collection of information but proved inadequate for reporting purposes. To correct this situation, an in-house data dictionary system was developed to place the data from the forms onto magnetic tape; updating and reporting facilities were also developed. The administration of this system was assigned to the systems development group within Management Services.

In 1974, the University acquired IMS, its first database management system. At that time, a database administration group was created and assumed some data administration responsibilities such as data dictionary maintenance and database logical design.

In 1981, the president of the University commissioned a task force to create a request for proposal for the development of a fully integrated Administrative Information System (AIS). Electronic Data Systems (EDS) submitted the winning contract, and in 1982 work began on the first phase of the AIS with the development of an online student system using Software AG's ADABAS database management system, PREDICT data dictionary, and NATURAL fourth-generation programming language. The basic core of the student system was implemented in 1985. One welcome result of this implementation was that more student data were available to more users than ever before.

Development of human resources and financial systems by University personnel began in 1986. This project further reinforced the need for data administration support to aid the user in accessing and understanding data.
An offshoot of these efforts was the creation of an Information Resource Management (IRM) function within Management Services. IRM is made up of the data administration, information center, and security offices. These offices work closely together in the performance of their duties.

Data administration organization

One of the first problems encountered by most organizations contemplating data administration is how to structure it and where to put it. At Penn State, it was decided that data administration would not be vested in a single person or organization; rather, all units interacting with the system would share the responsibilities of data administration. Identified below are the key participants in data administration at Penn State and their responsibilities:

* Executive Director of Computer and Information Systems

  The Executive Director of Computer and Information Systems reports directly to the Office of the President and has oversight responsibility for system planning initiatives and policy development that affect data administration. The initiatives are undertaken with the direct involvement of the Committee for Administrative Systems Planning on which key University offices are represented.

* Manager of Data Administration

  The Manager of Data Administration, part of Management Services which reports to the Executive Director of Computer and Information Systems, is responsible for facilitating overall data systems planning, policy development, research activities, system efficiency, data security, and data accessing. The installation, maintenance, and efficiency of the database and data dictionary systems are also the responsibility of this position.

* Data Stewards

  Each data element in the Administrative Information System is assigned to a steward. Typically, stewardship is assigned to the office that has primary responsibility for maintaining the data element. The stewards are responsible for developing coding structures for data, ensuring data accuracy, determining updating frequency, establishing requirements for data protection, and authorizing access to data within their purview. At Penn State, most data elements are under the stewardship of central offices such as the registrar, bursar, admissions, human resources, and the controller. The directors of these offices usually delegate stewardship responsibilities to a senior member of their staff who is involved with the data processing aspects of their area.

* Access and Security Representatives (ASRs)

  ASRs have been appointed within major University administrative areas to serve as points of contact for data administration and security issues. ASRs are responsible for requesting access to University administrative systems and data for the employees within their organizations. Once access has been granted, they are also responsible for ensuring appropriate access, use, and protection of the data within their areas. Generally, ASR responsibilities have been assigned to individuals who are familiar with the data processing aspects and information needs of their offices.

The office of the Manager of Data Administration coordinates and provides support for the data administration functions of the University. The organizational placement of this function can be critical to the success of an institution's data administration efforts. At Penn State, the Executive Director of Computer and Information Systems established the data administration management function within Management Services. This department reports directly to him, and he reports to the Office of the President.

This structure has advantages in permitting data administration to work directly with the operations, process control, and security staffs within Management Services to enforce standards and take immediate action in controlling data access. In addition, the database and data management staffs report directly to the Manager of Data Administration and provide technical expertise for software and systems solutions to many problems. There are four people on the database staff and three people on the data management staff.

A potential disadvantage of placing the data administration function in the data processing center could become apparent were problems to occur that affect organizations over which the data administrator has no authority. Generally, these problems would take longer to resolve but could be successfully addressed through the coordination of data administration and the data stewards.
In the event a problem could not be resolved, Penn State's data administration reporting structure permits the escalation of the problem through normal reporting channels to the Executive Director of Computer and Information Systems. The Executive Director might ask the Committee for Administrative Systems Planning for guidance if desired. Ultimately the problem could be elevated to the Office of the President if it were found to be unresolvable at lower levels.

User access to data

The first challenge that faced data administration at Penn State in 1986 was to provide a way for users to request access to computerized institutional data for ad-hoc analysis and reporting. The request procedures had to satisfy several different constituencies:

* From the ASR standpoint, a vehicle was needed that would allow them to identify the particular data they wanted to access.
* The data stewards wanted information describing why the data were needed and how they would be used. The stewards also desired the capability of specifying any restrictions that were to be imposed on the use of the data.
* The security office required identification of the individuals who would access the data, as well as a signed statement that the users understood their responsibilities for using data as outlined in University policies and as further specified by the stewards.
* Data administration needed a way of recording the request and subsequent approvals or disapprovals of everyone involved. In addition, it was very desirable for data administration to keep the process as simple as possible.

The initial solution to this problem was the design of a general form for requesting access to computerized institutional data. We decided that a one-page form would reduce confusion on the part of the requestor and aid in the standardization of the request process.
One side of the form is completed by the ASR in conjunction with the requesting user from the ASR's area; it is used to identify the data needed, the reasons for the need, and the individuals who will access the data. Each individual is uniquely identified by a "user ID" assigned by the security office. The other side of the form is used to record the signatures of those involved in the approval and access implementation processes.

Once the ASRs complete their portion of the form, it is sent to data administration. The data administration staff checks the request for clarity and completeness, and coordinates with the ASR when additional information is needed. Checks are also made to determine if access to the requested data was previously authorized. Data administration then identifies the stewards of the requested data and sends the request to those stewards for their consideration. The average time for the stewards to process a request has been about one week. When all the stewards have taken action on the request, data administration coordinates the implementation of the requesting office's data access through the security office. As a final step, data administration returns a copy of the completed form to the requestor with instructions to aid her or him in accessing the data, and the original form is filed in the data administration area.

The form and approval process has worked well with only minor changes to the request form as dictated by experience. The next step in the process is to implement the form in the electronic approval system that has been developed as part of the business systems portion of AIS. Inclusion of the request form in this system is expected to be complete by spring of 1991. This will eliminate the paper form, speed up the approval process, and provide requestors the capability of monitoring the progress of their requests.

Element classification system

The request form was not in use very long when another problem presented itself. Requests began to appear asking for access to all data in the student system rather than individual files or fields. In these cases, data administration provided the stewards with listings of all their data elements and requested they mark the ones the user would be allowed to access. For some stewards this meant reviewing listings of up to a thousand data elements. At times, a steward would just finish one review when a request from another user would start the process all over again. Needless to say, the stewards soon asked for a better way to handle access requests.

What appeared to be needed was a methodology and system that would allow the stewards to approve access to classes or groupings of data elements rather than individual data elements. We went through several iterations before an acceptable solution was produced.

Data administration's first proposal for providing this methodology was for the stewards to use government style classifications such as top secret, secret, and confidential. This proposal was not well received for two reasons: (1) the stewards felt that terms such as top secret and secret did not fit well into the University environment, and (2) no one could decide on a set of criteria for classifying data into these categories.

A second proposal was then made which called for only two categories: classified and unclassified.
A work sheet was provided to aid in the classification process. The work sheet listed six factors to be considered for each data element: competitive value, fraud potential, legal liability, newsworthiness, financial exposure, and impact on management decisions. This proposal was also rejected. The stewards felt that two classification levels were not enough and the factors on the work sheet were difficult to apply across the board.

The stewards accepted a third proposal, which involved classification levels of 0 through 3 and two simple rules:

Rule 1: Data elements classified at level 0 are available for access by any authorized user within the University.

Rule 2: Classification levels are inclusive of the levels represented by lower level numbers. For example, a user who is given access to level 2 data will also have access to data on levels 1 and 0.

Other than level 0, no attempt was made to define the meaning of levels. The stewards were free to create their own criteria for assigning elements to each level. An example of one steward's criteria is as follows:

* Level 0 is assigned to data elements containing information that is nonsensitive and generally available through public sources such as phone directories.
* Level 1 is assigned to elements somewhat sensitive but not specific to a particular college or department. Colleges and departments are usually given access to this level of data.
* Level 2 is assigned to data specific to a college or department, access to which is usually reserved only for the specific college and for central offices such as the registrar or bursar.
* Level 3 is assigned to very sensitive data that are generally accessed only by the steward of the data.

The classification levels are documented in the data dictionary for each data element. Now when a user requests access to data, the stewards no longer have to review large lists of data elements. They simply specify access to a classification level.

As is sometimes the case, the solution to one problem highlights another. The stewards were able to authorize access in record time, but the creation of tailored user views to match those authorizations was a painfully slow manual process. This was made worse by the fact that a given file within the database usually contains elements for many stewards, and therefore many access levels had to be considered in the creation of a user view for the file. The whole process was a very time-consuming burden on the data administration staff.

Automated user view system

Eliminating the manual process for creating tailored user views thus was the next challenge to be addressed. The data dictionary system provides an online capability for creating user views from file descriptions. However, it is not able to use the steward's element classifications in the process.

Half of the solution to this problem was put into place with the documentation of the steward's element classifications in the dictionary. What was needed was a system to automatically create tailored user views by linking element classifications with the level of access authorized to the user by each steward. An existing code table file was used to hold the linking information.
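The two classification rules, and the idea of linking each element's classification to the level a steward has approved for a user, can be sketched in a few lines of Python. This is an illustration only, not the NATURAL program used at Penn State: the element names, steward names, and the functions can_access and build_user_view are hypothetical, and the sketch assumes that a user with no approval from a steward sees only that steward's level 0 elements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One data element as it might be documented in the dictionary (hypothetical)."""
    name: str
    steward: str
    level: int  # classification level, 0 through 3


def can_access(element, approved_levels):
    """approved_levels maps a steward to the level approved for this user."""
    if element.level == 0:
        return True  # Rule 1: level 0 is open to any authorized user
    approved = approved_levels.get(element.steward, 0)
    return element.level <= approved  # Rule 2: a level includes all lower levels


def build_user_view(file_elements, approved_levels):
    """Return the names of the elements in a file that the user may see."""
    return [e.name for e in file_elements if can_access(e, approved_levels)]


# A file usually holds elements belonging to several stewards.
student_file = [
    Element("CAMPUS_PHONE", "registrar", 0),
    Element("MAJOR_CODE", "registrar", 1),
    Element("COURSE_GRADE", "registrar", 2),
    Element("ACCOUNT_BALANCE", "bursar", 2),
    Element("HOLD_REASON", "bursar", 3),
]

# Hypothetical approvals: registrar granted level 2, bursar granted level 1.
approvals = {"registrar": 2, "bursar": 1}
view = build_user_view(student_file, approvals)
print(view)  # ['CAMPUS_PHONE', 'MAJOR_CODE', 'COURSE_GRADE']
```

Because each steward's approval is looked up separately, one file can yield a different view for every user without anyone reviewing the element list by hand.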
A new code set was defined that contains an entry for each unique file, user, and steward combination. This entry contains the level of data access approved by the steward for the user and is maintained by the data management staff. The final piece of the solution was the creation of an online program to read the code set and dictionary and create a user view that is tailored to the approved access for a particular user.

As with any system, exceptions do arise. Occasionally a user will specifically request access to elements at a level higher than has been authorized. When this occurs, the stewards have four choices:

1. Authorize the user for the higher level. This usually occurs after the steward investigates the user's needs and gains a more complete understanding of the user requirements.
2. Change the classification level of the elements in question. This sometimes happens when the steward realizes that the original classification level was inappropriate.
3. Deny the request.
4. Grant access to the elements on an exception basis.

Choices 1 and 2 require minor changes to dictionary or code-set entries and then the rerunning of the online user view generation program. Choice 3 requires no action other than notification of the user. Choice 4 requires some additional processing. In this case, the code entry containing the user's access authorization is flagged to indicate an exception exists. The user view generation program then accesses another code set that identifies the data elements to be added as exceptions. Overall, the stewards have done a good job classifying their data elements and the use of the exception process has been rare.

During the design of the online user view generator, provisions were made to select an alternate element classification level for sensitive data elements when used in conjunction with entity identifying elements. For example, an element containing grade data may have a classification level of 1 if used alone or with other elements that do not identify a particular entity. This permits studies to be done on grades with no links to entities such as students or colleges. However, if the grade data element is requested along with entity identifying elements such as student identification number or college name, the access level of the grade element can be raised to 2 or 3 since grades can now be linked to a person or organization. The steward has the ability to designate entity identification elements and to specify alternate access levels for any data element. At the present time, the stewards have opted to maintain a simpler system based on a single-element classification.

Data dictionary user enhancements

As institutional researchers and other users began accessing University data, they uncovered problems in the documentation of data elements in the dictionary. Typically, the element descriptions were created by individuals who worked closely with the data and were knowledgeable about them.
As is often the case, these individuals assumed a similar understanding on the part of others and the documentation was too cryptic or technical for the uninitiated user to understand. This problem was further compounded by the fact that the dictionary did not provide good facilities for the storage and retrieval of the kind of textual information required by the user.

The first step in the solution of this problem was to get the users together to develop a list of the kinds of information they felt should be part of the data element documentation. The list they created is as follows:

1. Usage Information
   This category of information describes how an element is used and interpreted, for example:
   * Descriptions of algorithms used to calculate element values
   * Non-standard features of the format of an element
   * Cautions about the use of elements that have known limitations
   * Time dependencies and order of entry for array elements
   * Any special requirements for interpreting the values of an element

2. Value Information
   * Legitimate values for an element
   * Default values for the element
   * Indications of what values mean as well as what they do not mean
   * The effective dates for specific values

3. Update Information
   * How an element is updated
   * When it is updated
   * What office is responsible for the update

4. Relationship Data
   The information in this category describes relationships to other data elements and processes.

5. History Information
   This documentation lists the date a change is made to an element and describes how the element was affected by the change.

A form was designed for the collection of the above information. A separate form was printed for each data element and distributed to the appropriate stewards for use in providing the requested data. A policy was also established requiring the completion of the form for new data elements and for changes to existing elements. This policy is enforced by the data administration staff who are the focal point for data element maintenance.

The second part of the solution was the design of a database to contain the new information and the development of the online dictionary access system (DAS) to access and maintain the data. During development, the scope of DAS was expanded to include access to the PREDICT data dictionary as well as a keyword database. The keyword database is created by selecting words from data element names and descriptions and sorting these words to form a cross reference to the data elements. When used through the online system, this cross reference permits the user to select a subject of interest, such as "degree," and view all data elements that contain this subject as a keyword in their element name or description. A generic keyword, such as "deg," can also be entered to allow access to all elements with keywords beginning with the selected characters.

All online users have read-access to this system and stewards have read- and update-access. Whenever an update is made by the stewards, the system enforces the creation of a history record to document the reason for the change and the date it was made. The system also provides the stewards with an online capability to view the accesses they have approved through the previously described element classification system.
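The keyword cross reference offered by DAS, with its exact and generic (prefix) lookups, can be sketched as follows. This is a minimal illustration and not the DAS implementation: the element names and descriptions are invented, build_keyword_index and lookup are hypothetical names, and the sketch indexes every alphabetic word rather than a selected subset.

```python
import re
from collections import defaultdict


def build_keyword_index(elements):
    """Map each keyword to the sorted names of the elements that contain it.

    elements maps an element name to its description.
    """
    index = defaultdict(set)
    for name, description in elements.items():
        # Take words from both the element name and its description.
        for word in re.findall(r"[a-z]+", f"{name} {description}".lower()):
            index[word].add(name)
    # Sorting the keywords gives the alphabetical cross reference.
    return {word: sorted(names) for word, names in sorted(index.items())}


def lookup(index, keyword, generic=False):
    """Exact keyword match, or prefix match when generic is True."""
    keyword = keyword.lower()
    if not generic:
        return index.get(keyword, [])
    matches = set()
    for word, names in index.items():
        if word.startswith(keyword):
            matches.update(names)
    return sorted(matches)


# Invented sample entries for illustration only.
sample_elements = {
    "DEGREE_TYPE": "Type of degree awarded",
    "DEG_CONFER_DATE": "Date the degree was conferred",
    "CAMPUS_CODE": "Campus of enrollment",
}
keyword_index = build_keyword_index(sample_elements)
print(lookup(keyword_index, "degree"))
print(lookup(keyword_index, "deg", generic=True))
```

A production index would probably drop common words such as "of" and "the" before sorting; the sketch leaves them in to stay short.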
Stewards are able to view the access approvals they have granted by user department or by database file.

Conclusion

Over the past four years, the major goal of data administration at Penn State has been to help our users understand and gain access to computerized institutional data in a secure environment. A key element in our success was the early identification of the participants in data administration and the definition of their responsibilities. This enabled us to establish effective lines of communication and gave all concerned an understanding of what was expected of them. In addition, we have endeavored to create solutions to our problems that are easy for people to understand and use. As much as possible, our solutions are online oriented. The data classification, automated user view, and dictionary access systems are good examples of the success of this approach.

Our plans for the future include completing our efforts to make the request form available online, continuing to collect enhanced data element documentation, and including non-database files in the data dictionary system. In addition, the ever-changing world of data processing is sure to present new challenges to be met and solved.

************************************************************************
For further reading:

Brown, Jana, and John O'Connell. "Distributed Use of a Fourth-Generation Language at Arizona State University." CAUSE/EFFECT, Winter 1989, pp. 25-29, 33-35.

Howard, Richard D., Gerald W. McLaughlin, and Josetta McLaughlin. "Bridging the Gap between the Database and User in a Distributed Environment." CAUSE/EFFECT, Summer 1989, pp. 19-25.

McLaughlin, Gerald W., Deborah J. Teeter, Richard D. Howard, and John S. Schott. "The Influence of Policies on Data Use." CAUSE/EFFECT, January 1987, pp. 6-10.

Miselis, Karen L., Lawrence A. Jordan, and Ronald Hoover. "Data Administration: What Is It, Where Is It, How Is It Done?" In Information Technology: Making It All Fit, Proceedings of the 1988 CAUSE National Conference (Boulder, Colo.: 1989), pp. 293-304.

The Penn State data administration and security policy documents referenced in this article, as well as similar documents from several other campuses, are listed under the topic "Data Administration" in the CAUSE Exchange Library Guide 1990-1991 and may be ordered by anyone on a CAUSE member campus.
************************************************************************
