Parser

Published February 2017
PARSER

Introduction
What is a parser? A parser is a program that breaks data into smaller elements according to a set of rules that describe its structure. Most data can be decomposed to some degree. For example, a phone number consists of an area_code, a prefix, and a suffix:

(area_code)  prefix - suffix
(800)        555    - 1234
(123)        555    - 4321
(999)        888    - 7777
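The phone number decomposition can be sketched in a few lines of C++. This is only an illustration: the `PhoneNumber` struct, the `parse_phone` function, and the exact input pattern "(ddd) ddd-dddd" are assumptions made for the example and do not come from the project itself.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical record holding the three parts named in the text.
struct PhoneNumber {
    std::string area_code;
    std::string prefix;
    std::string suffix;
};

// Returns the decomposed number, or std::nullopt when the input does not
// match the assumed pattern "(ddd) ddd-dddd" (spaces around the hyphen
// are tolerated).
std::optional<PhoneNumber> parse_phone(const std::string& text) {
    static const std::regex pattern(R"(\((\d{3})\)\s*(\d{3})\s*-\s*(\d{4}))");
    std::smatch m;
    if (!std::regex_match(text, m, pattern)) {
        return std::nullopt;
    }
    return PhoneNumber{m[1].str(), m[2].str(), m[3].str()};
}
```

Calling `parse_phone("(800) 555-1234")` yields the parts "800", "555", and "1234", while an input that breaks the rule, such as "800-555-1234", is rejected. The regular expression plays the role of the "set of rules" describing the structure.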

Although the text of a program is easy for humans to understand, the computer must convert it into a form it can process before any interpretation or compilation can begin. This process is generally known as “parsing” and consists of two distinct parts. The first part is the “tokenizer”, also called a lexer or scanner. The tokenizer takes the source text and breaks it into the reserved words, constants, identifiers, and symbols that are defined in the language. These tokens are then passed to the “actual parser”, which analyzes the series of tokens and determines when one of the language’s syntax rules is complete.
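As a rough illustration of the tokenizer stage, the sketch below splits a source string into the four token categories mentioned above. The `TokenKind` names, the `tokenize` function, and the small reserved-word list are all assumptions chosen for the example; a real language would define its own.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

enum class TokenKind { ReservedWord, Identifier, Constant, Symbol };

struct Token {
    TokenKind kind;
    std::string text;
};

// Hand-written tokenizer sketch: skips whitespace and classifies each
// lexeme as a reserved word, identifier, integer constant, or symbol.
std::vector<Token> tokenize(const std::string& source) {
    // Hypothetical reserved words, for illustration only.
    static const std::set<std::string> reserved = {"if", "else", "while", "return"};
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        unsigned char c = static_cast<unsigned char>(source[i]);
        if (std::isspace(c)) {
            ++i;
            continue;
        }
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // Letters, digits, and underscores form a word.
            while (i < source.size() &&
                   (std::isalnum(static_cast<unsigned char>(source[i])) || source[i] == '_')) {
                ++i;
            }
            std::string word = source.substr(start, i - start);
            TokenKind kind = reserved.count(word) ? TokenKind::ReservedWord
                                                  : TokenKind::Identifier;
            tokens.push_back({kind, word});
        } else if (std::isdigit(c)) {
            // A run of digits forms an integer constant.
            while (i < source.size() && std::isdigit(static_cast<unsigned char>(source[i]))) {
                ++i;
            }
            tokens.push_back({TokenKind::Constant, source.substr(start, i - start)});
        } else {
            // Any other single character is treated as a symbol.
            ++i;
            tokens.push_back({TokenKind::Symbol, std::string(1, source[start])});
        }
    }
    return tokens;
}
```

For the input `while x1 = 42;` this produces five tokens: a reserved word, an identifier, a symbol, a constant, and a symbol. The parser stage would then consume this sequence rather than the raw characters.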

As these completed rules are “reduced” by the parser, a tree following the language’s grammar and representing the program is created. In this form, the program is ready to be interpreted or compiled by the application.
Modern bottom-up parsers use a Deterministic Finite Automaton (DFA) to implement the tokenizer and an LALR(1) state machine to parse the resulting tokens. Practically all common parser generators, such as the UNIX standard YACC, use these algorithms. The LALR(1) and DFA algorithms themselves are easy to implement, since they rely on tables to determine actions and state transitions. It is the computation of these tables that is both time-consuming and complex.
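To show why the table-driven part is simple, here is a minimal DFA sketch that recognizes identifiers and integer constants. The table is written by hand for this example; in practice a tool would compute it from a grammar. The state names, the three character classes, and the `run_dfa` function are assumptions made for illustration.

```cpp
#include <cctype>
#include <string>

enum CharClass { Letter = 0, Digit = 1, Other = 2 };
enum State { Reject = -1, Start = 0, InIdentifier = 1, InNumber = 2 };

// Transition table indexed by [current state][character class].
// Hand-written here; a parser builder would normally generate it.
const int kTransitions[3][3] = {
    /* Start        */ {InIdentifier, InNumber, Reject},
    /* InIdentifier */ {InIdentifier, InIdentifier, Reject},
    /* InNumber     */ {Reject, InNumber, Reject},
};

// Maps a character to its class (underscore counts as a letter).
CharClass char_class(char ch) {
    unsigned char c = static_cast<unsigned char>(ch);
    if (std::isalpha(c) || c == '_') return Letter;
    if (std::isdigit(c)) return Digit;
    return Other;
}

// Runs the DFA over a whole lexeme and returns the accepting state
// (InIdentifier or InNumber), or Reject if the lexeme is not accepted.
int run_dfa(const std::string& lexeme) {
    int state = Start;
    for (char ch : lexeme) {
        state = kTransitions[state][char_class(ch)];
        if (state == Reject) return Reject;
    }
    return state == Start ? Reject : state;
}
```

The driver loop is a single table lookup per character, so all of the language-specific knowledge lives in `kTransitions`. Changing the language means regenerating the table, not rewriting the loop.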

The GOLD Parser Builder performs this task. It reads a source grammar and computes the appropriate tables. These tables are then saved to a file, which the actual parser engine can subsequently load and use.

In this project, we set out to build the lexer. To do so, we need to encode the Sanskrit aksharas. For this we use the ICU library, which provides header files for Unicode UTF-8 encoding and is compatible with Visual Studio.
Hence, our working platform is Visual C++ within Visual Studio 20xx. The encoding of the Sanskrit aksharas is shown in the following slides in UTF-8 format. The table lists the aksharas with their respective hexadecimal code points, ranging from U+0900 to U+097F (the Unicode Devanagari block).
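The relationship between a code point in this range and its UTF-8 bytes can be sketched as follows. Note that this example deliberately uses plain standard C++ rather than ICU, so that it has no dependencies; the function names `is_devanagari` and `encode_utf8` are assumptions for illustration and are not ICU APIs.

```cpp
#include <string>

// True when the code point lies in the range U+0900 to U+097F
// given in the text (the Unicode Devanagari block).
bool is_devanagari(char32_t cp) {
    return cp >= 0x0900 && cp <= 0x097F;
}

// Encodes a single code point as a UTF-8 byte sequence.
// Sketch only: it does not reject surrogates or values above U+10FFFF.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (all Devanagari letters)
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, U+0905 (DEVANAGARI LETTER A) encodes to the three bytes E0 A4 85. Every code point in the U+0900 to U+097F range takes exactly three bytes in UTF-8, which is worth keeping in mind when the lexer advances through the input byte by byte.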
