Wiki Home

Sql Parser

Namespace: Wiki
I am looking for an SQL syntax parser based on open source lex/yacc files for sql.

I have written SQL syntax parsers. I have seen others do it too. None of the ones I have seen work 100%, and even if they did work, they would break as soon as you switched sql implementations and new language enhancements came out.

I am sure the code I want is in the My Sql, Postgre SQL code, and every other SQL engine's source code, I just haven't been able to find anything I can isolate and use in the fashion I want: a utility that just breaks an SQL command into its parts. Doesn't need a UI in some cases I don't even want a UI. In others I want the UI to be tied into some other tool. The UI isn't the hard part anyway.

Here are my goals:

1. Display an SQL command in a tree view so I can expand/collapse parts of it
2. Give it to F1 to upgrade VFE's SqlSelect.ParseSQL Method.
3. Help Rick Schummer integrate it into View Editor.
4. Help Doug Hennig integrate it into Stonefield Query
5. Give it to so that they can have an option to reformat code (both for posting code and even just a public tool where crappy code can be reformatted, like when I pull a command from MS Sql Profiler.)
6. Have a prototype for a well defined VFP ER.
7. Be able to augment a view's sql at runtime, in particular to support xCodes.

I envision a function that takes an SQL command as a string and returns something, like an object or xml stream, with the following structure:

.Sql Command - original command
.Unions - collection of SQLs

each Union will have:
.Fields - collection of field objects
.Tables - collection of table objects
.Where - where expression object
.Having - having expression object
.Group - collection of expressions
.Order - collection of expressions

Expression object
some tree of oExpression operator oExpression
or a constant

fields objects:
.Expression - the source of the value
.FieldName - the destination field name

table objects
.Join - table object and expression object

(I think this is the right way to break down an SQL expression.)

example: f("select a,b from c where d=e") would return:
.fields.item(1) = "a"
.fields.item(2) = "b"
.tables.item(1) = "c"
.where = "d=e"
.wheres.expression(1).type = "comp"
.wheres.expression(1).operator = "="
.wheres.expression(1).operand(1) = "d"
.wheres.expression(1).operand(1) = "e"
From the My Sql mail list:
"The sql_yacc.y is used to generate the sql parser. The parser is used more to "run" the sql rather than to break it into its tokens. Using the lex files would be closer to your stated goals. The file lex.h in the same directory as sql_yacc.y contains all the tokens mysql uses to break apart an SQL statement. is the lexer for mysql, although it is non-trivial code. If your needs are for your editor to fully understand the SQL statement, look into the lex files. If your editor only needs to color code stuff, you could possibly get away with just using the published list of mysql keywords for highlighting."

I took a complier class 12+ years ago, I can only remember enough to know that using lex/yacc with an existing .y file (like sql_yacc.y) is the way to go, but I can't remember how.

Files referenced in the My Sql post:
lex.h - Definitions to use with the lexer
sql_yacc.yy - The yacc parser

intermediate human readable outputs: - The C file someone got by running the parser through bison
sql_yacc.h - The C header file someone got by running the parser through bison

(copies at )

For those of you just browsing, this should give you a taste:

From sql_yacc.yy
  Select : retrieve data from table

	  LEX *lex= Lex;
	  lex->sql_command= SQLCOM_SELECT;
	  select_item_list ',' select_item
	| select_item
	| '*'
	    THD *thd= YYTHD;
	    if (add_item_to_list(thd,
                                 new Item_field(&thd->lex->current_select->
                                                NULL, NULL, "*")))

	  remember_name select_item2 remember_end select_alias
	    if (add_item_to_list(YYTHD, $2))
	    if ($4.str)
	      $2->set_name($4.str, $4.length, system_charset_info);
              $2->is_autogenerated_name= FALSE;
	    else if (!$2->name) {
	      char *str = $1;
	      if (str[-1] == '`')
	      $2->set_name(str,(uint) ($3 - str), YYTHD->charset());

Q: What language is lex.h sql_yacc.h sql_yacc.yy ?
A: "What Lex & YACC can do for you - When properly used, these programs allow you to parse complex languages with ease. This is a great boon when you want to read a configuration file, or want to write a compiler for any language you (or anyone else) might have invented." - "never write a parser again by hand - Lex & YACC are the tools to do this." -

I think this is the next step in the road to Parser Happiness:,guid,9a28ad22-b41b-434a-a540-40e6197a099e.aspx

Maybe not. I think that is one step short. whatever that means.
See Also
Nest Parser - similar concept, should give a better understanding of what I am looking for. If I get Sql Parser working, I would rework Nest Parser using the same technology.

some lex/yacc links:
Lex and YACC primer/HOWTO -
lex and yacc from unixODBC
Contributors: Carl Karsten
( Topic last updated: 2006.05.04 09:58:15 PM )