NAME

Spreadsheet::ParseExcel::Formula - extension of Spreadsheet::ParseExcel to handle parsing and evaluation of Excel formulas


SYNOPSIS

NOTE: Please read the section LIMITATIONS before using this module to make sure it suits your purpose!

    # use with or without SaveParser extension
    use Spreadsheet::ParseExcel;
    use Spreadsheet::ParseExcel::SaveParser::Workbook;
    use Spreadsheet::ParseExcel::Formula;
    # load and parse Excel file including formulas
    my $xls = Spreadsheet::ParseExcel::SaveParser::Workbook->Parse('test.xls');
    # set formula evaluation iteration limit and/or epsilon
    # (optional; only needed for self-referential formula structures)
    $xls->set_iteration_limit(10);      # default: 10
    $xls->set_epsilon(1e-6);            # default: 1e-6
    # set and change cell values as you like (optional)
    # this sets cell A1 of the first worksheet to the numerical value 17
    $xls->{Worksheet}->[0]->{Cells}->[0]->[0]->{Val} = 17;
    # evaluate the formulas in the Excel workbook
    $xls->evaluate();
    # retrieve and print formula cell results by accessing the "Val" member of
    # a Cell object (it is assumed that cell A2 contains a formula referencing
    # cell A1)
    print 'Cell A2 value: ',
          $xls->{Worksheet}->[0]->{Cells}->[1]->[0]->{Val}, "\n";
    # save the workbook to a new Excel file using SaveParser's SaveAs method
    # Note: currently formulas are not saved.
    $xls->SaveAs('test1.xls');


DESCRIPTION

You have already read the section LIMITATIONS, haven't you?

Spreadsheet::ParseExcel::Formula enables formula parsing and evaluation for Excel 97/XP/2003 files. The internal binary representation of Excel formulas (see INTERNALS) is parsed while the Excel file is read with the Parse method of either Spreadsheet::ParseExcel::Workbook or Spreadsheet::ParseExcel::SaveParser::Workbook.

This is achieved solely by extending the Spreadsheet::ParseExcel::Workbook and Spreadsheet::ParseExcel::Cell classes. This piece of code may therefore be considered a pseudo-module, as it neither implements a class nor implements or uses the namespace of Spreadsheet::ParseExcel::Formula.

There are currently strict limitations on the number and use of Excel functions and formula syntax implemented (see LIMITATIONS), but you are encouraged to extend and improve the functions and syntax recognized.

Additional Spreadsheet::ParseExcel::Workbook methods

(In the following, $xls denotes a valid Spreadsheet::ParseExcel::Workbook object).

$xls->evaluate()

Evaluates all formulas within the workbook object until either the current iteration limit is exceeded, or the global error of all formula cells is less than the current epsilon limit.

Returns false (undef) if the iteration limit has been exceeded, or true (1) if the workbook evaluated successfully.

NOTE: A true return value does not necessarily indicate that your workbook/worksheet is free of cell errors. Cell errors are handled as strings and compared as such, so if these string error values compare equal on successive iterations, the cell is considered stable and evaluation is reported as successful. Conversely, a false return value does not necessarily mean that your workbook/worksheet evaluated erroneously, since this depends on the functions and self-referential formula structures used within the workbook/worksheet.

In general, evaluate() is the only workbook method you need for formula evaluation. You do not need any of the methods described below, unless you have self-referential formula structures within your Excel file, and want fine-grained control over formula evaluation.
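
As a minimal sketch of how the return value of evaluate() might be handled, the helper below retries with progressively larger iteration limits. It uses only the evaluate() and set_iteration_limit() methods documented in this section; the helper name evaluate_with_limits and the retry strategy are illustrative assumptions and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): call evaluate() with
# progressively larger iteration limits until it reports success.
# $xls is assumed to be a workbook object providing the documented
# set_iteration_limit() and evaluate() methods.
sub evaluate_with_limits {
    my ($xls, @limits) = @_;
    for my $limit (@limits) {
        $xls->set_iteration_limit($limit);
        # A true return value means all formula cells stabilized.
        return $limit if $xls->evaluate();
    }
    # None of the limits was sufficient.
    return;
}
```

A call such as evaluate_with_limits($xls, 10, 100, 1000) returns the first limit that sufficed, or undef in scalar context if none did. Keep in mind the note above: a true result only says the cells stabilized, not that they are free of errors.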

$xls->get_iteration_limit()

Retrieves the current iteration limit (default: 10) for formula evaluation. Returns the current iteration limit (scalar, number).

$xls->set_iteration_limit($num)

Sets the current iteration limit for formula evaluation to $num. Returns the new iteration limit (scalar, number).

$xls->get_epsilon()

Retrieves the current epsilon limit (default: 1e-6) for formula evaluation. Returns the current epsilon limit (scalar, number).

$xls->set_epsilon($num)

Sets the current epsilon limit for formula evaluation to $num. Returns the new epsilon limit (scalar, number).

Additional Spreadsheet::ParseExcel::Cell methods

(In the following, $cell denotes a valid Spreadsheet::ParseExcel::Cell object).

$cell->evaluate()

Evaluates a single cell containing a formula, stores the evaluation result as the cell value, and returns that result.

Note that this method should not normally be invoked directly, since the evaluate() method of the Spreadsheet::ParseExcel::Workbook class calls it for you. The only meaningful use is when evaluating a whole workbook is too time-consuming and evaluating a single formula cell is sufficient for a particular type of application.
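
For that special case, a single cell can be looked up through the same hash/array layout used in the SYNOPSIS (zero-based worksheet, row and column indices). The following is only a sketch: the helper name evaluate_cell_at is hypothetical, and it simply returns nothing when no cell object with an evaluate() method exists at the given position.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper (not part of the module): evaluate one cell addressed
# by zero-based worksheet, row and column indices.
sub evaluate_cell_at {
    my ($xls, $sheet, $row, $col) = @_;
    my $cell = $xls->{Worksheet}->[$sheet]->{Cells}->[$row]->[$col];
    # Skip empty positions and anything that is not an object with evaluate().
    return unless blessed($cell) && $cell->can('evaluate');
    # Per the description above, evaluate() also stores the result in the cell.
    return $cell->evaluate();
}
```

For example, evaluate_cell_at($xls, 0, 1, 0) would address cell A2 of the first worksheet.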


INTERNALS

This is in addition to the section LIMITATIONS, which you should have definitely read by now!

This module hooks itself into the parsing process of Spreadsheet::ParseExcel, and parses the binary formula string of Excel into an RPN (Reverse Polish Notation, see http://en.wikipedia.org/wiki/Reverse_Polish_Notation) parse sequence (basically a Perl array of formula tokens).

During evaluation, the RPN parse sequence of each formula cell is interpreted by a stack machine (see http://en.wikipedia.org/wiki/Stack_machine), where each token or formula function consumes a number of arguments from the stack and pushes its result back onto the stack. The final result of a cell is the top of the stack, which at that point should contain only this one entry.

Since Excel allows self-referential (see http://en.wikipedia.org/wiki/Self-referential) formulas, a worksheet/workbook needs to be evaluated iteratively (see http://en.wikipedia.org/wiki/Iteration) until all values (hopefully) stabilize onto a final formula result.

The question of what ``stabilize'' means is answered by an epsilon range (see http://en.wikipedia.org/wiki/Limit_(mathematics)), against which the difference between the current and previous values of a cell is compared. If the absolute value of this difference is smaller than epsilon, the cell is considered ``stable''; otherwise the evaluation process needs another iteration.

Since there are cases where a self-referential formula complex may not stabilize onto a final value (e.g. when a RAND() function is involved), a limit needs to be placed on the maximum number of iterations.
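
The stack-machine idea can be illustrated with a small, self-contained sketch. The token format and the operator table below are assumptions made for this example only; they do not reflect the module's internal token representation or the set of Excel functions it supports.

```perl
use strict;
use warnings;

# Illustrative operator table: name => [argument count, implementation].
# This is an assumption for the sketch, not the module's internal format.
my %OPS = (
    '+'   => [2, sub { $_[0] + $_[1] }],
    '-'   => [2, sub { $_[0] - $_[1] }],
    '*'   => [2, sub { $_[0] * $_[1] }],
    '/'   => [2, sub { $_[0] / $_[1] }],
    'NEG' => [1, sub { -$_[0] }],
);

# Evaluate a list of RPN tokens (operands and operator names).
sub eval_rpn {
    my @tokens = @_;
    my @stack;
    for my $tok (@tokens) {
        if (my $op = $OPS{$tok}) {
            my ($argc, $code) = @$op;
            die "stack underflow at '$tok'\n" if @stack < $argc;
            # Take the operands off the top of the stack, keeping their order.
            my @args = splice(@stack, -$argc);
            push @stack, $code->(@args);
        }
        else {
            # Anything that is not an operator is pushed as an operand.
            push @stack, $tok;
        }
    }
    # A well-formed expression leaves exactly one entry: the result.
    die "malformed expression\n" unless @stack == 1;
    return $stack[0];
}
```

With this sketch, the infix expression (1 + 2) * 3 corresponds to the token sequence 1, 2, '+', 3, '*', so eval_rpn(1, 2, '+', 3, '*') yields 9.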

Both epsilon and the iteration limit may be queried and set using corresponding accessors (see DESCRIPTION).
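
The interplay of epsilon and the iteration limit can be sketched as follows. This is a simplified model under stated assumptions, not the module's actual implementation: cells are kept in a plain hash, formulas are code references, numeric values are compared per cell against epsilon, and non-numeric values (such as error strings) are compared as strings. The function name iterate_until_stable is hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical sketch: $cells maps cell names to current values, $formulas
# maps cell names to code refs computing a new value from $cells.
# Returns the number of iterations needed, or nothing if the limit is hit.
sub iterate_until_stable {
    my ($cells, $formulas, $limit, $epsilon) = @_;
    for my $iteration (1 .. $limit) {
        my $stable = 1;
        for my $name (sort keys %$formulas) {
            my $old = $cells->{$name} // '';
            my $new = $formulas->{$name}->($cells);
            if (looks_like_number($old) && looks_like_number($new)) {
                # Numeric values: stable if the change is below epsilon.
                $stable = 0 if abs($new - $old) >= $epsilon;
            }
            else {
                # Error strings and other text: stable if unchanged.
                $stable = 0 if $new ne $old;
            }
            $cells->{$name} = $new;
        }
        return $iteration if $stable;
    }
    return;    # iteration limit exceeded
}
```

For instance, a cell A1 holding the self-referential formula 0.5 * A1 + 1 converges towards 2. Starting from 0 with an epsilon of 1e-6, this sketch does not stabilize within 10 iterations but does so with a larger limit, which is the kind of situation the accessors are meant for.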


LIMITATIONS


TODO


AUTHOR

Franz Fasching (franz dot fasching at gmail dot com).


COPYRIGHT

Copyright (c) 2008 Franz Fasching.

All Rights Reserved. This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself as specified in the Perl README file, i.e. the ``Artistic License'' or the ``GNU General Public License (GPL)''.


SEE ALSO

The Spreadsheet::ParseExcel, Spreadsheet::ParseExcel::SaveParser, and Spreadsheet::WriteExcel modules.

OpenOffice.org has made a specification of the Excel file format publicly available (see http://sc.openoffice.org/excelfileformat.pdf); Microsoft has recently published the format specification as well.


ACKNOWLEDGEMENTS