Spreadsheet::ParseExcel::Formula - extension of Spreadsheet::ParseExcel to handle parsing and evaluation of Excel formulas
NOTE: Please read the section LIMITATIONS before using this module to make sure it suits your purpose!
    # use with or without SaveParser extension
    use Spreadsheet::ParseExcel;
    use Spreadsheet::ParseExcel::SaveParser::Workbook;
    use Spreadsheet::ParseExcel::Formula;

    # load and parse Excel file including formulas
    my $xls = Spreadsheet::ParseExcel::SaveParser::Workbook->Parse('test.xls');

    # set formula evaluation iteration limit and/or epsilon
    # (optional; only needed for self-referential formula structures)
    $xls->set_iteration_limit(10);    # default: 10
    $xls->set_epsilon(1e-6);          # default: 1e-6

    # set and change cell values as you like (optional)
    # this sets cell A1 of the first worksheet to the numerical value 17
    $xls->{Worksheet}->[0]->{Cells}->[0]->[0]->{Val} = 17;

    # evaluate the formulas in the Excel workbook
    $xls->evaluate();

    # retrieve and print formula cell results by accessing the "Val" member of
    # a Cell object (it is assumed that cell A2 contains a formula referencing
    # cell A1); Cells is indexed as [row][column], so A2 is [1][0]
    print 'Cell A2 value: ', $xls->{Worksheet}->[0]->{Cells}->[1]->[0]->{Val}, "\n";

    # save the workbook to a new Excel file using SaveParser's SaveAs method
    # Note: currently formulas are not saved.
    $xls->SaveAs('test1.xls');
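The synopsis reaches individual cells through the Cells array of a worksheet, which Spreadsheet::ParseExcel indexes as [row][column] with zero-based indices: A1 is ->[0]->[0] and A2 is ->[1]->[0]. The following minimal sketch translates A1-style addresses and ranges into such index pairs. addr_to_rc and expand_range are hypothetical helpers written for illustration only; they are not part of Spreadsheet::ParseExcel or of this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Spreadsheet::ParseExcel): convert an
# A1-style address such as "A2" or "AB10" (optionally with $ markers) into
# zero-based (row, col) indices matching $sheet->{Cells}->[$row]->[$col].
sub addr_to_rc {
    my ($addr) = @_;
    my ($letters, $digits) = $addr =~ /^\$?([A-Za-z]+)\$?(\d+)$/
        or die "Not an A1-style address: $addr\n";
    my $col = 0;
    for my $ch (split //, uc $letters) {
        # column letters form a bijective base-26 number: A=1 .. Z=26, AA=27
        $col = $col * 26 + (ord($ch) - ord('A') + 1);
    }
    return ($digits - 1, $col - 1);
}

# Hypothetical helper: expand a range such as "A1:B2" into a list of
# [row, col] pairs, row by row.
sub expand_range {
    my ($range) = @_;
    my ($from, $to) = split /:/, $range;
    my ($r1, $c1) = addr_to_rc($from);
    my ($r2, $c2) = addr_to_rc($to);
    ($r1, $r2) = ($r2, $r1) if $r1 > $r2;
    ($c1, $c2) = ($c2, $c1) if $c1 > $c2;
    my @cells;
    for my $r ($r1 .. $r2) {
        push @cells, map { [$r, $_] } $c1 .. $c2;
    }
    return @cells;
}

my ($row, $col) = addr_to_rc('A2');
printf "A2 -> Cells->[%d]->[%d]\n", $row, $col;
```

Each [row, col] pair returned by expand_range can then be used to look up $sheet->{Cells}->[$row]->[$col], keeping in mind that cells which were never written may be undefined.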
You have already read the section LIMITATIONS, haven't you?
Spreadsheet::ParseExcel::Formula can be used to enable formula parsing and evaluation in Excel 2003/97/XP files. The internal binary representation of Excel formulas (see INTERNALS) is parsed when the Excel file is parsed with the Parse methods of either Spreadsheet::ParseExcel::Workbook or Spreadsheet::ParseExcel::SaveParser::Workbook.
This is achieved by extending only the Spreadsheet::ParseExcel::Workbook and Spreadsheet::ParseExcel::Cell classes. This piece of code may therefore be considered a pseudo-module, as it neither implements a class nor implements or uses the namespace of Spreadsheet::ParseExcel::Formula.
There are currently strict limitations on the number and use of Excel functions and formula syntax implemented (see LIMITATIONS), but you are encouraged to extend and improve the functions and syntax recognized.
Spreadsheet::ParseExcel::Workbook methods
(In the following, $xls denotes a valid Spreadsheet::ParseExcel::Workbook object.)
$xls->evaluate()
Evaluates all formulas within the workbook object until either the current iteration limit is exceeded, or the global error of all formula cells is less than the current epsilon limit.
Returns false (undef) if the iteration limit has been exceeded, or true (1) if the workbook evaluated successfully.
NOTE: A true return value does not necessarily indicate that your workbook/worksheet is free of cell errors. Cell errors are handled as strings and compared as such (see LIMITATIONS), so if these error strings compare equal on successive iterations, the cell is considered stable and evaluation counts as successful. Conversely, a false return value does not necessarily mean that your workbook/worksheet evaluated erroneously, since this depends on the functions and self-referential formula structures used within it.
In general, evaluate() is the only workbook method you need for formula evaluation. You do not need any of the methods described below unless you have self-referential formula structures within your Excel file and want fine-grained control over formula evaluation.
$xls->get_iteration_limit()
Retrieves the current iteration limit (default: 10) for formula evaluation. Returns the current iteration limit (scalar, number).
$xls->set_iteration_limit($num)
Sets the current iteration limit for formula evaluation to $num.
Returns the new iteration limit (scalar, number).
$xls->get_epsilon()
Retrieves the current epsilon limit (default: 1e-6) for formula evaluation. Returns the current epsilon limit (scalar, number).
$xls->set_epsilon($num)
Sets the current epsilon limit for formula evaluation to $num.
Returns the new epsilon limit (scalar, number).
Spreadsheet::ParseExcel::Cell methods
(In the following, $cell denotes a valid Spreadsheet::ParseExcel::Cell object.)
$cell->evaluate()
Evaluates a single cell containing a formula, sets the cell value to the evaluation result, and returns that result.
Note that this method should normally not be invoked directly, as this is done by the evaluate() method of the Spreadsheet::ParseExcel::Workbook class. Its only meaningful use is when evaluating a whole workbook is too time-consuming and evaluating a single formula cell is sufficient for a particular type of application.
This is in addition to the section LIMITATIONS, which you should definitely have read by now!
This module hooks itself into the parsing process of Spreadsheet::ParseExcel and parses Excel's binary formula string into an RPN (Reverse Polish Notation, see http://en.wikipedia.org/wiki/Reverse_Polish_Notation) parse sequence (basically a Perl array of formula tokens).
During evaluation, the RPN parse sequence of each formula cell is interpreted by a stack machine (see http://en.wikipedia.org/wiki/Stack_machine): each token or formula function pops its arguments from the stack and pushes its result back onto it. The final result of the cell is then the top of the stack, which at that point should be the only remaining entry.
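The stack machine described above can be sketched in a few lines of Perl. This is a minimal illustration under assumed conventions, not the module's code: the token format (['num', ...], ['ref', ...], ['op', ...], ['func', name, argument count]), the %binop and %func tables, and eval_rpn are all hypothetical, and only four operators and two functions are wired up.

```perl
use strict;
use warnings;

# Hypothetical token format for illustration only; the module's real
# internal tokens differ:
#   ['num',  $value]        push a constant
#   ['ref',  'A1']          push the current value of a cell
#   ['op',   '+']           binary operator: pop 2 operands, push 1 result
#   ['func', 'SUM', $argc]  function: pop $argc arguments, push 1 result
my %binop = (
    '+' => sub { $_[0] + $_[1] },
    '-' => sub { $_[0] - $_[1] },
    '*' => sub { $_[0] * $_[1] },
    '/' => sub { $_[0] / $_[1] },
);

my %func = (
    SUM => sub { my $s = 0; $s += $_ for @_; $s },
    ABS => sub { abs $_[0] },
);

sub eval_rpn {
    my ($tokens, $cells) = @_;
    my @stack;
    for my $t (@$tokens) {
        my ($type, @arg) = @$t;
        if ($type eq 'num') {
            push @stack, $arg[0];
        }
        elsif ($type eq 'ref') {
            push @stack, $cells->{ $arg[0] } // 0;    # empty cells count as 0
        }
        elsif ($type eq 'op') {
            my $rhs = pop @stack;    # the right operand was pushed last
            my $lhs = pop @stack;
            push @stack, $binop{ $arg[0] }->($lhs, $rhs);
        }
        elsif ($type eq 'func') {
            my ($name, $argc) = @arg;
            # take the topmost $argc entries, keeping their original order
            my @args = $argc ? splice(@stack, -$argc) : ();
            push @stack, $func{$name}->(@args);
        }
        else {
            die "Unknown token type '$type'\n";
        }
    }
    # a well-formed formula leaves exactly one entry: the cell's result
    die 'Malformed formula: ', scalar(@stack), " stack entries left\n"
        unless @stack == 1;
    return $stack[0];
}

# =SUM(A1, 2) * 3 in RPN order: A1 2 SUM(2 args) 3 *
my %cells = (A1 => 4);
my @rpn = (['ref', 'A1'], ['num', 2], ['func', 'SUM', 2], ['num', 3], ['op', '*']);
print eval_rpn(\@rpn, \%cells), "\n";    # prints 18
```

Operands come off the stack in reverse order, which is why the right-hand operand is popped first; the final check enforces the single-entry condition described above.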
Since Excel allows self-referential (see http://en.wikipedia.org/wiki/Self-referential) formulas, a worksheet/workbook needs to be evaluated iteratively (see http://en.wikipedia.org/wiki/Iteration) until all values (hopefully) stabilize onto a final formula result.
What ``stabilize'' means is defined by an epsilon range (see http://en.wikipedia.org/wiki/Limit_(mathematics)), against which the difference between the current and previous value of a cell is compared. If the absolute value of this difference is smaller than epsilon, the cell is considered ``stable''; otherwise the evaluation process needs another iteration.
Since there are cases where a self-referential formula complex may not stabilize onto a final value (e.g. when a RAND() function is involved), a limit needs to be placed on the maximum number of iterations.
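The iteration scheme can be illustrated with a self-contained sketch. It assumes each formula cell is represented by a Perl code ref that computes the cell's value from the current values of all cells (a hypothetical representation; the module itself interprets RPN sequences), and it mirrors the documented return values of $xls->evaluate(): 1 once a full pass leaves every cell stable, undef once the iteration limit is exceeded. Following the NOTE on evaluate(), non-numeric values such as error strings are compared with eq rather than against epsilon. evaluate_iteratively and is_stable are hypothetical names, and the fixed cell order (sort keys) is a choice of this sketch, not something the module documents.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Stability test: numbers must differ by less than epsilon; anything else
# (text, including error strings such as '#N/A!') must compare equal.
sub is_stable {
    my ($old, $new, $epsilon) = @_;
    return 0 unless defined $old && defined $new;
    if (looks_like_number($old) && looks_like_number($new)) {
        return abs($new - $old) < $epsilon ? 1 : 0;
    }
    return $old eq $new ? 1 : 0;
}

# Hypothetical sketch: each formula cell is a code ref computing its value
# from the current values. Cells are recomputed in place, pass after pass,
# until one full pass leaves every cell stable (return 1) or the iteration
# limit is exceeded (return undef).
sub evaluate_iteratively {
    my ($formulas, $values, $iteration_limit, $epsilon) = @_;
    for my $iteration (1 .. $iteration_limit) {
        my $all_stable = 1;
        for my $cell (sort keys %$formulas) {
            my $old = $values->{$cell};
            my $new = $formulas->{$cell}->($values);
            $values->{$cell} = $new;
            $all_stable = 0 unless is_stable($old, $new, $epsilon);
        }
        return 1 if $all_stable;
    }
    return undef;
}

# Self-referential example: A1 = (A1 + 2/A1) / 2 converges to sqrt(2)
my %formulas = (A1 => sub { my $v = shift; ($v->{A1} + 2 / $v->{A1}) / 2 });
my %values   = (A1 => 1);
my $ok = evaluate_iteratively(\%formulas, \%values, 10, 1e-6);
printf "ok=%s A1=%.6f\n", $ok // 'undef', $values{A1};    # ok=1 A1=1.414214
```

With epsilon 1e-6 this example stabilizes after five passes, well within the default limit of 10. A formula such as A1 = 1 - A1, by contrast, flips between 0 and 1 forever and, like a cell involving RAND(), is only stopped by the iteration limit.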
Both epsilon and the iteration limit may be queried and set using corresponding accessors (see DESCRIPTION).
Only Excel 2003/97/XP formulas (the so-called BIFF8 format) are parsed correctly. Trying to parse files produced with other versions may, at best, produce erroneous and unpredictable results.
Only a small but useful subset of possible formula syntax is implemented. Currently unimplemented formula features and constructs include:
Array constants such as {1, 2}.
Cell range intersections (the space operator).
Cell range lists/unions (the comma operator).
Defined names (variables), i.e. named cells or cell ranges.
Cell ranges using defined names (the colon operator with defined names), e.g. namedcell:B2. NOTE: Not to be confused with regular cell ranges like A1:B2; these are implemented and should work as expected.
All types of reference subexpressions (constant, reference, deleted, incomplete, etc.) used for encapsulation of the cell range and list operators.
3D cell references and 3D cell range references, i.e. cross-worksheet references of the form "OtherWorksheet"!A1. This means formulas may only reference cells within the same worksheet.
All types of deleted cell references (2D, 3D, relative, etc.), as these indicate an erroneous formula. It is assumed that the worksheet to be evaluated is debugged and works correctly within Excel itself.
Matrix formulas.
Multiple operation tables.
Natural language references.
The CHOOSE function control.
Assignment in macro sheets.
Only a small but useful subset (about one third) of the functions usable in formulas is implemented. Currently implemented functions are:
COUNT, IF, ISNA, ISERROR, SUM, AVERAGE, MIN, MAX, NA, DOLLAR, FIXED, SIN,
COS, TAN, ATAN, PI, SQRT, EXP, LN, LOG10, ABS, INT, SIGN, ROUND, REPT, MID,
LEN, VALUE, TRUE, FALSE, AND, OR, NOT, MOD, VAR, RAND, ATAN2, ASIN, ACOS, LOG,
CHAR, LOWER, UPPER, PROPER, LEFT, RIGHT, EXACT, TRIM, REPLACE, SUBSTITUTE,
CODE, FIND, ISERR, ISTEXT, ISNUMBER, ISBLANK, T, N, CLEAN, TRUNC, USDOLLAR,
ROUNDUP, ROUNDDOWN, MEDIAN, SUMPRODUCT, SINH, COSH, TANH, ASINH, ACOSH, ATANH,
EVEN, FLOOR, CEILING, ODD, CONCATENATE, POWER, RADIANS, DEGREES, SUMIF,
COUNTIF
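Edge cases are where a direct Perl translation of a listed function most easily diverges from Excel: Perl's int truncates toward zero whereas Excel's INT rounds down, and Perl's % operator truncates non-integer operands to integers. The following hypothetical function table (%excel_func is not the module's internal structure) sketches SIGN, TRUNC, MOD and EVEN with Excel's edge-case behaviour. TRUNC is limited to its one-argument form, and returning the string '#DIV/0!' follows the error-string convention described below.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# Hypothetical implementations of four of the listed functions, written to
# follow Excel's edge-case behaviour (one-argument TRUNC only).
my %excel_func = (
    # SIGN: -1, 0 or 1
    SIGN => sub { $_[0] <=> 0 },

    # TRUNC: discard the fractional part, i.e. round toward zero
    TRUNC => sub { int $_[0] },

    # MOD: MOD(n, d) = n - d * INT(n / d), where Excel's INT rounds down,
    # so the result takes the sign of the divisor. Perl's % operator would
    # truncate 5.5 to 5 before computing the remainder, so it is not used.
    MOD => sub {
        my ($n, $d) = @_;
        return '#DIV/0!' if $d == 0;
        return $n - $d * floor($n / $d);
    },

    # EVEN: round away from zero to the nearest even integer
    EVEN => sub {
        my ($x) = @_;
        my $even = 2 * ceil(abs($x) / 2);
        return $x < 0 ? -$even : $even;
    },
);

for my $call (['MOD', 5.5, 2], ['MOD', -3, 2], ['MOD', 3, -2], ['EVEN', -1], ['TRUNC', -4.3]) {
    my ($name, @args) = @$call;
    printf "%s(%s) = %s\n", $name, join(', ', @args), $excel_func{$name}->(@args);
}
```

Keeping each function's edge cases explicit in its own small code ref makes it straightforward to test them individually against values computed by Excel itself.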
Boolean values are encoded as integers 0 and 1 as in Perl.
There is no such thing as an error type or object. Errors are implemented as simple strings beginning with # and ending with !, e.g. '#N/A!'.
All this means that even though these functions are implemented, you might get different, if not completely erroneous, results from evaluating your particular Excel files, especially if edge cases of a particular function are involved or the evaluation of a particular nested function results in an error. YMMV, you have been warned!
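The boolean and error conventions above can be sketched as follows. is_error, xl_iserror and xl_sum are hypothetical helpers, not functions of the module: errors are recognized purely by the '#...!' string pattern, booleans are returned as 0 or 1 (Perl's comparison operators return 1 or a special empty-string false value, so the result is normalized explicitly), and xl_sum propagates the first error it meets instead of adding it.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helpers following the conventions above: booleans are the
# integers 0 and 1, and errors are plain strings such as '#N/A!'.
sub is_error {
    my ($v) = @_;
    return (defined $v && $v =~ /\A#.*!\z/s) ? 1 : 0;
}

# ISERROR-like check; is_error already normalizes its result to 0/1.
sub xl_iserror { return is_error($_[0]) }

# SUM-like function: the first error argument becomes the result (error
# propagation); other non-numeric text is ignored.
sub xl_sum {
    my $total = 0;
    for my $v (@_) {
        return $v if is_error($v);
        $total += $v if looks_like_number($v);
    }
    return $total;
}

print xl_sum(1, 2, 'text', 3), "\n";    # prints 6
print xl_sum(1, '#N/A!', 3), "\n";      # prints #N/A!
print xl_iserror('#DIV/0!'), "\n";      # prints 1
```

Because errors are ordinary strings under this convention, a text value that happens to look like '#...!' cannot be told apart from a genuine error, which is one way results can differ from Excel.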
A lot! You are encouraged to help improve and extend formula evaluation within Spreadsheet::ParseExcel!
Syntactical improvements: Parsing and evaluating currently unrecognized tokens such as constant arrays or 3D cell references.
Functional improvements: Extend the number of implemented functions, especially with Date&Time, and statistical functions.
Extensive testing: Write comprehensive test cases for testing all edge cases and boundary conditions of the implemented functions, and improve error handling on formula evaluation errors.
Wishlist 1: Enable formula modification within Perl. This involves parsing the ASCII representation of Excel formulas, possibly in all languages supported by Excel, and storing it back in the internal RPN parse sequence.
Wishlist 2: Enable Spreadsheet::WriteExcel to write the internally stored RPN parse sequence of formulas back into the resulting Excel file. This involves reversing the process of binary token parsing, i.e. converting the RPN parse sequence into its corresponding binary representation.
Franz Fasching (franz dot fasching at gmail dot com).
Copyright (c) 2008 Franz Fasching.
All Rights Reserved. This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself as specified in the Perl README file, i.e. the ``Artistic License'' or the ``GNU General Public License (GPL)''.
The Spreadsheet::ParseExcel, Spreadsheet::ParseExcel::SaveParser, and Spreadsheet::WriteExcel modules.
OpenOffice.org has made the specification of the Excel file format publicly available (see http://sc.openoffice.org/excelfileformat.pdf); Microsoft has recently published the specification as well.
Kawai Takanori and Gabor Szabo for their impressive Spreadsheet::ParseExcel module.
John McNamara for his excellent Spreadsheet::WriteExcel module.
Dr. Claus Fischer (TXware GmbH), who enabled me to write this module as part of a client project and to make it publicly available under the Perl Artistic License and the GPL.