Bug #131336 for Validate-Simple: kinda slow

Thu Jan 02 07:34:11 2020 perl [...] toby.ink - Ticket created

Subject:

kinda slow

I used the attached script to compare the speed of Validate::Simple with Type::Tiny. My results were that Type::Tiny is about 37 times faster:

                      Rate Validate_Simple  Types_Standard
    Validate_Simple 4.00/s              --            -97%
    Types_Standard   154/s           3748%              --

Type::Tiny/Types::Standard is also a lot more concise for expressing the schema:

    my $tt = Dict[
      username => Str,
      first_name => Str,
      last_name => Optional[Str],
      age => Optional[IntRange[18]],
      gender => Optional[
        Enum[qw/ mail femaile id_rather_not_to_say /]
      ],
      tags => Optional[ArrayRef[Str]],
      hobbies => Optional[
        ArrayRef[
          Enum[qw/ hiking travelling surfing laziness /]
        ]
      ],
      score => Optional[HashRef[PositiveOrZeroInt]],
      monthly_score => Optional[
        HashRef[
          HashRef[
            ArrayRef->where('@$_ < 12')
          ]
        ]
      ]
    ];

The main advantage that Validate::Simple seems to offer is that Type::Tiny is geared towards giving a single result about whether the supplied data is valid or not as a whole, while Validate::Simple drills down into the data to figure out why it's not valid. That might be more useful when, say, reporting errors from a form someone has filled out.

If there were a simple way to build a fast Type::Tiny object from a Validate::Simple spec (or vice versa, in fact), then people could use the fast Type::Tiny check to see whether the data is valid, then fall back to Validate::Simple to report the errors if there are any.

Of course, in many situations validation speed is not an issue. If you're validating a smallish JSON structure that was posted to you via HTTP, the network and HTTP handling are probably your app's bottleneck, not data validation.
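The check-then-explain idea suggested above can be sketched in core Perl. This is a minimal, hypothetical sketch rather than part of either module: `check_then_explain` and the two stand-in validators are invented for illustration. In the attached script, the fast check would wrap `$tt->check($data)` and the explainer would wrap `$vs->validate($data)` and `$vs->errors`.

```perl
use strict;
use warnings;

# Hypothetical helper: run a cheap boolean check first and only invoke the
# slower, error-collecting validator when the cheap check fails.
#   $fast_check : code ref returning true/false
#                 (e.g. sub { $tt->check($_[0]) })
#   $explain    : code ref returning a list of error messages
#                 (e.g. sub { $vs->validate($_[0]) ? () : $vs->errors })
sub check_then_explain {
    my ($fast_check, $explain, $data) = @_;
    return (1) if $fast_check->($data);   # fast path: valid data, no details needed
    my @errors = $explain->($data);       # slow path: collect per-field reasons
    return (0, @errors);
}

# Stand-in validators (hypothetical) so the sketch runs with core Perl only.
my $fast = sub {
    my ($d) = @_;
    return defined $d->{username} && !ref $d->{username};
};
my $explain = sub {
    my ($d) = @_;
    my @err;
    push @err, 'username is required'      unless defined $d->{username};
    push @err, 'username must be a string' if ref $d->{username};
    return @err;
};

my ($ok, @errors) = check_then_explain($fast, $explain, { username => 'alice' });
print( $ok ? "valid\n" : "invalid: @errors\n" );   # prints "valid"

($ok, @errors) = check_then_explain($fast, $explain, { first_name => 'Alice' });
print( $ok ? "valid\n" : "invalid: @errors\n" );   # prints "invalid: username is required"
```

This keeps the common case (valid input) on the cheap path and pays for detailed error collection only when something is actually wrong.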

Subject:

vs.pl

    use strict;
    use warnings;

    use Validate::Simple;

    my $specs = {
        username => {
            type     => 'string',
            required => 1,
        },
        first_name => {
            type     => 'string',
            required => 1,
        },
        last_name => {
            type => 'string',
        },
        age => {
            type => 'integer',
            gt   => 18,
        },
        gender => {
            type   => 'enum',
            values => [ 'mail', 'femaile', 'id_rather_not_to_say', ],
        },
        tags => {
            type => 'array',
            of   => { type => 'string', },
        },
        hobbies => {
            type => 'array',
            of   => {
                type   => 'enum',
                values => [ qw/hiking travelling surfing laziness/ ],
            }
        },
        score => {
            type => 'hash',
            of   => { type => 'non_negative_int' },
        },
        monthly_score => {
            type => 'hash',
            of   => {
                type => 'hash',
                of   => {
                    type     => 'array',
                    of       => { type => 'integer' },
                    callback => sub { @{ $_[0] } < 12; },
                }
            },
        }
    };
    my $vs = Validate::Simple->new( $specs );

    use Types::Standard -types;
    use Types::Common::Numeric -types;
    use Types::Common::String -types;

    my $tt = Dict[
      username => Str,
      first_name => Str,
      last_name => Optional[Str],
      age => Optional[IntRange[18]],
      gender => Optional[Enum[qw/ mail femaile id_rather_not_to_say /]],
      tags => Optional[ArrayRef[Str]],
      hobbies => Optional[ArrayRef[Enum[qw/ hiking travelling surfing laziness /]]],
      score => Optional[HashRef[PositiveOrZeroInt]],
      monthly_score => Optional[HashRef[HashRef[ArrayRef->where('@$_ < 12')]]]
    ];

    use Test::More;

    my $data = {
        username      => 'alice',
        first_name    => 'Alice',
        last_name     => 'Jones',
        age           => 21,
        tags          => [qw/ abc xyz /],
        hobbies       => ['hiking'],
        score         => { hiking => 45 },
        monthly_score => { hiking => { '2020' => [ 8 ] } },
    };

    ok( $vs->validate($data) ) or diag explain( [$vs->errors] );
    ok( $tt->check($data) );

    use Benchmark qw(cmpthese);
    cmpthese -3, {
        Validate_Simple => sub { $vs->validate($data) for 1..1000 },
        Types_Standard  => sub { $tt->check($data) for 1..1000 },
    };

    done_testing;

Thu Jan 02 22:43:00 2020 ANDREIP [...] cpan.org - Correspondence added

On Thu Jan 02 07:34:11 2020, TOBYINK wrote:

Well, it is slow because it uses Data::Types under the hood, which in turn uses regular expressions to validate numbers. I briefly looked inside Type::Tiny, and as far as I understand, it uses either XS or some Moose magic. No surprise that VS is slower. I want it to be faster, but as you mentioned, validation is not a bottleneck (yet?), so it is not a priority.

Let me tell you how this module came about. I work on a REST API, and I need to validate input JSON or query string params. At the beginning it was quite straightforward, and I started with a bunch of functions to validate single values. Then I realized that client developers work much faster (and ask far fewer questions) if they have a specification of the expected structure in the response, so I created a spec for each object. The specs can be retrieved from the API by providing '_desc' as an object ID. Of course, they were JSON-like structures. While adding new requests and/or params, I noticed that I could actually reuse the specs I wrote for client-side developers to unify validation. This is how the very first versions of VS appeared. Then I started reporting all the issues in a request, and client-side devs really loved that.

On the other hand, the objects returned by the API also required some validation and modification. A very common issue was returning booleans. Another was proper numeric values: some clients can be too strict and will distinguish between '{"number":"10"}' and '{"number":10}'. I had to make sure that in the resulting JSON certain fields had 'true|false' values, that numbers were actual numbers rather than strings of digits, and, of course, that all required fields were there and no unwanted fields were. So I have another module which uses similar specifications to validate output and cast values to the proper types. Specs for responses have a 'boolean' type and a 'description' field. They may also have custom types.
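The output-side casting described above (JSON booleans, real numbers instead of digit strings, required fields enforced, unlisted fields left out) can be sketched with the core JSON::PP module. This is a minimal sketch under stated assumptions, not the author's actual module: the flat spec format (field name mapped to a type name, every listed field required) and the `cast_output` helper are hypothetical.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical flat spec: field name => type name.
# Every listed field is treated as required; unlisted fields are dropped.
sub cast_output {
    my ($spec, $data) = @_;
    my %out;
    for my $field (sort keys %$spec) {
        die "missing required field '$field'\n" unless exists $data->{$field};
        my $type  = $spec->{$field};
        my $value = $data->{$field};
        if ($type eq 'boolean') {
            # Any Perl true/false value becomes a JSON true/false.
            $out{$field} = $value ? JSON::PP::true : JSON::PP::false;
        }
        elsif ($type eq 'integer') {
            die "field '$field' is not an integer\n"
                unless defined $value && $value =~ /\A-?[0-9]+\z/;
            # Numeric addition yields a fresh number, so JSON::PP emits 10, not "10".
            $out{$field} = 0 + $value;
        }
        elsif ($type eq 'string') {
            # Interpolation yields a fresh string, so 10 would be emitted as "10".
            $out{$field} = "$value";
        }
        else {
            die "unknown type '$type' for field '$field'\n";
        }
    }
    return \%out;
}

my $json = JSON::PP->new->canonical;   # canonical: sorted keys, stable output
my $spec = { id => 'integer', name => 'string', active => 'boolean' };
my $row  = { id => '10', name => 'Alice', active => 1, internal_note => 'x' };

print $json->encode(cast_output($spec, $row)), "\n";
# prints {"active":true,"id":10,"name":"Alice"}
```

Whether unwanted fields should be silently dropped (as here) or reported as errors is a design choice; the sketch drops them.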
As an example of custom types, describing a team of devs would take a structure like this (I'm omitting the verbose per-field spec):

    {
        id             => 'positive_int',
        name           => 'string',
        team_lead_id   => 'positive_int',
        team_lead_name => 'string',
        devs           => {
            type => 'array',
            of   => 'developer',
        }
    }

and then 'developer' has its own spec:

    {
        id         => 'positive_int',
        first_name => 'string',
        last_name  => 'string',
        skills     => {
            type => 'array',
            of   => 'skill',
        }
    }

Skills, in turn, can be either an 'enum' or another complex object with id, name, score, etc. This also explains why the specs are so ugly^Wverbose.

I extracted only the simplest part of the validation into the VS module. I am going to add a bunch of features to it and then use it to replace the mess I have in my API. First off, I want to simplify specs: instead of "{ id => { type => 'positive_int' } }" it could simply be "{ id => 'positive_int' }", and instead of "{ skills => { type => 'array', of => 'string' } }" it would be "{ skills => 'array<string>' }". Second, I want to add an 'object' type, like 'Dict' in T::T. At the moment that's only possible by doing something like this: "{ object => { type => 'hash', of => 'any', callback => $coderef } }". I also want to be able to add custom types.

So, about the speed and transforming specs into Type::Tiny, or vice versa: in order to do that, both structures must be isomorphic. With some callbacks, I believe it's possible, though I'm not sure it will be fast enough. On the other hand, I am now thinking about replacing Data::Types with Type::Tiny to validate primitive types as an intermediate step, before I have my own XS. :)
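The proposed spec shorthand could be implemented as a pre-processing step that expands it into the current long form before validation. A minimal sketch, assuming the long form seen in the attached script (`type` plus an optional nested `of` spec); `expand_type` and `expand_spec` are hypothetical names, not part of Validate::Simple:

```perl
use strict;
use warnings;

# Hypothetical expander for the proposed shorthand syntax:
#   'positive_int'         => { type => 'positive_int' }
#   'array<string>'        => { type => 'array', of => { type => 'string' } }
#   'hash<array<integer>>' => nested 'of' specs, expanded recursively
# Long-form hash specs are kept, but a shorthand 'of' inside them is expanded too.
sub expand_type {
    my ($short) = @_;
    if (ref $short eq 'HASH') {
        my %long = %$short;                      # shallow copy; caller's spec is untouched
        $long{of} = expand_type($long{of}) if exists $long{of};
        return \%long;
    }
    if ($short =~ /\A(\w+)<(.+)>\z/) {
        my ($container, $inner) = ($1, $2);      # copy captures before recursing
        return { type => $container, of => expand_type($inner) };
    }
    die "cannot parse type '$short'\n" unless $short =~ /\A\w+\z/;
    return { type => $short };
}

# Expand every field of a (flat) object spec.
sub expand_spec {
    my ($spec) = @_;
    my %long;
    $long{$_} = expand_type($spec->{$_}) for keys %$spec;
    return \%long;
}

my $team = expand_spec({
    id             => 'positive_int',
    name           => 'string',
    team_lead_id   => 'positive_int',
    team_lead_name => 'string',
    devs           => 'array<developer>',
});
print "$team->{devs}{type} of $team->{devs}{of}{type}\n";   # prints "array of developer"
```

If the expansion is done once when a spec is loaded, it should add no per-call validation cost.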

Thu Jan 02 22:43:00 2020 The RT System itself - Status changed from 'new' to 'open'

Tue Jan 07 15:48:12 2020 perl [...] toby.ink - Correspondence added

Type::Tiny doesn't require XS; its only non-core dependency is Exporter::Tiny (which used to be bundled, but got split out). It will use Type::Tiny::XS if it's installed, and this can make some type checks faster. It doesn't use Moose either, though Moose can use it.

Mon Apr 13 02:34:36 2020 ANDREIP [...] cpan.org - Correspondence added

It's much faster now. It will be even faster when I replace Data::Types (which is regexp-based) with something XS-based.

Mon Apr 13 02:34:37 2020 ANDREIP [...] cpan.org - Status changed from 'open' to 'resolved'

Mon Apr 13 02:34:38 2020 ANDREIP [...] cpan.org - Fixed in v0.3.0 added