On Tue Dec 02 05:11:20 2014, TLINDEN wrote:
> I compared the code with what the documentation states. It doesn't
> correspond. According to the documentation the module is about
> automating tasks across servers using ssh and stuff like this.
>
> In reality the module does, among other things:
>
> - requests root permissions here and there (see RT#97421)
> - installs and removes software (local and remote) like apache, mysql
> and so forth
> - maintains amazon ec2 and aws instances
> - installs and removes perl modules
> - creates and deletes files and directories in remote and local
> directories
> - calls tar, transfers tarballs, unpacks and removes them on various
> occurrences in the code
> - creates/modifies crontab entries
> - lots of other stuff I didn't have time even to try to understand
>
> While the documentation talks about security, the module nevertheless
> stores passwords to berkeley db files. Yes, those are encrypted, but
> what about the encryption key? It's stored elsewhere, since this
> thing's main purpose is to run from cron. Obviously the author hasn't
> ever heard of ssh keys.
>
> A couple of notes about the code: it's a bloated mess in one 1.4 mb
> file, FA_Core.pm. There are 174 subs, many of them are duplicates
> (e.g. there are 4 subs "cwd"). There are 36 packages within
> FA_Core.pm, most of them declared inside subs. About 20-30% of the
> code is redundant, i.e. duplicated all over the module. The file is
> so large that it is impossible for an outsider to understand. It
> uses hundreds of global variables for everything. While it uses lots
> of external modules it also does many things on its own (in weird
> ways, e.g. handling of times and dates), eg:
>
> %hourconv=('12:00am'=>0,' 1:00am'=>1,' 2:00am'=>2,' 3:00am'=>3, [..];
> [..]
> $hourstring=$hourconv{unpack('x6 a*',$output->[4]->[0])};
>
> Also there are hardcoded mime-types, paths, programs, etc.
>
> The module should - if any - be rewritten from scratch. In its current
> state it can't be trusted.
I will address each item one at a time:
>> - requests root permissions here and there
Most of this type of code has been moved out into separate modules written specifically for setting up NEW servers launched by FullAuto via the AWS EC2 API. When setting up a new server with software, root-level privileges are unavoidable. Users must supply credentials with Administrator privileges activated for any of this to work on AWS EC2. There is no way any of this code can be used successfully without the full knowledge and buy-in of those using it.
>> - installs and removes software (local and remote) like apache, mysql
>> and so forth
That code is in a separate module called an "Instruction Set". Its name is Liferay_is.pm, and it "instructs" FullAuto to launch 3 servers via the AWS EC2 API, provision those servers in turn, cross-configure them, and finally start up the entire architecture. This is an automated standup of a complete environment - NOT FROM PRE-BUILT IMAGES, but from source code and repositories. That's what FullAuto does, and it does it via persistent SSH and SFTP connections to these servers. Right now there are 5 "demo" "instruction sets" bundled with the distribution. Later these will be separated from the core bundle and released independently.
There is also an instruction set that sets up a four node Hadoop cluster.
There is one that sets up a full working Catalyst server that will soon serve as a platform for coupling a RESTful API to FullAuto.
I fail to see how any of this can be viewed as a "problem". FullAuto was built to automate workloads like Chef, Puppet, Salt, Ansible and others do, with a LOT less overhead and more flexibility and ease of use.
>> - maintains amazon ec2 and aws instances
NO - it does not. The demo "instruction sets" do launch from generic images, but there is no image "maintenance". All image manipulation is done from code in the instruction set - NOT the core engine.
>> - creates and deletes files and directories in remote and local directories
Again - this is code found in the "instruction sets". Manipulating files is a common workload automation task. Again, I fail to see how this can be an issue.
>> - calls tar, transfers tarballs, unpacks and removes them on various
>> occurrences in the code
Again - this code is now mostly found in instruction sets. However, there is code in the core that does dynamic mirroring of directories using a combination of sftp and ssh. It does a diff via "ls" output, bundles just the files that have changed, and uses tar to move those in order to speed up the transfer. This functionality is currently undocumented, but will be documented soon. Again, I fail to see how this is a "problem".
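To make the mirroring idea concrete, here is a hypothetical sketch (invented names, not FullAuto's actual internals) of the approach described above: diff two directory listings and bundle only the files that changed, so one tar transfer replaces many individual copies. Sizes stand in for the "ls" output that the real code compares over SSH.

```perl
#!/usr/bin/perl
# Illustrative sketch only - diff two listings, tar just the changes.
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

sub write_file {
    my ($path, $content) = @_;
    open my $fh, '>', $path or die "open $path: $!";
    print $fh $content;
    close $fh;
}

# Map each file under $dir to its size, standing in for the
# `ls`-output comparison done against the remote side.
sub listing {
    my ($dir) = @_;
    my %seen;
    find(sub {
        return unless -f;
        (my $rel = $File::Find::name) =~ s{^\Q$dir\E/?}{};
        $seen{$rel} = -s _;
    }, $dir);
    return \%seen;
}

# Demo: a.txt is unchanged, b.txt differs, c.txt is new on the source.
my $src = tempdir(CLEANUP => 1);
my $dst = tempdir(CLEANUP => 1);
write_file("$src/a.txt", "alpha");
write_file("$dst/a.txt", "alpha");
write_file("$src/b.txt", "beta");
write_file("$dst/b.txt", "old beta");
write_file("$src/c.txt", "new file");

my ($here, $there) = (listing($src), listing($dst));
my @changed = grep {
    !exists $there->{$_} || $there->{$_} != $here->{$_}
} sort keys %$here;

print "changed: @changed\n";    # changed: b.txt c.txt

# Bundle only the changed files; one transfer instead of many.
system('tar', '-czf', "$src/changed.tgz", '-C', $src, @changed) == 0
    or die "tar failed: $?";
```

In the real workflow the tarball would then be moved over the existing SFTP connection and unpacked remotely; this sketch stops at the local bundling step.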
>> - creates/modifies crontab entries
Again, this is an undocumented and not completely implemented feature. It is not likely to be prioritized anytime soon.
>> - lots of other stuff I didn't have time even to try to understand
What "standard" are you suggesting? That it can be completely digested in 10 minutes, 10 hours or 10 days? The code is ALL there. Nothing is hidden. There is a LOT of it, but most of the superfluous stuff has been removed. What remains actually works and is a core feature - or is close to being one.
>> While the documentation talks about security, the module nevertheless
> stores passwords to berkeley db files. Yes, those are encrypted, but
> what about the encryption key? It's stored elsewhere, since this
> thing's main purpose is to run from cron. Obviously the author hasn't
> ever heard of ssh keys.
Nonsense - of course I've heard of ssh keys, and the updated POD documentation strongly advises their use. Nevertheless, passwords are still prevalent, and if that is all that is available, then they can be used. And yes, the Berkeley DB treatment is hardly a "perfect" solution, but it beats storing passwords in cleartext. It's a compromise - and not even a very good one - but again, better than cleartext. I strongly urge the use of keys and identity files, and do so frequently in the documentation. Using keys and identity files is supported and easy to do with FullAuto, and is documented, whereas the Berkeley DB storage currently isn't.
>> There are 174 subs, many of them are duplicates
> (e.g. there are 4 subs "cwd").
Those "cwd"s are in different packages and serve different purposes, but all are designed to give the user a seamless and singular interface experience. The one in the Remote_CMD package deals specifically with SSH and TELNET sessions. The one in File_Transfer deals with SFTP and FTP sessions. The one in MAIN deals with localhost, and the one in "Broken" allows for control over error conditions. None of it is "cruft".
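The pattern being defended can be sketched like this: one sub name per package, each handling a different session type behind a single user-facing interface. The package names Remote_CMD and File_Transfer come from the reply above; the method bodies and fields here are invented for illustration.

```perl
#!/usr/bin/perl
# Illustrative sketch - same-named subs in different packages,
# dispatched by the type of session object. Not FullAuto's real code.
use strict;
use warnings;

{
    package Remote_CMD;        # handles SSH and TELNET sessions
    sub cwd { my ($self) = @_; return "remote:$self->{dir}" }
}
{
    package File_Transfer;     # handles SFTP and FTP sessions
    sub cwd { my ($self) = @_; return "xfer:$self->{dir}" }
}

# Method resolution is per-package, so callers invoke one seamless
# `cwd` regardless of the underlying session type.
my $ssh  = bless { dir => '/opt/app' }, 'Remote_CMD';
my $sftp = bless { dir => '/srv/ftp' }, 'File_Transfer';

print $ssh->cwd,  "\n";    # remote:/opt/app
print $sftp->cwd, "\n";    # xfer:/srv/ftp
```

This is ordinary Perl OO dispatch, which is why four subs named "cwd" are not duplicates of one another.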
>> There are 36 packages within FA_Core.pm, most of them declared inside subs.
Those are subs designed to work with Term::Menus, and declaring a unique package within each of them prevents namespace collisions, which was a big problem for me early on. These subroutines need to be "persisted" in memory for dynamic menu creation, which is done using a form of code injection via Data::Dump::Streamer - which can see inside of closures. Nothing else at the time could do that, as far as I was aware.
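The namespace-isolation pattern described above can be sketched minimally: a `package` statement inside a sub scopes that sub's helper names to their own namespace, so two menus can each define a helper with the same name without colliding. The menu and helper names here are invented for illustration, and the Term::Menus/Data::Dump::Streamer machinery is omitted.

```perl
#!/usr/bin/perl
# Illustrative sketch - a unique package declared inside each sub
# keeps identically named helpers from colliding.
use strict;
use warnings;

sub menu_one {
    package Menu_One;           # this menu's private namespace
    sub label { 'first menu' }  # compiled as Menu_One::label
    return Menu_One::label();
}

sub menu_two {
    package Menu_Two;           # a second `label`, no collision
    sub label { 'second menu' } # compiled as Menu_Two::label
    return Menu_Two::label();
}

print menu_one(), "\n";    # first menu
print menu_two(), "\n";    # second menu
```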
>> About 20-30% of the
> code is redundant, i.e. duplicated all over the module.
I am guilty of some of this, and have been working to reduce it as time and opportunity permit. But it's not nearly as bad as you suggest.
>> The file is so large that it is impossible for an outsider to understand.
Not "impossible" - that's absurd. Difficult? Yes - I admit that. And I can appreciate that you lack the incentive to do more than a cursory examination. However, I have NO doubt that if I've built something compelling, someone - and perhaps many - WILL take the time to tear it down and learn what it does and how it does it. The proof, as they say, is in the pudding.
>> The module should - if any - be rewritten from scratch. In its current
>> state it can't be trusted.
Nonsense. Could it benefit from "refactoring" - yes of course, and that is an ongoing activity. The thing works - it works well, it performs well, it reports errors extremely well. I would not call that something that needs to be "rewritten from scratch".
And as far as it being "trustworthy" - that is a call everyone is entitled to make on their own.
FullAuto is AMBITIOUS and highly unique software, and no amount of refactoring is going to magically turn it into something an outsider can digest in 10 minutes.
The whole "persistent connection" capability is something I am unaware of anyone else successfully using. Chef, Puppet and Salt all use remote agents. Ansible "claims" to be "agent-less" but in fact depends on Python being present on the remote nodes. It also depends on OpenSSH's ControlPersist feature being activated. FullAuto has NONE of these dependencies, yet can perform every bit as powerfully on the same kinds of workloads no matter the size or complexity - and can do it with micro images in any cloud environment, with NO dependency requirements for the remote nodes. If you know of another framework that can boast that same capability - I'm all ears!
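For context, the OpenSSH connection-multiplexing that Ansible's agentless mode leans on is enabled with client settings like these in ssh_config (standard OpenSSH options, shown here only to illustrate the dependency being contrasted; the exact values are illustrative):

```
Host *
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m
```

ControlMaster/ControlPersist reuse one multiplexed SSH connection across commands, which is a client-side configuration requirement rather than a remote agent.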
One final thing - this project has been ongoing since 2000. My Perl skills are a thousand times greater than what they were when I started. Would I do things differently if I were starting over? Certainly. But I'm not starting over, and if a code construct works, works well, and solves a problem - even if it's not the "best practice" way to do it - I'm going to stick with it for the moment and refactor it only when it becomes convenient to do so. I have a lot of other things to attend to that have higher priority - unfortunately.