Skip Menu |

This queue is for tickets about the POE CPAN distribution.

Report information
The Basics
Id: 45408
Status: resolved
Priority: 0/
Queue: POE

People
Owner: Nobody in particular
Requestors: sean.pieper [...] viable.net
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: POE filehandle leak
Date: Fri, 24 Apr 2009 18:41:18 -0400 (EDT)
To: bug-POE [...] rt.cpan.org
From: sean.pieper [...] viable.net
Dear POE maintainers,

   I've discovered a filehandle leak in POE, intially with v 1.003 and confirmed still present in 1.005. This leak was first spotted in a production server where clients connect and are authorized and logged in a database before further interactions. The database access can take a while when we get a large number of clients connecting simultaneously, and if some clients drop off in the meantime, we quickly leak filehandles (not sockets though-- the network stuff gets shutdown properly).

Effectively, what happens for the leak is:

socketfactory creates a new socket
new socket gets wrapped as a wheel::readwrite
wheel::readwrite gets input and kicks off an event to the session handling DB communication
wheel::readwrite receives close
DB session completes, and wheel::readwrite is collected
filehandle is still kicking around.

I've written what I believe is a minimal demonstration (attached). This bug only appears if I post to a separate session-- if the slow (outstanding) event is in the same session that creates the wheels, there is no leak. If the outstanding event is resolved before the socket closes, the filehandle is also correctly cleaned.

Please let me know if I can do anything further to help resolve this issue. I'm trying to track down a high load deadlock in my server code, and I'd like to eliminate this bug as a potential culprit.

Thanks for a really useful framework.

-sean

Sean Pieper
Engineer
Viable, Inc.
VRS: ViableVRS.tv
www.viable.net
sean.pieper@viable.net
Office: 240-292-0222 x244
VSN: viablepieper


This e-mail, including any attachments may contain information that is protected by law as PRIVILEGED AND CONFIDENTIAL and is intended solely for the use of the recipient or the employee or agent responsible for delivering the message to the recipient. Please note that if you are not the intended recipient, you are hereby notified that any dissemination, copying, distribution, retention, re-transmission, printing or any other use of this e-mail or the information contained herein is strictly prohibited. If you have received this e-mail communication in error, please immediately send an e-mail reply to notify the sender and immediately and permanently delete this e-mail from your computer system. Thank you.
Download leak_demo.tgz
application/x-compressed-tar 2.5k

Message body not shown because it is not plain text.

Subject: RE: [rt.cpan.org #45408] AutoReply: POE filehandle leak
Date: Thu, 30 Apr 2009 14:06:02 -0400 (EDT)
To: bug-POE [...] rt.cpan.org
From: sean.pieper [...] viable.net
Update: I've traced down the problem to an issue with POE Kernel and Resource::FileHandles. 1) the wheel is created in the Server session 2) Resource::FileHandles registers that 'Server' owns the wheel's handle. 3) 'Slow' gets a pointer to the wheel 4) 'Server' detects the close, and removes the wheel from its heap, but the wheel is still in scope because of 'Slow'. 5) 'Slow' finishes its business with the wheel and wheel goes out of scope 6) Wheel DESTROY calls Kernel select_read, which calls _internal_select with the active session ('Slow') 7) _internal_select calls Resource::FileHandles _data_handle_remove 8) _data_handle_remove detects that 'Slow' does not own the wheel's filehandle, and refuses to delete the filehandle. I've verified that if when 'Slow' finishes with the wheel, it passes it back to 'Server' so that the wheel goes out of scope in the owning session, everything cleans up properly. I'm not sure really how to clean this up-- is there a good reason not to search through all the sessions to find the owner if an object goes out of scope in a session that does not own it? or, for that matter, to have the owning session as part of the wheel, and pass that session to _data_handle_remove rather than the active session? Sean Pieper Engineer Viable, Inc. VRS: ViableVRS.tv www.viable.net sean.pieper@viable.net Office: 240-292-0222 x244 VSN: viablepieper This e-mail, including any attachments may contain information that is protected by law as PRIVILEGED AND CONFIDENTIAL and is intended solely for the use of the recipient or the employee or agent responsible for delivering the message to the recipient. Please note that if you are not the intended recipient, you are hereby notified that any dissemination, copying, distribution, retention, re-transmission, printing or any other use of this e-mail or the information contained herein is strictly prohibited. If you have received this e-mail communication in error, please immediately send an e-mail reply to notify the sender and immediately and permanently delete this e-mail from your computer system. Thank you.
Subject: RE: [rt.cpan.org #45408] : workaround
Date: Thu, 30 Apr 2009 17:22:37 -0400 (EDT)
To: bug-POE [...] rt.cpan.org
From: sean.pieper [...] viable.net
in the wheel error handler, or any other place that the wheel might be removed from the heap of the session that owns it, can add explicit calls to shutdown_input and shutdown_output. Then, even if the object is actually reaped later, the filehandle will already have been released by poe's internals. workaround looks like this: sub on_client_error : State { # Handle client error, including disconnect. my ($op, $err, $wheel_id) = @_[ARG0,ARG1,ARG3]; warn("SRV:main got $op $err $wheel_id\n"); $_[HEAP]->{RW}->{$wheel_id}->shutdown_input(); $_[HEAP]->{RW}->{$wheel_id}->shutdown_output(); delete $_[HEAP]->{RW}->{$wheel_id}; } I still think this is a bug though. Sean Pieper Engineer Viable, Inc. VRS: ViableVRS.tv www.viable.net sean.pieper@viable.net Office: 240-292-0222 x244 VSN: viablepieper This e-mail, including any attachments may contain information that is protected by law as PRIVILEGED AND CONFIDENTIAL and is intended solely for the use of the recipient or the employee or agent responsible for delivering the message to the recipient. Please note that if you are not the intended recipient, you are hereby notified that any dissemination, copying, distribution, retention, re-transmission, printing or any other use of this e-mail or the information contained herein is strictly prohibited. If you have received this e-mail communication in error, please immediately send an e-mail reply to notify the sender and immediately and permanently delete this e-mail from your computer system. Thank you.
On Thu Apr 30 14:06:27 2009, sean.pieper@viable.net wrote: Show quoted text
> Update: > > I've traced down the problem to an issue with POE Kernel and > Resource::FileHandles. > > 1) the wheel is created in the Server session > 2) Resource::FileHandles registers that 'Server' owns the wheel's > handle.
The wheel adds an anonymous input event handler to the Server session, using POE::Kernel->state(). The wheel calls POE::Kernel->select_read() to register an input watcher. This input watcher associates the socket with the Server session and the newly created anonymous event handler. Both state() and select_read() operate "in the current session". Their resources (a CODE reference and a socket handle) are implicitly associated with the session that created the wheel. Show quoted text
> 3) 'Slow' gets a pointer to the wheel > 4) 'Server' detects the close, and removes the wheel from its heap, > but the wheel is still in scope because of 'Slow'. > 5) 'Slow' finishes its business with the wheel and wheel goes out of > scope > 6) Wheel DESTROY calls Kernel select_read, which calls > _internal_select with the active session ('Slow') > 7) _internal_select calls Resource::FileHandles _data_handle_remove > 8) _data_handle_remove detects that 'Slow' does not own the wheel's > filehandle, and refuses to delete the filehandle.
Exactly so. In your test case, Slow is trying to destroy a wheel and filehandle watcher owned by Server. This would break Server's encapsulation, which POE normally prevents by design. The general pattern to work around this is to give Server total control over the connection. Server would expose an API, and other sessions would $kernel->post() or $kernel->call() it to do the actual sending. Given that you've run across a design feature, I'm not sure there's anything I can fix.
Subject: RE: [rt.cpan.org #45408] POE filehandle leak
Date: Mon, 1 Jun 2009 12:04:06 -0400 (EDT)
To: bug-POE [...] rt.cpan.org
From: sean.pieper [...] viable.net
Thanks for getting back to me. I feel like this should at least be red flagged in the wheel::readwrite documentation and a nasty warning raised if a session tries to shutdown someone else's wheel. I'm also not too confident in the workaround you suggest. Unless I'm missing something important, I'd still have to specify the wheel ID in my API calls and instead of passing an actual pointer to the wheel object between sessions, I'd pass the ID. The server session would get the ID, look it up in its hash, and operate on the correct wheel. For this approach to be safe, I need to trust that when a wheel goes out of scope, its ID is never reused-- otherwise I can get an even nastier problem: 1. wheel id is passed to task 2. task gets stalled 3. connection is closed, wheel is deleted from hash 4. new wheel gets created with duplicate ID 5. task completes and calls into API with ID 6. ID found in hash but wheel is for a different client 7. data intended for party A goest to party B. Unfortunately, the documentation for wheel->ID states that no such guarantee is provided: "Wheel IDs may be reused, although it has never been reported". Maybe it's paranoid, but this doesn't give me a warm feeling, particularly if POE is changed at some point to more agressively reuse IDs. What I'd prefer to see would be something along the lines of a session variable inside the wheel that the DESTROY method explicitly provides to _data_handle_remove. This would still prevent other sessions from explicitly shutting down a wheel's i/o, but keep garbage collection sane. -sean Show quoted text
-----Original Message----- From: "Rocco Caputo via RT" <bug-POE@rt.cpan.org> Sent: Saturday, May 30, 2009 6:24pm To: sean.pieper@viable.net Subject: [rt.cpan.org #45408] POE filehandle leak <URL: https://rt.cpan.org/Ticket/Display.html?id=45408 > On Thu Apr 30 14:06:27 2009, sean.pieper@viable.net wrote:
> Update: > > I've traced down the problem to an issue with POE Kernel and > Resource::FileHandles. > > 1) the wheel is created in the Server session > 2) Resource::FileHandles registers that 'Server' owns the wheel's > handle.
The wheel adds an anonymous input event handler to the Server session, using POE::Kernel->state(). The wheel calls POE::Kernel->select_read() to register an input watcher. This input watcher associates the socket with the Server session and the newly created anonymous event handler. Both state() and select_read() operate "in the current session". Their resources (a CODE reference and a socket handle) are implicitly associated with the session that created the wheel.
> 3) 'Slow' gets a pointer to the wheel > 4) 'Server' detects the close, and removes the wheel from its heap, > but the wheel is still in scope because of 'Slow'. > 5) 'Slow' finishes its business with the wheel and wheel goes out of > scope > 6) Wheel DESTROY calls Kernel select_read, which calls > _internal_select with the active session ('Slow') > 7) _internal_select calls Resource::FileHandles _data_handle_remove > 8) _data_handle_remove detects that 'Slow' does not own the wheel's > filehandle, and refuses to delete the filehandle.
Exactly so. In your test case, Slow is trying to destroy a wheel and filehandle watcher owned by Server. This would break Server's encapsulation, which POE normally prevents by design. The general pattern to work around this is to give Server total control over the connection. Server would expose an API, and other sessions would $kernel->post() or $kernel->call() it to do the actual sending. Given that you've run across a design feature, I'm not sure there's anything I can fix. Sean Pieper Engineer Viable, Inc. VRS: ViableVRS.tv www.viable.net sean.pieper@viable.net Office: 240-292-0222 x244 VSN: viablepieper This e-mail, including any attachments may contain information that is protected by law as PRIVILEGED AND CONFIDENTIAL and is intended solely for the use of the recipient or the employee or agent responsible for delivering the message to the recipient. Please note that if you are not the intended recipient, you are hereby notified that any dissemination, copying, distribution, retention, re-transmission, printing or any other use of this e-mail or the information contained herein is strictly prohibited. If you have received this e-mail communication in error, please immediately send an e-mail reply to notify the sender and immediately and permanently delete this e-mail from your computer system. Thank you.
Subject: wheels break when manipulated by other sessions
Regarding red flags: I'm revising POE::Wheel's DESCRIPTION. How does this sound: Sessions must not expose their wheels to other sessions. Doing so will likely cause problems because wheels are tightly integrated with the sessions that created them. For example, calling put() on a POE::Wheel::ReadWrite instance may enable a write-okay watcher. The handler for this watcher is already defined in the wheel's owner. Calling put() outside that session will enable the write-okay watcher in the wrong session, and the event will never be handled. Likewise, wheels must be destroyed from within their creator sessions. Otherwise breakage will occur when the wheels' DESTROY methods try to unregister event handlers and watchers from the wrong sessions. To simplify things, it's recommended to store POE::Wheel instances in heaps of the sessions that created them. --- Regarding wheel IDs: Due to the Halting Problem, I can't guarantee that wheel IDs are never reused. I also don't want to guarantee a minimum time between reuse because it'll complicate the code and limit future change. Since a task in your application may involve many wheels, it seems reasonable to decouple task ID from wheel ID. Your task manager should expose task IDs that persist for the lifespan of each task. Internally, it can map task IDs to wheel IDs with much shorter lifespans. Internally, wheel IDs can change throughout a task's lifetime. The task manager will update the wheel ID associated with the task as connections are re-established. Externally, the rest of the application has a single, stable ID for each task. --- Regarding the session variable in the wheel: By allowing one object to remove a watcher from another, we open up semantics where any object can do as it pleases with another's resources. I'd like to explore less permissive options, if they're available. POE has a lot of optional checks, one of which may already catch the problem. They are disabled by default because they incur significant performance penalties. You can try them by setting POE_ASSERT_DEFAULT=1 in your shell environment.
Subject: RE: [rt.cpan.org #45408] wheels break when manipulated by other sessions
Date: Wed, 26 Aug 2009 11:25:31 -0400 (EDT)
To: bug-POE [...] rt.cpan.org
From: sean.pieper [...] viable.net
Hi Rocco, that sounds very clear. I'd been using the assert default option and it did not raise the warning-- I had to turn on a specific flag for warnings about other sessions acting on data that they did not own. It'd be nice if this flag could be included in the defaults. -sean Show quoted text
-----Original Message----- From: "Rocco Caputo via RT" <bug-POE@rt.cpan.org> Sent: Tuesday, August 25, 2009 11:47pm To: sean.pieper@viable.net Subject: [rt.cpan.org #45408] wheels break when manipulated by other sessions <URL: https://rt.cpan.org/Ticket/Display.html?id=45408 > Regarding red flags: I'm revising POE::Wheel's DESCRIPTION. How does this sound: Sessions must not expose their wheels to other sessions. Doing so will likely cause problems because wheels are tightly integrated with the sessions that created them. For example, calling put() on a POE::Wheel::ReadWrite instance may enable a write-okay watcher. The handler for this watcher is already defined in the wheel's owner. Calling put() outside that session will enable the write-okay watcher in the wrong session, and the event will never be handled. Likewise, wheels must be destroyed from within their creator sessions. Otherwise breakage will occur when the wheels' DESTROY methods try to unregister event handlers and watchers from the wrong sessions. To simplify things, it's recommended to store POE::Wheel instances in heaps of the sessions that created them. --- Regarding wheel IDs: Due to the Halting Problem, I can't guarantee that wheel IDs are never reused. I also don't want to guarantee a minimum time between reuse because it'll complicate the code and limit future change. Since a task in your application may involve many wheels, it seems reasonable to decouple task ID from wheel ID. Your task manager should expose task IDs that persist for the lifespan of each task. Internally, it can map task IDs to wheel IDs with much shorter lifespans. Internally, wheel IDs can change throughout a task's lifetime. The task manager will update the wheel ID associated with the task as connections are re-established. Externally, the rest of the application has a single, stable ID for each task. --- Regarding the session variable in the wheel: By allowing one object to remove a watcher from another, we open up semantics where any object can do as it pleases with another's resources. I'd like to explore less permissive options, if they're available. POE has a lot of optional checks, one of which may already catch the problem. They are disabled by default because they incur significant performance penalties. You can try them by setting POE_ASSERT_DEFAULT=1 in your shell environment. Sean Pieper Engineer Viable, Inc. VRS: ViableVRS.tv www.viable.net sean.pieper@viable.net Office: 240-292-0222 x244 VSN: viablepieper This e-mail, including any attachments may contain information that is protected by law as PRIVILEGED AND CONFIDENTIAL and is intended solely for the use of the recipient or the employee or agent responsible for delivering the message to the recipient. Please note that if you are not the intended recipient, you are hereby notified that any dissemination, copying, distribution, retention, re-transmission, printing or any other use of this e-mail or the information contained herein is strictly prohibited. If you have received this e-mail communication in error, please immediately send an e-mail reply to notify the sender and immediately and permanently delete this e-mail from your computer system. Thank you.
I'm not sure if there's anything left to resolve. Please reopen if you feel that something's not satisfactory.