add a "censor" command to filter out sensitive information from log files #562

zooko · 2008-12-23T14:36:50Z

zooko commented

2008-12-23 14:36:50 +00:00

Per [//pipermail/tahoe-dev/2008-December/000946.html tahoe-dev/2008-December/000946.html], it would be good to omit the introducer furl from the log file.

This is part of a cluster of tickets including: #562, #563, #685, #1008, #1904, and #1989.

zooko added the

labels 2008-12-23 14:36:50 +00:00

zooko added this to the undecided milestone 2008-12-23 14:36:50 +00:00

davidsarah commented

2009-11-01 02:04:45 +00:00

If you like this bug, you might also like #823.

davidsarah commented

2009-12-20 23:41:01 +00:00

If you like this bug, you might also like #860.

tahoe-lafs modified the milestone from undecided to 1.7.0

2010-02-01 19:51:37 +00:00

kevan commented

2010-02-23 01:25:02 +00:00

First, note that the log file that inspired this ticket is here: [//pipermail/tahoe-dev/attachments/20081222/20cc919e/attachment-0001.html]

The tahoe-lafs code itself, unless I'm missing something, doesn't ever print the introducer_furl to a log. I notice that there's one exception in there with a censored furl; perhaps that's an artifact from how things were then, or something that foolscap is doing? I'll look into that more thoroughly later.

I do notice that the storage server furls are also censored in the motivating log file. I don't mind having them there in my log files, and, as Zooko points out in that thread, censoring too much makes the log files less useful. Maybe this can be a configuration switch -- if paranoid logging is turned on, then IP addresses, storage server furls, storage indices/verify caps are censored somehow, and if not they aren't.
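A minimal sketch of what such a switch could look like at logging time. Every name here (`maybe_censor`, `connect_message`, the `paranoid` flag, the set of kinds) is hypothetical and not an existing tahoe-lafs or foolscap option; the point is only that the caller already knows what kind of value it is logging.

```python
# Hypothetical sketch: none of these names exist in tahoe-lafs or foolscap.
CENSOR_KINDS = {"ip", "furl", "storage_index"}


def maybe_censor(value, kind, paranoid):
    """Return a placeholder for sensitive kinds when paranoid logging is on."""
    if paranoid and kind in CENSOR_KINDS:
        return "<censored %s>" % kind
    return str(value)


def connect_message(host, port, paranoid):
    """Build a log line, masking the host only if paranoid logging is on."""
    return "connectTCP to (%r, %d)" % (maybe_censor(host, "ip", paranoid), port)
```

With `paranoid=False` the message is the same as today; with `paranoid=True` the address is replaced before it ever reaches the log.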

kevan commented

2010-02-23 03:43:22 +00:00

..alternatively, maybe there's a way that we could add a tool to censor logs after they've been created.

For example, you can do

flogtool filter --after=5 logs/from-2010-02-21-124158--to-present.flog filtered.flog

to post-process logs that way. So maybe you could, if you wanted a censored log snippet to post to tahoe-dev or on the Trac, do something like

flogtool censor logs/from-2010-02-21-124158--to-present.flog censored.log

and have flogtool (or whatever) obfuscate the SIs, furls, and so on. Of course, it's probably much harder to do it that way.

Censorship in a running node is relatively easy, as you can easily determine what is what as it is being logged, and censor accordingly. Censorship after the fact is much harder, because you need to be able to reliably determine whether a certain string is a furl, a storage index, an IP address, something else that should be censored, or nothing at all. It seems to be closer to what I as a user would want, though; if I want to have a useful, low-effort log to attach to a bug report, I shouldn't have to run my node such that it never produces logs with information that might help me later, nor should I have to stop, reconfigure, and restart my node, then hope that the problem reappears.
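To make the difficulty concrete, here is a rough sketch of after-the-fact censoring by pattern matching. The two regular expressions are assumptions about what furls and IPv4 addresses look like in a text log, not anything flogtool provides, and storage indices are left out on purpose because a bare base32 string cannot be told apart from other identifiers by shape alone.

```python
import re

# Assumed shapes, for illustration only.
FURL_RE = re.compile(r"pb://[^\s'\"<>]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def censor_line(line):
    """Mask anything that looks like a furl or an IPv4 address."""
    # Furls go first, since a furl usually contains an address of its own.
    line = FURL_RE.sub("pb://<censored>", line)
    line = IPV4_RE.sub("<censored-ip>", line)
    return line
```

A guess like this errs in both directions: a four-part version number such as `0.5.1.2` is masked as if it were an address, while anything that does not match one of the patterns passes through untouched.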

francois commented

2010-04-05 12:23:40 +00:00

Kevan,

I like your idea of creating a new 'flogtool censor' command.

What about tagging potentially sensitive information at logging time? For example, let's modify this type of log line

 connectTCP to ('127.0.0.1', 55368)

into

 connectTCP to ('<IP>127.0.0.1</IP>', 55368)

It will then be pretty easy to filter out IP addresses, furls, storage indexes and so on.
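A small sketch of how a filter could use such tags. Only the `<IP>` tag comes from the example above; the `FURL` and `SI` tag names and both function names are made up for illustration.

```python
import re

# <IP> is from the example above; FURL and SI are assumed tag names.
TAG_RE = re.compile(r"<(IP|FURL|SI)>(.*?)</\1>")


def censor_tagged(line):
    """Replace the content of every tagged span, keeping the tag itself."""
    return TAG_RE.sub(
        lambda m: "<%s>censored</%s>" % (m.group(1), m.group(1)), line
    )


def strip_tags(line):
    """Drop the tags but keep the content, for uncensored local viewing."""
    return TAG_RE.sub(lambda m: m.group(2), line)
```

Applied to the tagged line above, `censor_tagged` gives `connectTCP to ('<IP>censored</IP>', 55368)`, and `strip_tags` gives back the original untagged line.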

kevan commented

2010-04-13 22:53:46 +00:00

That would solve the problem.

I haven't had much time to play with the censorer lately, but it's more or less functional now, with that idea. I'm hoping I can have some patches and tests for people to play with by the end of this weekend.

kevan commented

2010-05-01 23:48:24 +00:00

A correct solution to this will probably need to be implemented in foolscap, since it turns out that a lot of the compromising log entries come from there.

David-Sarah suggested that foolscap could offer callers of its logging system a way to mark certain log messages (or certain parts of certain log messages) as sensitive, so flogtool censor or whatever would know to censor them. For example,

from foolscap.logging import log
[...]
log.msg("some stuff" + log.sensitive("sensitive information"))

You'd basically need to do the following to solve this ticket, if you wanted to do it as above:

1. Decide how to represent sensitive information in foolscap logs, and implement the sensitive function.
2. Implement flogtool censor.
3. Go through and audit logging code in foolscap and tahoe-lafs so that it uses sensitive where appropriate.
4. Make patches for your changes and get them accepted into foolscap and tahoe-lafs.

Between GSoC and school, I'm not going to have time to do all of that before 1.7 is due, so I'm unaccepting this ticket in case someone else wants to finish what I've started. I implemented 2, but as tahoe censor. I'm attaching that, and the tests I wrote for it to this ticket -- maybe they'll be useful somehow to whoever accepts this ticket. If I do get time, I'll re-accept it and continue working on it.
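As a rough illustration of the first two steps, and not a description of the attached patches or of any foolscap API, here is one way the marking could be represented. `Sensitive`, `sensitive`, `make_event` and `render` are all hypothetical names.

```python
class Sensitive(str):
    """Hypothetical marker type: a str flagged as sensitive."""


def sensitive(text):
    """Mark a piece of text as sensitive."""
    return Sensitive(text)


def make_event(*parts):
    """Record a message as (text, is_sensitive) pairs instead of one string."""
    return [(str(p), isinstance(p, Sensitive)) for p in parts]


def render(event, censor=False):
    """Join the parts, masking the sensitive ones when censor is True."""
    return "".join(
        "<censored>" if (censor and flag) else text for text, flag in event
    )


event = make_event("some stuff: ", sensitive("sensitive information"))
print(render(event))
print(render(event, censor=True))
```

One design point this brings out: with plain string concatenation, as in `"some stuff" + sensitive(...)`, the result is an ordinary `str` and the marking is lost. The representation chosen in step 1 would therefore need either to keep the parts separate, as here, or to embed the marking in the text itself.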

kevan commented

2010-05-01 23:49:00 +00:00

Attachment censor.darcspatch.txt (10011 bytes) added

implementation of 'tahoe censor'

kevan commented

2010-05-01 23:49:22 +00:00

Attachment tests.darcspatch.txt (20889 bytes) added

tests for 'tahoe censor'

tahoe-lafs modified the milestone from 1.7.0 to 1.7.1

2010-06-16 04:25:27 +00:00

zooko commented

2010-07-11 17:41:57 +00:00

From Kevan's comment:68662, it sounds like he would not recommend committing these patches to Tahoe-LAFS trunk. Therefore I'm unsetting "review-needed".

zooko modified the milestone from 1.7.1 to undecided

2010-07-11 17:41:57 +00:00

tahoe-lafs modified the milestone from undecided to soon

2012-02-23 00:35:24 +00:00

zooko modified the milestone from soon to eventually

2013-01-14 06:29:15 +00:00

zooko changed title from ~~censor introducer furl from log files~~ to add a "censor" command to filter out sensitive information from log files

2013-01-14 06:29:15 +00:00

zooko commented

2013-01-14 08:02:57 +00:00

Other potentially sensitive information that shows up in foolscap logs (including incident report files):

* storage server furls
* the exact sizes of files
* the self-chosen nicknames of servers

zooko commented

2013-01-14 08:18:34 +00:00

This issue is interfering with debugging #1670, because a user has reported an occurrence of #1670, but their incident report files contain information which is sensitive to them, so they don't want their flog files posted to the issue tracker.
