[shadow-dev] Extending Shadow to download objects embedded in HTML

Arne Diekmann caffeine at parttimegeeks.net
Thu May 24 08:46:40 CDT 2012

Hey Rob and the others,

I want to use shadow/scallion to analyze countermeasures in particular against 
the fingerprinting attack and against traffic analysis attacks in general. More 
details are in my thread "Using Shadow/Scallion to connect to 'real' Servers" 
in the shadow-support list.

Rob convinced me that it is easiest to build upon the filetransfer plugin to 
add parsing of HTML and requesting further objects.
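
As a rough illustration of that parsing step, here is a minimal Python sketch. It is not the libcurl program and not Shadow code. The set of tags treated as embedded objects and the names `EmbeddedObjectParser` and `extract_embedded_urls` are assumptions made for this example only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag/attribute pairs treated as embedded objects (illustrative subset, an assumption).
EMBEDDED = {"img": "src", "script": "src", "iframe": "src", "embed": "src"}


class EmbeddedObjectParser(HTMLParser):
    """Collects absolute URLs of objects a browser would fetch for a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        ref = None
        if tag in EMBEDDED:
            ref = attrs.get(EMBEDDED[tag])
        elif tag == "link":
            # Only stylesheets are followed; other <link> relations are skipped.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                ref = attrs.get("href")
        if ref:
            url = urljoin(self.base_url, ref)
            if url not in self.urls:  # request each object only once
                self.urls.append(url)


def extract_embedded_urls(html, base_url):
    """Return embedded object URLs in document order, without duplicates."""
    parser = EmbeddedObjectParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Ordinary hyperlinks (`<a href=...>`) are deliberately ignored, since a browser does not fetch them while loading the page.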

I've now written a program to make such a browser-like request with libcurl. 
It can be examined and played with here: 


Any comments on the code are greatly appreciated. I know it has the following 
shortcomings which I might look at in the future, if classification gets too 

- JavaScript is treated as if it were parsed immediately, when in reality 
parsing may take considerable time and block downloads in the meantime
- A lot of objects are actually referred to in the CSS (webfonts, background-
- @import is ignored
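
For the two CSS-related points, one possible approach is a regular-expression pass over each downloaded stylesheet. The following Python sketch is only an approximation under that assumption. It is not a real CSS parser (comments and escapes are not handled), and `extract_css_urls` is a hypothetical helper name.

```python
import re
from urllib.parse import urljoin

# Matches url(...) with optional quotes, or @import "..." without url().
CSS_REF_RE = re.compile(
    r"""url\(\s*['"]?([^'")\s]+)['"]?\s*\)|@import\s+['"]([^'"]+)['"]""",
    re.IGNORECASE,
)


def extract_css_urls(css, base_url):
    """Return absolute URLs referenced by a stylesheet, in order of appearance."""
    urls = []
    for match in CSS_REF_RE.finditer(css):
        ref = match.group(1) or match.group(2)
        if ref.lower().startswith("data:"):
            continue  # inline data, nothing to download
        url = urljoin(base_url, ref)
        if url not in urls:
            urls.append(url)
    return urls
```

Relative references are resolved against the URL of the stylesheet itself, not the URL of the HTML page.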

The next step for me is to actually integrate that code into Shadow. I have a 
few questions about that. Most of them revolve around the virtual network and 
how it works in detail.

1. Can I just use libcurl, or do I have to use epoll like the filetransfer 
plugin currently does? I use libcurl's multi interface, which uses only a 
single thread. 

2. If there are any calls from libcurl which need to be done in a different 
manner, is there any way I can intercept them, just like it is done with Tor? 
Or is this far more complex than using epoll? 

3. Should I create a new plugin or extend shd-service-filegetter.c ?

4. I need to extract certain features from the Tor traffic for classification.
E.g. I need to determine the following features:

- Total trace time
- Total transferred bytes
- Individual packet sizes
- The time when each packet was received
- ...

The best (and supposedly easiest) way would be if I could get them in a 
libpcap-like format (because I also need to conduct experiments without Tor, 
using only the fetcher script above). What is the easiest and best way to get 
that data?
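
To make the target concrete, here is a small Python sketch of how the features listed above could be computed once a capture exists. It assumes a classic libpcap file with microsecond timestamps (for example, as written by tcpdump). Whether and how Shadow can produce such a file is the open question. The function names `read_pcap_records` and `trace_features` are made up for this example.

```python
import struct

GLOBAL_HEADER_LEN = 24  # classic pcap file header
RECORD_HEADER_LEN = 16  # per-packet header: ts_sec, ts_usec, incl_len, orig_len


def read_pcap_records(data):
    """Return a list of (timestamp_seconds, original_length) from pcap bytes."""
    if len(data) < GLOBAL_HEADER_LEN:
        raise ValueError("too short for a pcap file header")
    magic = struct.unpack("<I", data[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"  # file was written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"  # file was written big-endian
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    records = []
    offset = GLOBAL_HEADER_LEN
    while offset + RECORD_HEADER_LEN <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + RECORD_HEADER_LEN]
        )
        # orig_len is the on-wire frame length, even if the capture was truncated.
        records.append((ts_sec + ts_usec / 1e6, orig_len))
        offset += RECORD_HEADER_LEN + incl_len  # skip the captured bytes
    return records


def trace_features(records):
    """Compute total trace time, total bytes, packet sizes and arrival times."""
    times = [t for t, _ in records]
    sizes = [s for _, s in records]
    total_time = max(times) - min(times) if times else 0.0
    return {
        "total_time": total_time,
        "total_bytes": sum(sizes),
        "sizes": sizes,
        "times": times,
    }
```

The sizes here are link-layer frame lengths. For classification on TCP or Tor cell level, the protocol headers inside each record would have to be parsed as well.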

I know it's a lot of (possibly very stupid) questions again. But I hope you 
can help me, knowing that I will be grateful forever :)

- Arne Diekmann
