
CodeCon 2006

Program


Daylight Fraud Prevention - Anti-Phishing prevention, tracking and detection through real-time web-based forensics.
presenters Lance James
history Daylight Fraud-Prevention* (DFP) is a suite of technologies offering a powerful proactive defense against scammers and online criminals, protecting your company's name, and assisting in exposing the individuals behind these fraudulent acts. Current anti-fraud solutions are limited, and take a significant amount of time and manpower. DFP applies security in depth and addresses the core attributes and common attack methods used by phishers and online criminals. Each of the DFP technologies is independent and addresses a combination of detection, identification, prevention, and tracking methodologies. All of the technology is installed as Apache server modules and can be deployed as an inline or offline appliance, depending on needs.
demo The demonstration will consist of an online-accessible target site (a fictitious bank) that is the target of a phisher. We will demonstrate the four attack techniques phishers use, and show how the technology can detect phishing attacks weeks before they may occur. We will also demonstrate techniques to prevent phishing and malware attacks without adding another piece of software to someone's desktop, using innovative yet simple techniques based on web analytics conducted by Secure Science. Our tracking techniques will demonstrate how to get around the "hiding behind the proxy" problem, as well as how to use real-time dynamic steganographic techniques to protect your content from use by phishers. The demo will consist of phishing sites being set up, malware attacks in a sandbox being thwarted, and the target bank defending itself.
future plans To gain more adoption of this technology in the main stream as a practical defense-in-depth solution against phishing and malware attacks.

delta - Delta assists you in minimizing "interesting" files subject to a test of their interestingness.
presenters Daniel S. Wilkerson, Scott McPeak
history Scott and I were working on various static-analysis projects (our research group: http://osq.cs.berkeley.edu/). We had large inputs that would cause our tools to fail, and minimizing them by hand was hopeless, so we wrote delta. With delta, no matter how big an input you start with, you always end up with about a page or two of code, even from a quarter-million-line input we tried once.

Microsoft Research heard about it through the grapevine and asked someone to come to my office and ask me if I would release it as open source, so I did. Maybe they didn't want to write an email to me endorsing an open source project on record? I don't know. Now this thing is everywhere: it is taught in the Stanford and Berkeley software engineering classes and the gcc people use it. See the website.

Our implementation is based on the Delta Debugging algorithm: http://www.st.cs.uni-sb.de/dd/
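
To give a feel for the approach, here is a minimal Python sketch of a simplified ddmin-style loop in the spirit of the Delta Debugging algorithm. It is not delta's actual code: the function name `ddmin` and the `is_interesting` predicate are assumptions made for illustration, and the real tool works on files and runs an external test script.

```python
def ddmin(items, is_interesting):
    """Shrink `items` while `is_interesting(items)` stays true.

    Simplified ddmin-style loop: try removing one chunk at a time; on
    success keep the smaller list, otherwise split into finer chunks.
    """
    assert is_interesting(items), "the starting input must be interesting"
    n = 2  # current number of chunks
    while len(items) >= 2:
        chunk = max(1, len(items) // n)
        reduced = False
        for start in range(0, len(items), chunk):
            candidate = items[:start] + items[start + chunk:]
            if candidate and is_interesting(candidate):
                items = candidate          # keep the smaller input
                n = max(n - 1, 2)          # and coarsen the granularity again
                reduced = True
                break
        if not reduced:
            if chunk == 1:
                break                      # no single item can be removed
            n = min(n * 2, len(items))     # retry with finer chunks
    return items


# Hypothetical test of interestingness: the "failure" needs lines 7 and 42.
def fails(lines):
    return 7 in lines and 42 in lines

minimized = ddmin(list(range(100)), fails)
```

The loop stops only when removing any single remaining item makes the input uninteresting, which is why the result tends to be small regardless of the starting size.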

demo I will probably minimize a file while wearing no clothes. Just kidding; I wouldn't actually minimize a file in public.
future plans The gcc people seem to like it and one of them has checkin privileges so I suppose it will keep getting better. I have an idea to generalize the algorithm. Most people have ideas on how to make it work better for *their* use of it, but these schemes tend to make it worse in general.

Deme - A free/open-source platform for online group deliberation and dialogue
presenters Todd Davies, Benjamin Newman, Brendan O'Connor, Aaron Tam
history Work on Deme began in 2003 as an outgrowth of the partnership between the East Palo Alto (California) Community Network and Stanford's Symbolic Systems Program. The goal was to develop a web-based environment for group/organization tasks that are usually thought to require face-to-face meetings. The effectiveness and legitimacy of grassroots groups, community-based planning, and nonprofit boards in East Palo Alto were being undermined by barriers to achieving the face time and attendance that decisions require. Email lists were in use, and technology initiatives were making Internet access available to most residents. But free web tools lacked features for item-centered discussion, collaborative editing, group choice, project planning, and committees in an asynchronous, email-integrated package. Early versions of Deme (v.0.1-0.5) focused on functionality, slighting user-friendliness. New AJAX code, scheduled for merging in January, includes an overhaul of the UI and synchronous chat.
demo We will present an overview of Deme, from its motivations and goals to the most recent codebase.
* Todd Davies will describe the problem-driven approach of Deme and its origins in East Palo Alto and in contemporary civil society more generally. Deme is embedded in the new field of online deliberation, whose practitioners have grand ambitions for remaking the world through well-designed social software. Our approach focuses on making civil society groups more participative. Deme's approach to the stock (artifact) versus flow (discussion) problem of groupware will be described.
* Brendan O'Connor will walk through use-cases in Deme, explaining the group/meeting area distinction, and using Deme for discussion, document creation and revision, annotation of items and websites, decision making, and planning. Email integration will also be demonstrated, and implications for scalability discussed.
* Aaron Tam will describe the chat and administrative features of Deme, addressing maintenance of separate lists and settings for Deme's meeting areas and the groups within which they are linked. The configurable stylesheet in Deme's new version will be demonstrated.
* Benjamin Newman will discuss the AJAX code underlying Deme's new meeting area viewer, including an approach to overcoming the hidden fields limitations in JavaScript's inheritance scheme.
future plans * Member-controlled open hosting of Deme and other applications
* Loose authentication coupling mods for other apps (Drupal/CivicSpace, Wordpress, etc.)
* Voice integration
* WYSIWYG wiki-style and synchronous editing of documents
* Tagging/recovering a document's version with its annotations
* Richer set of decision procedures
* Grassroots and developer outreach
* Deliberation experiments

Dido - A platform for writing dynamic voice menu systems, in Perl
presenters Quinn Weaver
history Dido grew out of a frustration with the open-source telephony platform Asterisk's dialplan system.

In the summer of 2005, I had just founded Fairpath, and I was looking for a quick demo to get into expensive VoIP industry conferences. I quickly realized that what I wanted to do--reordering the options in a voice menu by popularity--was impossible in Asterisk. In fact, the problem was deeper than I thought. The halting problem was implicated.

I ended up creating Dido, a radically new system that makes use of declarative XML templates, interspersed with Perl code that generates more XML. The result is a programming model that mimics the way dynamic Web pages are written. I hope this homology will help more programmers begin writing IVRs.

In the meantime, my "quick demo" turned out to be innovative enough to become the main focus of Fairpath. I offer custom and turnkey IVRs using Dido.

demo The audience will be able to call into the demo system during the talk, using their cell phones, and traverse it independently.
future plans Dido needs a better reordering algorithm. This involves some problems in graph theory (as any menu system is essentially a graph, possibly with cycles) and statistics (as Dido needs to deduce which options are most popular based on n users' call patterns).

I plan to use Dido as part of a future GUI system that lets naive users build IVRs.
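
The simplest version of the reordering idea, for a single menu with no graph structure, can be sketched as follows. This is an illustration in Python rather than Dido's Perl, and the function name `reorder_menu` and the sample option names are assumptions, not part of Dido.

```python
from collections import Counter


def reorder_menu(options, selections):
    """Return `options` sorted by how often each was chosen, most popular first.

    Options that were never chosen count as zero. Ties keep their original
    order because sorted() is stable.
    """
    counts = Counter(selections)
    return sorted(options, key=lambda opt: -counts[opt])


# Hypothetical menu and call log.
menu = ["billing", "support", "sales"]
call_log = ["support", "sales", "support", "billing", "sales", "support"]
reordered = reorder_menu(menu, call_log)
```

A real system would also have to deal with nested menus and with callers who have memorized the old key presses, which is where the graph-theory and statistics questions come in.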


Djinni - Approximating Solutions to Nigh-Unsolvable Problems--Fast!
presenters Robert J. Hansen, Tristan D. Thiede
history In 2003, two researchers at the University of Iowa (Drs. Jeff Ohlmann and Barrett Thomas) came up with a new approximation algorithm for NP-complete problems. However, they were unable to come up with an efficient implementation of their algorithm. They recruited Robert Hansen to hack on it for them, and ultimately version 1.0 of Djinni was created.

Djinni is an extensible, heavily documented framework for the efficient approximation of problems generally thought to be unsolvable. It doesn't give you the optimal solution, but generally gives you very close to it, and in a very reasonable time frame.
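
The description above does not name the algorithm Djinni implements, so the following Python sketch uses plain simulated annealing as a stand-in to show what "close to optimal, in reasonable time" looks like in code. Every name here (`anneal`, `tour_length`, `swap_two`, the sample points) is hypothetical and unrelated to Djinni's API.

```python
import math
import random


def anneal(initial, cost, neighbor, steps=5000, start_temp=1.0, cooling=0.999, seed=0):
    """Plain simulated annealing; returns the best solution seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temp = start_temp
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= cooling
    return best, best_cost


# Hypothetical example: a short closed tour through points on a plane.
points = [(0, 0), (3, 0), (0, 4), (3, 4), (1, 2), (2, 1)]


def tour_length(order):
    pairs = zip(order, order[1:] + order[:1])
    return sum(math.dist(points[a], points[b]) for a, b in pairs)


def swap_two(order, rng):
    i, j = rng.sample(range(len(order)), 2)
    swapped = list(order)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped


start = list(range(len(points)))
best_order, best_length = anneal(start, tour_length, swap_two)
```

The result is a valid tour that is never worse than the starting one, but nothing guarantees it is optimal; that trade-off is the point of an approximation framework.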

demo Various computationally hard tasks will be presented (at present I'm looking into Tetris, Lemmings, scheduling algorithms, graph-theoretic algorithms, etc.), along with how Djinni can be applied to each of them.
future plans Djinni is a testbed for new algorithms. As such, it's quite likely that new approximation algorithms will be easily adapted to the Djinni framework. Approximation algorithms are an active part of operations research, and future developments are expected soon.

Elsa/Oink/Cqual++ - A static-time whole-program dataflow analysis for C and C++
presenters Daniel S. Wilkerson
history The Cqual++ backend is based on Cqual (http://www.cs.umd.edu/~jfoster/cqual/), which was Jeff Foster's (http://www.cs.umd.edu/~jfoster/) Ph.D. research under Alex Aiken (http://theory.stanford.edu/~aiken/). Many others at Cal Berkeley worked on it.

The backend was reimplemented by Rob Johnson to do a polymorphic dataflow analysis as part of his thesis (still unpublished): http://www.cs.sunysb.edu/~rtjohnso/

Elsa is a C/C++ front end designed, maintained, and mostly implemented by Scott McPeak with some help from Daniel Wilkerson: http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elsa/

Daniel Wilkerson hooked Elsa and Cqual together to make Cqual++. It resides in a kind of super-project called Oink which is designed to allow multiple backends for Elsa to cooperate (the only example of which presently is Cqual++). For example, the dataflow analysis is pretty generic and other dataflow-based C++ analyses could be written using it and added to Oink.

demo The main use of a Cqual-style dataflow analysis is finding dangerous dataflow bugs, such as the format-string bug in this program (do you see it?):

    char $tainted *getenv(char const *name);
    int printf(char const $untainted *fmt, ...);
    int main(void) {
      char *s, *t;
      s = getenv("LD_LIBRARY_PATH");
      t = s;
      printf(t);
    }
Cqual++ finds it and explains the dangerous dataflow path to you.
    ./cc_qual -config ../qual/config/lattice taint.cc
    taint.cc:2 warning: vararg function does not have polymorphic type
    taint.cc:2 WARNING: fmt'' treated as $tainted and $untainted

    fmt'': $tainted $untainted
    taint.cc:5      $tainted <=  ( (getenv) ( ("LD_LIBRARY_PATH")))'
    taint.cc:5               <=  (s)''
    taint.cc:6               <=  (t)''
    taint.cc:7               <= fmt''
    taint.cc:2               <= $untainted
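
The path reported above can be thought of as reachability in a flow graph. The Python sketch below illustrates only that idea; it is not how Cqual++ works internally (Cqual infers type qualifiers from constraints), and the function `find_taint_path` and the node names are assumptions chosen to mirror the example.

```python
from collections import deque


def find_taint_path(flows, source, sink):
    """Breadth-first search for a dataflow path from `source` to `sink`.

    `flows` maps each node to the nodes its value flows into.
    Returns the path as a list, or None if the sink is unreachable.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while node is not None:   # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in flows.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Assignments from the example: s = getenv(...); t = s; printf(t).
flows = {
    "getenv()": ["s"],   # $tainted source
    "s": ["t"],
    "t": ["fmt"],        # fmt is declared $untainted
}
path = find_taint_path(flows, "getenv()", "fmt")
```

If a path exists from a `$tainted` source to an `$untainted` parameter, the program has a potential taint bug, and the path itself is the explanation.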

I will demonstrate a correct dataflow analysis of a horrible page of C++. It's a good thing this venue is 21+. I could also demonstrate the Elsa front-end reading in C++, parsing, disambiguating, type-checking, lowering (instantiating templates, making implicit syntax explicit, etc.) and rendering out the lowered abstract syntax tree and type system annotations as XML.
future plans
  • Elsa continues to be maintained and become closer to a full C++ front-end.
  • Cqual++ will be used to find bugs in lots of popular open-source programs. We will attempt to make it a standard tool in the developer world so as to raise the level of software quality (the mission of the OSQ group).
  • Oink will perhaps become home to more analyses of C and C++ which can hopefully co-operate with one another. Or it may be simply merged into Elsa.

iGlance - Open source push-to-talk videoconferencing and screen-sharing
presenters David Barrett
history iGlance was conceived and built on the road, designed specifically to overcome the technical and social hurdles that typically face remote workers. The entire project is based on the metaphor of recreating the natural habits we all use when physically present, but remotely over the internet: a peek over a cubicle wall is replaced with a glance at your buddy list; muttering to your co-worker is achieved via push-to-talk; pointing at your screen is window sharing. Thus iGlance eschews the typical boolean approach to realtime collaboration features (uses all or none of your whole screen/CPU/network). Rather it weaves a progressive fabric of incremental interaction, starting with online/offline presence, adding push-to-talk VoIP, "always on" video buddy lists, whole-screen or per-window desktop sharing, and ultimately realtime video calls. iGlance was constructed entirely by David over the course of 1400 hours while wandering the southern Caribbean and the west coast of the United States.
demo Though open source and non-commercial, iGlance is not vapor: it exists and is in use today in growing numbers. David will demonstrate all the abovementioned features, including push-to-talk VoIP (with customizable, per-buddy hotkeys), always-on video buddy lists, screen and window sharing, file transfers, and more. David will highlight the unique differences of iGlance, including collaborative screen sharing (versus the 'all your screen are belong to us' approach of VNC or Remote Desktop), passive videoconferencing (more like sitting at the same table than making a phone call), reality-mimicking privacy options (like 'you can only see me if I can see you back'), its totally unique approach to user interface, and so forth. Finally, David will provide a concise overview of the system's P2P network topology, current development status and priorities, application architecture, and pointers for how to download and get oriented in the code.
future plans Though iGlance works and provides value today, it's still very early -- only released for general public use within the past 2 months. There are countless improvements to be made to audio quality, CPU usage, platform portability (nearly all platform-dependent code is isolated to a single file, but only a Win32 port is available), and so on. But at the end of the day the thing works, and thus my emphasis for the moment is more on growing the userbase itself, as well as easing the transition of new developers into the project through responsiveness and improved documentation.

Localhost - A popularity-based P2P file sharing system based on BitTorrent
presenters Aaron Harwood, Thomas Jacobs
history The project was originally thought of as an online BitTorrent-based "file system", but can more accurately be described as a hierarchical directory structure to index the files on the BitTorrent network in a decentralised way. The project can be seen as a decentralised replacement for BitTorrent torrent indexing websites. After the original idea was formulated, we considered a number of possible designs. The design we settled on was placing semantics on certain files in the network so that they are interpreted as directory nodes by our program. The system allows the index to be modified by users. We decided to implement the program as an augmentation of Azureus. After releasing the program, the project website was linked to by a number of news websites, and the program was downloaded over 10,000 times. This suggests that people are interested in the system; however, it is unclear at this stage whether it will become widely accepted in the community.
demo We plan to demonstrate browsing through the directory structure, and adding new folders and files into the index. This is done through the program's web-interface. We will also show the files and folders that users have added to the index already. Slides, as attached, will be presented that explain how the system currently works.
future plans We are currently working on modifying the system to display web pages, so it becomes a "distributed web" system. This allows users to publish web pages that link to other web pages in the system, and have users who view webpages help serve the webpages. The hierarchical structure resembles a file system, so future work could be done to enable access to the index through a virtual "drive" on the user's computer. Also, the system currently is a bit slow to use, so we plan to speed it up.

Monotone - Low stress, high functionality version control.
presenters Nathaniel Smith
history There has been a recent explosion in the world of version control systems (VCS), with a number of sophisticated new tools suddenly appearing within the last few months, all advancing at furious rates, and almost impossible for anyone not involved in the VCS community to follow. Monotone has been a central player in this advancement, pioneering a number of techniques, and directly inspiring at least two other systems (git and mercurial).
demo In this talk, I will briefly lay out some of the design landscape that VCSes work within, to give a framework for comparing systems that moves beyond the (often unhelpful) dichotomies of "distributed vs. centralized" and "snapshot vs. changeset based"; and then place monotone within this framework, explaining our choices and why we believe in them. Put another way, I'll try to explain not only how it is possible to build a version control system on top of a fully decentralized, gossip-protocol-y distribution mechanism, with efficient (asymptotically better than rsync) and idempotent synchronization, and no central trust authority, but why those are actually the best tools for the job.
future plans Forthcoming.

OASIS (Overlay Anycast Service InfraStructure) - Anycast for Any Service
presenters Michael J. Freedman
history Many Internet services are distributed across a collection of servers all handling client requests, from content distribution networks, to web mirrors, to file-sharing systems. While the performance and cost of such systems often depend highly on clients' selection of servers, the most common techniques for replica selection produce suboptimal results.

We will present OASIS, a locality-aware server-selection infrastructure. At a high level, OASIS allows a service to register a list of servers, and then, for any client IP address, answers the question, ``Which server should the client contact?'' Server selection is primarily optimized for network locality, but also incorporates factors like liveness and, optionally, load. Currently, we have implemented a DNS redirector that performs server selection upon hostname lookups, thus supporting a wide range of unmodified legacy client applications.
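
The selection policy described above (locality first, with liveness and optional load as filters) can be sketched in a few lines of Python. This is an illustration only, not OASIS code: the `Server` record, its `distance` metric, and `select_server` are hypothetical names, and how OASIS actually estimates locality is not covered here.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    distance: float     # hypothetical locality metric to the client; lower is closer
    alive: bool = True
    load: float = 0.0   # fraction of capacity in use, 0.0 to 1.0


def select_server(servers, max_load=None):
    """Pick the closest live server, optionally skipping overloaded ones."""
    candidates = [s for s in servers if s.alive]
    if max_load is not None:
        candidates = [s for s in candidates if s.load <= max_load]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.distance)


# Hypothetical replicas registered by one service.
replicas = [
    Server("a", distance=10, alive=False),
    Server("b", distance=20, load=0.95),
    Server("c", distance=30, load=0.20),
]
nearest_live = select_server(replicas)
nearest_unloaded = select_server(replicas, max_load=0.9)
```

Here the closest replica is skipped because it is down, and turning on the load limit pushes the client one hop further out to a replica with spare capacity.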

OASIS was developed as a general replacement for the DNS system currently employed by CoralCDN, a publicly-deployed content distribution network. Over the past 18 months, CoralCDN averaged more than 20 million requests per day from millions of unique clients.

demo We will demonstrate a live deployment of OASIS on PlanetLab (a distributed testbed of hundreds of servers), show the resolution of ``Where am I?'' queries from clients and other system state through visualization on Google Maps, and, finally, describe how third-party services can flexibly and transparently use OASIS for their distributed Internet services.
future plans We are in the process of transitioning CoralCDN (www.coralcdn.org), i3 (i3.cs.berkeley.edu), and OpenDHT (www.opendht.org) onto OASIS---all long-running services that have been publicly deployed on PlanetLab for more than a year. Other systems are welcome to join in the fun! In running such systems on OASIS, we hope to demonstrate its efficacy as a general, practical anycast infrastructure.

Query By Example - Data mining operations within PostgreSQL
presenters Meredith L. Patterson
history QBE grew out of my qualifying exams project. In that, I developed a system which extracts tuples from a database table, has a user rank them in (partial) order of preference, then uses a support vector machine to learn a linear ranking function, which is translated into an ORDER BY clause and applied to the entire relation. I quickly realised that this could be sped up, expanded to nonlinear functions, and made much more user-friendly if the data mining procedures were pushed inside the database engine and invoked through SQL. I proposed the project to Google for the Summer of Code, they accepted it, and now PostgreSQL can support WHERE clauses which find rows LIKE a set of rows you specify but NOT LIKE rows you don't want.
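
The last step of the original qualifying-exam system, turning a learned linear ranking function into an ORDER BY clause, can be sketched as below. This is a hedged Python illustration, not QBE code: the function `order_by_clause`, the column names, and the weights are all made up, and the learning step (the support vector machine) is omitted entirely.

```python
def order_by_clause(weights):
    """Build an ORDER BY clause from {column: weight}, best-ranked rows first.

    Column names are restricted to plain identifiers so that the generated
    SQL cannot be hijacked by a crafted name.
    """
    if not weights:
        raise ValueError("at least one weighted column is required")
    terms = []
    for column, weight in weights.items():
        if not column.isidentifier():
            raise ValueError(f"unsafe column name: {column!r}")
        terms.append(f"({float(weight)!r} * {column})")
    return "ORDER BY " + " + ".join(terms) + " DESC"


# Hypothetical weights, as if learned from a user's ranking of sample houses.
clause = order_by_clause({"bedrooms": 0.8, "price": -0.002})
```

A linear function maps directly onto a sum of weighted columns like this; nonlinear functions do not, which is one motivation for moving the computation inside the database engine.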

QBE is therefore the first database/data-mining project to support nonlinear classification and ranking operations tightly coupled to the database backend (as opposed to, e.g., Microsoft's Data Mining Extensions, which sit on top of SQL Server, only support decision trees for classification, and don't support ranking at all).

demo I will show how QBE can be used to rapidly develop web services which enable users to quickly find things they're interested in based on a small initial set of preferences. Possible application domains include house/car-hunting, music selection, and online dating.
future plans Extend the techniques used for WHERE clauses (classification) and ORDER BY clauses (ranking) to GROUP BY clauses (clustering) and other common data mining operations. Add an EXCLUDING clause in order to leave out unwanted fields.

The Reusable Unknown Malware Analysis Net (Truman) - An open-source behavioral malware analysis sandnet
presenters Joe Stewart
history Truman grew out of a need for quick behavioral analysis of malware in our Threat Intelligence Group. Because malware is increasingly able to detect the presence of virtual machines, a solution was needed to allow as broad a range of malware to run as possible, while allowing quick re-use with minimal interaction. Malware is also increasingly reliant on Internet access to complete an infection, but allowing even restricted Internet access to malware can be dangerous. Therefore the concept was developed of creating a virtual Internet (which we have termed a "sandnet") for the malware to run in.
demo The demonstration will show a post-mortem analysis tool used after completion of a Truman run. The tool will reconstruct process memory and dump executables from a Win32 physical memory dump, allowing plaintext strings analysis or disassembly of packed executables.
future plans Add anti-detection measures via kernel modules (introduction of random network latency, etc)

Rhizome - An application stack enabling the rapid development of collaborative, Semantic-Web enabled applications.
presenters Adam Souzis
history Rhizome is an open-source project written in Python which consists of a stack of components:

At the top is Rhizome Wiki, a wiki-like content management system that lets users create structured data with explicit semantics in the same way you create pages in a wiki.

Rhizome Wiki runs on top of Raccoon, a simple application server that uses an RDF model as its data store. Raccoon presents a uniform and purely semantic environment for applications. This enables the creation of applications that are easily migrated and distributed and that are resistant to change.

Raccoon uses two novel technologies to get RDF in and out of the system: First, "shredding", which extracts RDF out of content such as HTML, wiki text, and various microcontent formats. Second, RxPath, a deterministic mapping between RDF's abstract syntax and the XPath data model which lets developers treat RDF as regular XML and allows them to use standard XML technologies such as XPath, XSLT, and Schematron to query, transform, present and validate RDF data.

demo The demo will show how to use Rhizome to add structured content rapidly and in an ad hoc way, in a wiki-like fashion. We'll do this by creating a web site for documenting a development project, including using content extracted from source files.
future plans I'm very close to finishing a real query engine for RxPath, at which point Rhizome should be usable for real web applications. After that I plan to work on adding application-level functionality to take advantage of the RDF store, such as the ability to add Technorati tags and trackbacks to any arbitrary resource. My ideas for exploring the decentralized aspects of the technology, such as integrating a P2P lookup primitive into the data access layer, still seem a long way off, but I am investigating a simpler approach of providing an API for adding RDF tuples to the OpenDHT service running on PlanetLab.

SiteAdvisor - Protecting Internet users from spyware, adware, spam, viruses, browser-based attacks, phishing, and online scams by using proprietary crawlers to test every site, download and form on the Web.
presenters Tom Pinckney
history The inspiration to develop SiteAdvisor's software was parents and friends constantly needing our help to reduce spam, remove adware, and avoid fraudulent sites on the Web.

We realized we couldn't always judge the safety of Web sites on our own, either. So we built an army of robot testers which click around the net looking for Web forms, downloads, exploits, pop-ups, etc. We automatically download, install and test every program in a fresh virtual machine. We submit unique e-mail addresses on forms so we can track any resulting spam. We run kernel hooks that look for new processes or executables that may indicate an exploit. A workflow system routes the test to a human operator if the bots detect an error.
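
The unique-address technique mentioned above can be illustrated with a short Python sketch. It is not SiteAdvisor's implementation, which is proprietary; the class `FormMailTracker`, the domain `tracker.example`, and the URLs are hypothetical.

```python
import hashlib


class FormMailTracker:
    """Hand out one address per web form and map incoming mail back to it."""

    def __init__(self, domain="tracker.example"):
        self.domain = domain
        self._by_address = {}

    def address_for(self, form_url):
        # Derive a stable, form-specific local part from the form's URL.
        tag = hashlib.sha256(form_url.encode("utf-8")).hexdigest()[:16]
        address = f"u{tag}@{self.domain}"
        self._by_address[address] = form_url
        return address

    def source_of(self, recipient):
        """Return the form that leaked `recipient`, or None if unknown."""
        return self._by_address.get(recipient.lower())


tracker = FormMailTracker()
addr_a = tracker.address_for("http://site-a.example/signup")
addr_b = tracker.address_for("http://site-b.example/newsletter")
leaked_by = tracker.source_of(addr_a.upper())
```

Because each address is submitted to exactly one form, any mail that later arrives at it identifies the site that passed the address along.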

The code is a mixture of Java, Python, JavaScript, Perl and C++. We use VMware's Player application to run the virtual machines, MySQL as the data store and Apache/Tomcat/mod_python to serve it all up. It all runs on Dell servers running CentOS 4.2.

demo * We'll describe some of the methodology and coding techniques used to build SiteAdvisor's proprietary crawling and testing technology.
* We'll discuss some of the challenges and proprietary solutions developed to address the daunting challenge of testing everything on the Web.
* We'll show a simulation of how our automated bots test sites.
* We'll illustrate how our test results are applied while searching and browsing, using audience-provided suggestions of sites or search terms.
future plans Additional data dimensions and new features in our development pipeline include expanding our crawling and exploit-detection capability, integrating "restricted search" to keep your parents and friends from accidentally downloading something dangerous, integrating one-time e-mail addresses with form detection, and automatically recognizing "clones" of bad sites based on fingerprints of their content.

VidTorrent/Peers - A scalable real-time p2p streaming protocol.
presenters Dimitris Vyzovitis, Ilia Mirkin
history VidTorrent and Peers are ongoing research projects at the MIT Media Laboratory. The high-level design of the VidTorrent algorithms started in February 2005. The first working implementation of the Peers kernel was written in March-April. The first simulated version of VidTorrent was written in May, and the first functional prototype was available in July. Peers was released under the GNU GPL (with anon-cvs access) in July. The first public release of VidTorrent, under the GNU GPL, was available on December 8.
demo 1) A live demo will be conducted and be available for use during the presentation. The live demo will allow conference attendees to tap into an existing VidTorrent stream, while the presenters will use three different laptops to demonstrate the protocol.

2) A live interactive simulation will be performed as part of the presentation to demonstrate the inner workings of the protocol under simulated failure conditions.

future plans + DHT for Rewire, our rendezvous and connectivity infrastructure.
+ VidTorrent tracker-less with Rewire.
+ VidTorrent self-balancing algorithms.
+ Unboostify the Peers python module and auto-generated code.
+ Large-scale pilot testing.