April 13, 2007

HTML parsing

I've been looking for a decent HTML parser for some web analysis tools I'm writing in .NET. I spent a couple of hours searching the web and Usenet and found parsers for Java, Python, Perl, and so on, but none of them looked complete enough. Then I finally ran into libxml, the XML library from the GNOME project, which also has an awesome HTML parser. It's written in C, which is fine by me since I'm going to write a .NET wrapper for it anyway, but I thought I'd let the world know in case anyone else is having trouble finding a good parser.

1 comment:

Anonymous said...

I posted a nerd comment, and it was lost in cyberspace.

Suffice it to say that you are a kewl-ly, nerd-ly stud-ly.