可以使用Python自帶的HTMLParser模組解析HTML文件:HTMLParser的核心模組是org.htmlparser.Parser類,這個類實際完成了對於HTML頁面的分析工作。這個類有下面幾個建構函式:public Parser ();public Parser (Lexer lexer, ParserFeedback fb);public Parser (URLConnection connection, ParserFeedback fb) throws ParserException;public Parser (String resource, ParserFeedback feedback) throws ParserException;public Parser (String resource) throws ParserException;public Parser (Lexer lexer);public Parser (URLConnection connection) throws ParserException;和一個靜態類public static Parser createParser (String html, String charset);
可以使用Python自帶的HTMLParser模組解析HTML文件:HTMLParser的核心模組是org.htmlparser.Parser類,這個類實際完成了對於HTML頁面的分析工作。這個類有下面幾個建構函式:public Parser ();public Parser (Lexer lexer, ParserFeedback fb);public Parser (URLConnection connection, ParserFeedback fb) throws ParserException;public Parser (String resource, ParserFeedback feedback) throws ParserException;public Parser (String resource) throws ParserException;public Parser (Lexer lexer);public Parser (URLConnection connection) throws ParserException;和一個靜態類public static Parser createParser (String html, String charset);