XML HTML Info Example

The XML HTML Info example provides a simple command line utility that scans the current directory for HTML files and prints statistics about them to standard output.

The files are parsed using a QXmlStreamReader object. If a file does not contain a well-formed XML document, a description of the error is printed to the standard error console.

Basic Operation

The main function of the example uses QDir to access files in the current directory that match either "*.htm" or "*.html". For each file found, the parseHtmlFile() function is called.

Reading XML is handled by an instance of the QXmlStreamReader class, which operates on the input file object:

    QXmlStreamReader reader(&file);

The work of parsing the XML and extracting statistics is done in a while loop, and is driven by input from the reader:

    int paragraphCount = 0;
    QStringList links;
    QString title;
    while (!reader.atEnd()) {
        reader.readNext();
        if (reader.isStartElement()) {
            if (reader.name() == "title")
                title = reader.readElementText();
            else if (reader.name() == "a")
                links.append(reader.attributes().value("href").toString());
            else if (reader.name() == "p")
                ++paragraphCount;
        }
    }

If more input is available, the next token from the input file is read and parsed. The program then looks for the specific element types "title", "a", and "p", and stores information about them: the text of the title, the "href" attribute of each link, and a running count of paragraphs.

When there is no more input, the loop terminates. If an error occurred, information is written to standard output via a stream, and the function returns:

    if (reader.hasError()) {
        out << " The HTML file isn't well-formed: " << reader.errorString()
            << endl << endl << endl;
        return;
    }

If no error occurred, the example prints some statistics from the data gathered in the loop, and then returns.