Parsing an HTML/XML document
From a String
let html = "<html><body><h1>Tutorials</h1></body></html>"
if let htmlDoc = HTML(html: html, encoding: .utf8) {
}
let xml = "<root><item><name>Tutorials</name></item></root>"
if let xmlDoc = XML(xml: xml, encoding: .utf8) {
}
The variables htmlDoc and xmlDoc are Kanna documents, which provide properties and methods for searching and reading the parsed content.
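As a rough illustration of those properties and methods, the sketch below reads the title and heading text from a parsed document. It assumes the failable initializer style used in this document (newer Kanna releases may need try? in front of HTML(...)); the helper name headingTexts(in:) and the sample markup are made up for this example.

```swift
import Kanna

// Hypothetical helper: returns the text of every <h1> in an HTML string.
func headingTexts(in html: String) -> [String] {
    // Assumes the failable initializer form shown in this document.
    guard let doc = HTML(html: html, encoding: .utf8) else {
        return []
    }
    // css(_:) yields the matching nodes; text is optional, so nil values are dropped.
    return doc.css("h1").compactMap { $0.text }
}

let sample = "<html><head><title>Kanna</title></head><body><h1>Tutorials</h1></body></html>"

if let doc = HTML(html: sample, encoding: .utf8) {
    // title is an optional String on HTML documents.
    print(doc.title ?? "(no title)")
}
print(headingTexts(in: sample))
```

The same calls work on a document created from a file or a URL, since every initializer yields the same kind of document.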
From a File
// filePath is the path to a local HTML file.
if let data = try? Data(contentsOf: URL(fileURLWithPath: filePath)),
   let doc = HTML(html: data, encoding: .utf8) {
    // Use doc here.
}
Or read the file into a String first:
if let html = try? String(contentsOfFile: path, encoding: .utf8),
   let doc = HTML(html: html, encoding: .utf8) {
    // Use doc here.
}
From the Internets
if let url = URL(string: "https://en.wikipedia.org/wiki/Cat"),
   let doc = HTML(url: url, encoding: .utf8) {
    // Use doc here.
}
Of course, you can also use another networking library (e.g. Alamofire) to fetch the page and pass the result to Kanna:
Alamofire.request("https://en.wikipedia.org/wiki/Cat").responseString(queue: nil, encoding: .utf8) { response in
    // Parse the body only if the request produced a string.
    if let html = response.result.value,
       let doc = HTML(html: html, encoding: .utf8) {
        for headline in doc.css(".mw-headline") {
            // text is optional; print an empty string when it is nil.
            print(headline.text ?? "")
        }
    }
}
Encoding
If you want Kanna to handle the document encoding properly, your best bet is to set the encoding explicitly. Here is an example of setting the encoding to EUC-JP on the parser:
let html = "<html><body><h1>Tutorials</h1></body></html>"
if let htmlDoc = HTML(html: html, encoding: .japaneseEUC) {
}
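To see why the encoding has to match the bytes, here is a small sketch that uses only Foundation and no Kanna. The byte values are sample data chosen for this illustration (the EUC-JP encoding of 日本語), and the variable names are hypothetical.

```swift
import Foundation

// Sample data: the text "日本語" encoded as EUC-JP.
let eucBytes = Data([0xC6, 0xFC, 0xCB, 0xDC, 0xB8, 0xEC])

// These bytes are not valid UTF-8, so decoding them as UTF-8 yields nil.
let asUTF8 = String(data: eucBytes, encoding: .utf8)

// Decoding with the matching encoding mirrors passing .japaneseEUC to the parser.
// Support for this encoding may vary by platform, so the result stays optional.
let asEUC = String(data: eucBytes, encoding: .japaneseEUC)

print(asUTF8 ?? "UTF-8 decoding failed")
print(asEUC ?? "EUC-JP decoding unavailable on this platform")
```

A parser that is told the wrong encoding faces the same mismatch, which is why naming the encoding up front is the safer choice.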
Parse Options
Kanna offers quite a few options that affect how a document is parsed.
Note: The option argument is optional and can be omitted.
let html = "<html><body><h1>Tutorials</h1></body></html>"
if let doc = HTML(html: html, encoding: .utf8, option: .htmlParseUseLibxml([.STRICT])) {
}