Parsing dl with HtmlAgilityPack
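This is the sample HTML I am trying to parse with the Html Agility Pack in ASP.NET (C#).
The values I want are:
(I have used the first entry as an example here, but I want the values of these elements for every entry in the list.)
This is the code I am currently using: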
    var webGet = new HtmlWeb();
    var document = webGet.Load(url2);
    var parsedValues =
        from info in document.DocumentNode.SelectNodes("//div[@class='content-div']")
        from content in info.SelectNodes("dl//dd")
        from link in info.SelectNodes("dl//dt/b/a")
            .Where(x => x.Attributes.Contains("href"))
        select new
        {
            Text = content.InnerText,
            Url = link.Attributes["href"].Value,
            AnchorText = link.InnerText,
        };
    GridView1.DataSource = parsedValues;
    GridView1.DataBind();
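The problem is that I get the link and anchor text values correctly, but for the inner text it takes the first entry's value and repeats it once for every link found, then moves on to the second entry and does the same. My explanation may not be very clear, so here is sample output from this code: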
    First Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/1.html   1
    First Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/2.html   2
    First Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/3.html   3
    Second Entry  https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/1.html   1
    Second Entry  https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/2.html   2
    Second Entry  https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/3.html   3
    Third Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/1.html   1
    Third Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/2.html   2
    Third Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/3.html   3
whereas what I want to get is:
    First Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/1.html   1
    Second Entry  https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/2.html   2
    Third Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/3.html   3
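I am very new to HAP and know little about XPath, so I am sure I am doing something wrong here, but even after several hours I still cannot get it to work. Any help would be greatly appreciated.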
Solution 1
I have defined a function that, given a dt node, returns the next dd node that follows it:
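    private static HtmlNode GetNextDDSibling(HtmlNode dtElement)
    {
        // Start at the first sibling after the dt (null if dtElement is null).
        var currentNode = dtElement?.NextSibling;
        while (currentNode != null)
        {
            // Skip text and comment nodes; stop at the first dd element.
            if (currentNode.NodeType == HtmlNodeType.Element && currentNode.Name == "dd")
                return currentNode;
            currentNode = currentNode.NextSibling;
        }
        // No dd sibling follows this dt.
        return null;
    }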
The LINQ code can now be rewritten as:
    var parsedValues =
        from info in document.DocumentNode.SelectNodes("//div[@class='content-div']")
        from dtElement in info.SelectNodes("dl/dt")
        let link = dtElement.SelectSingleNode("b/a[@href]")
        let ddElement = GetNextDDSibling(dtElement)
        where link != null && ddElement != null
        select new
        {
            Text = ddElement.InnerHtml,
            Url = link.GetAttributeValue("href", ""),
            AnchorText = link.InnerText
        };
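The idea of pairing each dt with the first dd that follows it can be tried without the Html Agility Pack. The sketch below uses LINQ to XML from the .NET base library as a stand-in, purely so that it runs without extra packages; it is not the HAP API. The markup in `SampleMarkup` and the names `SiblingPairingDemo` and `Parse` are made up for illustration, since the original sample HTML is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SiblingPairingDemo
{
    // Hypothetical markup, assumed for this sketch only.
    public const string SampleMarkup =
        "<dl>" +
        "<dt><b><a href=\"1.html\">1</a></b></dt><dd>First Entry</dd>" +
        "<dt><b><a href=\"2.html\">2</a></b></dt><dd>Second Entry</dd>" +
        "</dl>";

    public static List<(string Text, string Url, string AnchorText)> Parse(string markup)
    {
        XElement dl = XElement.Parse(markup);
        return (from dt in dl.Elements("dt")
                let link = dt.Element("b")?.Element("a")
                // First dd sibling that comes after this dt.
                let dd = dt.ElementsAfterSelf("dd").FirstOrDefault()
                where link?.Attribute("href") != null && dd != null
                select (Text: dd.Value,
                        Url: link.Attribute("href").Value,
                        AnchorText: link.Value)).ToList();
    }

    public static void Main()
    {
        // One row per dt, each paired with its own dd.
        foreach (var row in Parse(SampleMarkup))
            Console.WriteLine($"{row.Text} | {row.Url} | {row.AnchorText}");
    }
}
```

Because each row starts from a single dt and looks up only its own link and its own dd, the number of rows equals the number of dt elements rather than the number of dt/dd combinations.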
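Solution 2
Without an additional function (this assumes the dt and dd elements appear in matching order, one dd per dt):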
    var infoNode = document.DocumentNode.SelectSingleNode("//div[@class='content-div']");
    var dts = infoNode.SelectNodes("dl/dt");
    var dds = infoNode.SelectNodes("dl/dd");
    var parsedValues = dts.Zip(dds, (dt, dd) => new
    {
        Text = dd.InnerHtml,
        Url = dt.SelectSingleNode("b/a[@href]").GetAttributeValue("href", ""),
        AnchorText = dt.SelectSingleNode("b/a[@href]").InnerText
    });
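The duplicated rows in the question come from using two consecutive `from` clauses over independent collections, which LINQ evaluates as a cross join: every dd is combined with every link. `Zip` instead pairs items by position. The following is a minimal sketch using plain string arrays in place of HTML nodes; the class and method names (`PairingDemo`, `CrossJoin`, `Pairwise`) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairingDemo
{
    // Two consecutive `from` clauses: every text is combined with every URL.
    public static List<string> CrossJoin(string[] texts, string[] urls) =>
        (from text in texts
         from url in urls
         select text + " -> " + url).ToList();

    // Zip: the i-th text is combined only with the i-th URL.
    public static List<string> Pairwise(string[] texts, string[] urls) =>
        texts.Zip(urls, (text, url) => text + " -> " + url).ToList();

    public static void Main()
    {
        var texts = new[] { "First Entry", "Second Entry", "Third Entry" };
        var urls = new[] { "1.html", "2.html", "3.html" };

        Console.WriteLine(CrossJoin(texts, urls).Count); // 9 rows (3 x 3)
        Console.WriteLine(Pairwise(texts, urls).Count);  // 3 rows
    }
}
```

Note that `Zip` stops at the shorter sequence, so a dt without a matching dd would shift or drop later pairs; the sibling-based approach in Solution 1 is more tolerant of irregular markup.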
As another example, here is how you can parse some elements with the Html Agility Pack:
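    // Requires: using System; using HtmlAgilityPack;
    public string ParseHtml()
    {
        string output = null;
        HtmlDocument htmldocument = new HtmlDocument();
        htmldocument.LoadHtml(YourHTML); // YourHTML: placeholder for the HTML string to parse
        HtmlNode node = htmldocument.DocumentNode;

        HtmlNodeCollection dds = node.SelectNodes("//dd");             // Select all dd tags
        HtmlNodeCollection anchors = node.SelectNodes("//b/a[@href]"); // Select all 'a' tags that contain an href attribute

        // SelectNodes returns null when nothing matches.
        if (dds == null || anchors == null)
            return output;

        // Pair dd and anchor nodes by index; stop at the shorter collection.
        int count = Math.Min(dds.Count, anchors.Count);
        for (int i = 0; i < count; i++)
        {
            string attributeValue = null; // default returned when href is missing
            string text = dds[i].InnerText;
            string url = anchors[i].GetAttributeValue("href", attributeValue);
            string anchorText = anchors[i].InnerText;
            // Your code...
        }
        return output;
    }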