Parsing dl with HtmlAgilityPack

This is the sample HTML that I am trying to parse with the Html Agility Pack in ASP.NET (C#).

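    <div class="content-div">
      <dl>
        <dt><b><a href="1.html">1</a></b></dt>
        <dd>First Entry</dd>
        <dt><b><a href="2.html">2</a></b></dt>
        <dd>Second Entry</dd>
        <dt><b><a href="3.html">3</a></b></dt>
        <dd>Third Entry</dd>
      </dl>
    </div>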

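The values I want are the link URL (1.html), the anchor text (1), and the text of the dd that follows (First Entry).

(I have used the first entry as the example here, but I want these values for every entry in the list.)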

This is the code I am currently using:

    var webGet = new HtmlWeb();
    var document = webGet.Load(url2);
    var parsedValues =
        from info in document.DocumentNode.SelectNodes("//div[@class='content-div']")
        from content in info.SelectNodes("dl//dd")
        from link in info.SelectNodes("dl//dt/b/a")
            .Where(x => x.Attributes.Contains("href"))
        select new
        {
            Text = content.InnerText,
            Url = link.Attributes["href"].Value,
            AnchorText = link.InnerText,
        };
    GridView1.DataSource = parsedValues;
    GridView1.DataBind();

The problem is that I get the correct values for the link and the anchor text, but for the inner text it takes the value of the first entry and repeats it once for every link, and only then moves on to the second entry. My explanation may not be very clear, so here is the sample output I get with this code:

    First Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/1.html   1
    First Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/2.html   2
    First Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/3.html   3
    Second Entry  https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/1.html   1
    Second Entry  https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/2.html   2
    Second Entry  https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/3.html   3
    Third Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/1.html   1
    Third Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/2.html   2
    Third Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/3.html   3

whereas what I want to get is:

    First Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/1.html   1
    Second Entry  https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/2.html   2
    Third Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/3.html   3

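I am quite new to HAP and know very little about XPath, so I am sure I am doing something wrong here, but even after spending hours on it I could not get it to work. Any help would be much appreciated.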

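Solution 1

I have defined a function that, given a dt node, returns the next dd node after it:

    private static HtmlNode GetNextDDSibling(HtmlNode dtElement)
    {
        // Walk forward through the siblings, skipping text and comment nodes.
        var currentNode = dtElement?.NextSibling;
        while (currentNode != null)
        {
            if (currentNode.NodeType == HtmlNodeType.Element && currentNode.Name == "dd")
                return currentNode;
            currentNode = currentNode.NextSibling;
        }
        return null; // no dd follows this dt
    }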

Now the LINQ code can be rewritten as:

    var parsedValues =
        from info in document.DocumentNode.SelectNodes("//div[@class='content-div']")
        from dtElement in info.SelectNodes("dl/dt")
        let link = dtElement.SelectSingleNode("b/a[@href]")
        let ddElement = GetNextDDSibling(dtElement)
        where link != null && ddElement != null
        select new
        {
            Text = ddElement.InnerHtml,
            Url = link.GetAttributeValue("href", ""),
            AnchorText = link.InnerText
        };
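To see why the query in the question returns nine rows, it may help to look at the same shape without any HTML involved. The sketch below is only an illustration with plain string arrays (the class and method names `PairingDemo`, `CrossJoin` and `Paired` are made up for this example): two independent `from` clauses combine every text with every link, while walking one sequence and picking its partner gives one row per entry.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairingDemo
{
    // Two independent 'from' clauses: every text is combined with every link.
    public static List<string> CrossJoin(string[] texts, string[] links) =>
        (from text in texts
         from link in links
         select text + " -> " + link).ToList();

    // Walk one sequence and look up its partner by index: one row per entry.
    public static List<string> Paired(string[] texts, string[] links) =>
        texts.Select((text, i) => text + " -> " + links[i]).ToList();

    public static void Main()
    {
        var texts = new[] { "First Entry", "Second Entry", "Third Entry" };
        var links = new[] { "1.html", "2.html", "3.html" };

        Console.WriteLine(CrossJoin(texts, links).Count); // 9
        Console.WriteLine(Paired(texts, links).Count);    // 3
    }
}
```

The rewritten query above avoids the cross join in the same way: it iterates only the dt elements and derives both the link and the dd from each one.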

Solution 2

Without an additional function:

    var infoNode = document.DocumentNode.SelectSingleNode("//div[@class='content-div']");
    var dts = infoNode.SelectNodes("dl/dt");
    var dds = infoNode.SelectNodes("dl/dd");
    // Zip pairs the n-th dt with the n-th dd.
    var parsedValues = dts.Zip(dds, (dt, dd) => new
    {
        Text = dd.InnerHtml,
        Url = dt.SelectSingleNode("b/a[@href]").GetAttributeValue("href", ""),
        AnchorText = dt.SelectSingleNode("b/a[@href]").InnerText
    });
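One thing to keep in mind with the `Zip` approach is that it pairs items purely by position and stops at the end of the shorter sequence, so it relies on every dt having exactly one dd. The following sketch shows that behaviour with plain strings; the data is hypothetical and the names `ZipDemo` and `PairByPosition` are only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipDemo
{
    // Pairs terms and descriptions by position, as Enumerable.Zip does.
    public static List<string> PairByPosition(
        IEnumerable<string> terms, IEnumerable<string> descriptions) =>
        terms.Zip(descriptions, (t, d) => t + ": " + d).ToList();

    public static void Main()
    {
        // Hypothetical data: the term "2" has no description.
        var terms = new[] { "1", "2", "3" };
        var descriptions = new[] { "First Entry", "Third Entry" };

        foreach (var row in PairByPosition(terms, descriptions))
            Console.WriteLine(row);
        // Two rows are printed: "2" is paired with "Third Entry"
        // and "3" is silently dropped.
    }
}
```

If the markup can contain a dt without a dd (or several dd elements per dt), walking the siblings as in Solution 1 is probably the safer choice.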

For example, this is how you can parse some elements using Html Agility Pack:


    public string ParseHtml()
    {
        string output = null;
        HtmlDocument htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(YourHTML);
        HtmlNode node = htmlDocument.DocumentNode;

        // Select all dd tags.
        HtmlNodeCollection dds = node.SelectNodes("//dd");
        // Select all 'a' tags that contain an href attribute.
        HtmlNodeCollection anchors = node.SelectNodes("//b/a[@href]");

        for (int i = 0; i < dds.Count; i++)
        {
            string attributeValue = null; // default returned when href is missing
            string text = dds[i].InnerText;
            string url = anchors[i].GetAttributeValue("href", attributeValue);
            string anchorText = anchors[i].InnerText;
            // Your code...
        }
        return output;
    }
