Web Crawling

C#

by 탑~! 2018. 6. 29. 16:15

HtmlAgilityPack


https://www.nuget.org/packages/HtmlAgilityPack/

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml.XPath;
using HtmlAgilityPack; // NuGet package linked above; provides HtmlDocument

class Program
{
    static void Main(string[] args)
    {
        Uri targetUri = new Uri("http://www.youtube.com/watch?v=8YkbeycRa2A");
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(targetUri);

        using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
        using (Stream webResponseStream = webResponse.GetResponseStream())
        {
            // Parse the response as UTF-8; "true" lets a byte-order mark override the encoding.
            HtmlDocument s = new HtmlDocument();
            Encoding targetEncoding = Encoding.UTF8;
            s.Load(webResponseStream, targetEncoding, true);

            // HtmlDocument implements IXPathNavigable, so XPath can read attribute values.
            IXPathNavigable nav = s;
            XPathNavigator navigator = nav.CreateNavigator();

            // SelectSingleNode returns null when nothing matches, hence the ?. operator.
            string title = WebUtility.HtmlDecode(
                navigator.SelectSingleNode("/html/head/meta[@property='og:title']/@content")?.Value);
            string description = WebUtility.HtmlDecode(
                navigator.SelectSingleNode("/html/head/meta[@property='og:description']/@content")?.Value);

            // "eow-description" was the description element's id when this was written;
            // fall back to an empty string if the page no longer contains it.
            string fullDescription = WebUtility.HtmlDecode(
                s.GetElementbyId("eow-description")?.InnerHtml ?? String.Empty);

            // Turn <br> and <hr> into line breaks, then strip every remaining tag.
            fullDescription = Regex.Replace(fullDescription, @"<(br|hr)[^>]*>", Environment.NewLine);
            fullDescription = Regex.Replace(fullDescription, @"<[^>]*>", String.Empty).Trim();

            Console.WriteLine(title);
            Console.WriteLine(description);
            Console.WriteLine(fullDescription);
        }
    }
}
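
The tag-stripping step at the end of the program above can be tried on its own, without a network request or HtmlAgilityPack. The following is only a minimal sketch: the class name HtmlText and the method ToPlainText are hypothetical names introduced here, not part of the original post or of any library. Unlike the code above, it strips the tags before decoding entities, on the assumption that escaped text such as &lt;b&gt; should survive as literal text rather than be removed as a tag.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class HtmlText
{
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // <br> and <hr> (with or without attributes or a trailing slash) become line breaks.
        string text = Regex.Replace(html, @"<(br|hr)\b[^>]*>", Environment.NewLine, RegexOptions.IgnoreCase);

        // Every remaining tag is removed.
        text = Regex.Replace(text, @"<[^>]*>", string.Empty);

        // Entities are decoded last so that escaped angle brackets are kept as text.
        return WebUtility.HtmlDecode(text).Trim();
    }
}

static class StripTagsDemo
{
    static void Main()
    {
        // Sample input made up for this sketch.
        Console.WriteLine(HtmlText.ToPlainText("Hello<br/>World &amp; <b>all</b>"));
    }
}
```

A regular expression is good enough for simple markup like a video description. For nested or malformed HTML it is probably safer to read the node's InnerText property from HtmlAgilityPack instead.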



PhantomJS (http://phantomjs.org/) (NuGet package: http://www.nuget.org/packages/PhantomJS/)


Selenium Web Driver (NuGet package: http://www.nuget.org/packages/Selenium.WebDriver/)
