
Fetching and parsing web page source code in Java

火星人 @ 2014-03-09

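  1. The search URL is built by imitating the search engine's own query URL (worked out by analyzing the engine's parameters, e.g. Baidu's); the search term is then inserted into that URL.

  2. The function takes this constructed URL as its input parameter.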

  String query = URLEncoder.encode("潘柱廷", "UTF-8");

  // p is the page index, so pn advances by 10 per page
  String url = ".baidu./s?wd=" + query + "&pn=" + p * 10 + "&tn=baiduhome_pg&ie=utf-8";
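The URL construction above can be wrapped in a small helper so that each result page gets its own URL. The following is only a minimal sketch: the class name `SearchUrlBuilder`, the method `build`, and the `BASE_URL` value are assumptions made for illustration and are not taken from the original post. It uses the `URLEncoder.encode(String, Charset)` overload (Java 10+), which avoids the checked exception of the string-based overload.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original post.
final class SearchUrlBuilder {
    // Assumed placeholder for the search endpoint; adjust to the engine being queried.
    static final String BASE_URL = "https://www.baidu.com/s";

    // Builds the URL for one result page. The keyword is percent-encoded as UTF-8,
    // and pn advances by 10 per page, mirroring the p*10 expression in the post.
    static String build(String keyword, int page) {
        String query = URLEncoder.encode(keyword, StandardCharsets.UTF_8);
        return BASE_URL + "?wd=" + query + "&pn=" + (page * 10)
                + "&tn=baiduhome_pg&ie=utf-8";
    }

    public static void main(String[] args) {
        // Print the URLs for the first three result pages.
        for (int p = 0; p < 3; p++) {
            System.out.println(build("潘柱廷", p));
        }
    }
}
```

Each URL produced this way can be passed to MakeQuery as its domain argument.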

  // Imports needed by MakeQuery (Apache Commons HttpClient 3.x and Jsoup)
  import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
  import org.apache.commons.httpclient.HttpClient;
  import org.apache.commons.httpclient.HttpStatus;
  import org.apache.commons.httpclient.methods.GetMethod;
  import org.apache.commons.httpclient.params.HttpMethodParams;
  import org.jsoup.Jsoup;
  import org.jsoup.nodes.Document;
  import org.jsoup.nodes.Element;
  import org.jsoup.select.Elements;

  public void MakeQuery(String domain) {
    HttpClient httpClient = new HttpClient();
    GetMethod getMethod = null;
    try {
      getMethod = new GetMethod(domain);
      // Install the retry handler before executing, so the request is issued only once here
      getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
          new DefaultHttpMethodRetryHandler());
      int statusCode = httpClient.executeMethod(getMethod);
      if (statusCode != HttpStatus.SC_OK) {
        System.err.println("Method failed: " + getMethod.getStatusLine());
      }
      // Read the raw response and decode it as UTF-8 (the query URL asks for ie=utf-8)
      byte[] responseBody = getMethod.getResponseBody();
      String response = new String(responseBody, "UTF-8");
      // Parse the HTML with Jsoup
      Document doc = Jsoup.parse(response);
      // Result blocks are selected by the CSS class "f"
      Elements contents = doc.getElementsByClass("f");
      for (Element content : contents) {
        Element link = content.getElementsByTag("a").first();
        if (link == null) {
          continue; // skip result blocks that contain no link
        }
        String linkHref = link.attr("href"); // link
        String linkText = link.text(); // summary
        // FoursearchZH.map is a static map defined elsewhere in the project
        FoursearchZH.map.put(linkHref, linkText);
        System.out.println("------------------");
        System.out.println(linkHref);
        System.out.println(linkText);
      }
    } catch (Exception e) {
      System.err.println("Something went wrong…");
      e.printStackTrace();
    } finally {
      // Return the connection to the connection manager
      if (getMethod != null) {
        getMethod.releaseConnection();
      }
    }
  }
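Commons HttpClient 3.x, which MakeQuery relies on, has reached end of life. On Java 11 or later, the fetching step could instead be written with the JDK's built-in `java.net.http` client. This is a swapped-in technique, not the author's method, and only a rough sketch: the class name `PageFetcher`, its two methods, and the 10-second timeout are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical helper, not part of the original post.
final class PageFetcher {
    // Builds a GET request for the given URL; the 10 s timeout is an arbitrary example value.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    // Sends the request and returns the body decoded as UTF-8.
    // Any status other than 200 is reported as an IOException.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildRequest(url),
                HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status: " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: PageFetcher <url>");
            return;
        }
        String html = fetch(args[0]);
        System.out.println(html.length() + " characters fetched");
    }
}
```

The string returned by `fetch` can be handed to `Jsoup.parse` exactly as the response string is in MakeQuery, so the parsing loop itself does not need to change.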



http://coctec.com/docs/java/show-post-59795.html