How do I extract the title "党组书记、局长 夏照明" (Party Group Secretary and Director Xia Zhaoming) and the body content from the page? Please help.
------ Solution --------------------------------------------
The content is handled in much the same way: locate it by tag or by attribute match, then apply a regular expression.
------ For reference only ---------------------------------------
http://blog.csdn.net/jdgdf566/article/details/17039693
------ For reference only ---------------------------------------
Using htmlparser.jar, I wrote code that gets the title. The content works in much the same way; match it with a regular expression.
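// Requires htmlparser.jar on the classpath; these statements go inside a method that throws Exception.
// Imports: org.htmlparser.Parser, org.htmlparser.NodeFilter, org.htmlparser.filters.NodeClassFilter,
//          org.htmlparser.tags.Div, org.htmlparser.util.NodeList
String path = "http://zj.jsds.gov.cn/art/2011/12/31/art_39115_408734.html";
Parser parser = new Parser(path);
parser.setEncoding("gbk");
// Keep only <div> nodes.
NodeFilter filter = new NodeClassFilter(Div.class);
NodeList nodeList = parser.parse(filter);
// The first div's children are expected to be the title wrapped in begin/end comment markers.
String s = nodeList.elementAt(0).getChildren().toHtml();
// Strip the leading and trailing markers, leaving the title text.
System.out.println(s.subSequence("<!--<$[标题]>begin-->".length(), s.length() - "<!--<$[标题]>end-->".length()));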
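The `subSequence` call above only works when the string starts exactly with the begin marker and ends exactly with the end marker. A minimal sketch of a more tolerant alternative, using only `java.util.regex`, searches for the two markers anywhere in the HTML. The class name `MarkerExtractor`, the method `between`, and the ASCII sample markers are hypothetical and for illustration only; on the page in this thread the markers would be `<!--<$[标题]>begin-->` and `<!--<$[标题]>end-->`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkerExtractor {
    // Hypothetical helper (not part of htmlparser): returns the trimmed text between
    // the first occurrence of `begin` and the next occurrence of `end`, or null if absent.
    static String between(String html, String begin, String end) {
        // Pattern.quote makes characters such as $ and [ in the markers literal.
        Pattern p = Pattern.compile(
                Pattern.quote(begin) + "(.*?)" + Pattern.quote(end), Pattern.DOTALL);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Sample input with made-up ASCII markers standing in for the real ones.
        String html = "<div> <!--<$[title]>begin-->Some title<!--<$[title]>end--> </div>";
        System.out.println(between(html, "<!--<$[title]>begin-->", "<!--<$[title]>end-->"));
    }
}
```

Because the markers are passed in as parameters, the same helper could in principle be reused for the body content if the page wraps it in a similar pair of comments; that is an assumption about the page, not something shown in this thread.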
------ For reference only ---------------------------------------
http://blog.csdn.net/withiter/article/details/14450003
------ For reference only ---------------------------------------
This can output it.
------ For reference only ---------------------------------------
Search for "js get web content".
See:
http://wenwen.soso.com/z/q356947703.htm
http://bbs.csdn.net/topics/240067166
------ For reference only ---------------------------------------
Could you explain in more detail? Code would be best. After I added this package and copied code from the web, most of it gave errors.
------ For reference only ---------------------------------------
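import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.NodeClassFilter;
import org.htmlparser.tags.Div;
import org.htmlparser.util.NodeList;
public class DivText {
    public static void main(String[] args) throws Exception {
        String path = "http://zj.jsds.gov.cn/art/2011/12/31/art_39115_408734.html";
        Parser parser = new Parser(path);
        parser.setEncoding("gbk");
        // Collect every <div> on the page.
        NodeFilter filter1 = new NodeClassFilter(Div.class);
        NodeList nodeList1 = parser.parse(filter1);
        // Print the plain text of each div; the loop stops before the last one, as in the original post.
        for (int i = 0; i < nodeList1.size() - 1; i++) {
            System.out.println(nodeList1.elementAt(i).getChildren().asString());
        }
    }
}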
=====================================
党组书记、局长 夏照明
发布时间:2011年12月31日
信息来源:市局人事处
党组书记、局长:夏照明
负责主持全面工作。
夏照明,男,汉族,1958年3月出生,江苏泰兴人,1980年7月参加工作,1985年9月加入中国共产党,研究生学历。
历任扬州市地方税务局党组成员、副局长,扬州市地方税务局党组书记、副局长,1998年3月任扬州市地方税务局党组书记、局长,2008年6月任江苏省镇江地方税务局党组书记、局长。
------ For reference only ---------------------------------------
I have the jar package working now and solved that problem myself, thanks. Could you take a look at this page: http://zj.jsds.gov.cn/col/col39115/index.html
Its data sits inside JavaScript. I need to grab the leadership data and, most importantly, the corresponding links. Please help. Once this is solved I will open a thread and award 100 points.
------ For reference only ---------------------------------------
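String path = "http://zj.jsds.gov.cn/col/col39115/index.html";
Parser parser = new Parser(path);
parser.setEncoding("gbk");
// A null filter returns all top-level nodes, so toHtml() renders the whole page, scripts included.
NodeList list = parser.parse(null);
// The rows sit in a JavaScript array literal of the form ['...','...'].
Matcher m = Pattern.compile("\\['(.*?)'\\]").matcher(list.toHtml());
while (m.find()) {
    System.out.println(m.group());
}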
==========
['<tr><td width=16 align="center"><img src=\'/picture/0/110607114955809.jpg\' align=\'absmiddle\' border=\'0\'></script></span></td><td height=28 align="left"><a style=\'font-size:14px;\' href=\'/art/2011/12/31/art_39115_408734.html\' target="_blank" class=\'bt_link\' title=\'党组书记、局长 夏照明\'>党组书记、局长 夏照明</a></td><td width="80" class=\'bt_date\'><font style=\'color:#313131;\'>2011-12-31</font></td></tr>','<tr><td width=16 align="center"><img src=\'/picture/0/110607114955809.jpg\' align=\'absmiddle\' border=\'0\'></td><td height=28 align="left"><a style=\'font-size:14px;\' href=\'/art/2011/12/31/art_39115_408733.html\' target="_blank" class=\'bt_link\' title=\'党组副书记、副局长 施竞平\'>党组副书记、副局长 施竞平</a></td><td width="80" class=\'bt_date\'><font style=\'color:#313131;\'>2011-12-31</font></td></tr>','<tr><td width=16 align="center"><img src=\'/picture/0/110607114955809.jpg\' align=\'absmiddle\' border=\'0\'></td><td height=28 align="left"><a style=\'font-size:14px;\' href=\'/art/2011/12/31/art_39115_408732.html\' target="_blank" class=\'bt_link\' title=\'党组成员、副局长 李 峻\'>党组成员、副局长 李 峻</a></td><td width="80" class=\'bt_date\'><font style=\'color:#313131;\'>2011-12-31</font></td></tr>','<tr><td width=16 align="center"><img src=\'/picture/0/110607114955809.jpg\' align=\'absmiddle\' border=\'0\'></td><td height=28 align="left"><a style=\'font-size:14px;\' href=\'/art/2011/12/31/art_39115_408731.html\' target="_blank" class=\'bt_link\' title=\'党组成员、纪检组长 邵云\'>党组成员、纪检组长 邵云</a></td><td width="80" class=\'bt_date\'><font style=\'color:#313131;\'>2011-12-31</font></td></tr>','<tr><td width=16 align="center"><img src=\'/picture/0/110607114955809.jpg\' align=\'absmiddle\' border=\'0\'></td><td height=28 align="left"><a style=\'font-size:14px;\' href=\'/art/2011/12/31/art_39115_408730.html\' target="_blank" class=\'bt_link\' title=\'党组成员、副局长 郦梅生\'>党组成员、副局长 郦梅生</a></td><td width="80" class=\'bt_date\'><font style=\'color:#313131;\'>2011-12-31</font></td></tr>','<tr><td width=16 align="center"><img 
src=\'/picture/0/110607114955809.jpg\' align=\'absmiddle\' border=\'0\'></td><td height=28 align="left"><a style=\'font-size:14px;\' href=\'/art/2011/5/20/art_39115_408729.html\' target="_blank" class=\'bt_link\' title=\'党组成员、总经济师 高凌\'>党组成员、总经济师 高凌</a></td><td width="80" class=\'bt_date\'><font style=\'color:#313131;\'>2011-05-20</font></td></tr>']
------ For reference only ---------------------------------------
Could you send me the jar package? I have looked through several of them. Thank you. 357788906@qq.com
------ For reference only ---------------------------------------
No need to send it, thank you. I found it.
------ For reference only ---------------------------------------
Because the links are followed from this page, I need to reassign the href values. I tried System.out.println(m.group().replace("href=\'", "href=\'?url=zj.jsds.gov.cn")); but it has no effect. Why?
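One possible explanation, judging from the output printed earlier in the thread: the matched text contains a literal backslash before each single quote (`href=\'`), whereas the Java literal `"href=\'"` denotes just `href='` with no backslash, so `replace` finds nothing to change. A minimal sketch under that assumption; the class name `HrefRewrite`, the method `addPrefix`, and the shortened sample row are illustrative only.

```java
public class HrefRewrite {
    // Hypothetical helper: inserts `prefix` right after every href=\' in the text,
    // where the quote is preceded by a literal backslash as in the JS-embedded rows.
    static String addPrefix(String text, String prefix) {
        // In Java source, "href=\\'" is the text href= followed by a backslash and a quote.
        return text.replace("href=\\'", "href=\\'" + prefix);
    }

    public static void main(String[] args) {
        // Sample row with backslash-escaped quotes, shortened from the thread's output.
        String row = "<a href=\\'/art/2011/12/31/art_39115_408734.html\\'>x</a>";
        // "href=\'" is just href=' (no backslash), which does not occur in the row,
        // so this call returns the row unchanged.
        System.out.println(row.replace("href=\'", "href=\'?url=zj.jsds.gov.cn"));
        // Matching the backslash explicitly changes the row.
        System.out.println(addPrefix(row, "?url=zj.jsds.gov.cn"));
    }
}
```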
------ For reference only ---------------------------------------
What you want is to extract the data inside. What does that have to do with assignment? Aren't you just grabbing the data?
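String path = "http://zj.jsds.gov.cn/col/col39115/index.html";
Parser parser = new Parser(path);
parser.setEncoding("gbk");
NodeList list = parser.parse(null);
// Grab the JavaScript array literal that holds the table rows.
Matcher m = Pattern.compile("\\['(.*?)'\\]").matcher(list.toHtml());
String str = "";
while (m.find()) {
    System.out.println(m.group());
    str = m.group(); // keeps only the last match
}
// The page escapes its quotes, so the pattern matches a literal backslash before the quote.
// group(1) is the relative link, group(3) is the link text.
Matcher m1 = Pattern.compile("href=\\\\'(.*?\\.html)(.*?>)(.*?)(</a>)").matcher(str);
while (m1.find()) {
    System.out.println(m1.group(1));
    System.out.println(m1.group(3));
}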
------ For reference only ---------------------------------------
Resolved. My goal was to crawl the first-level page and then follow its links to crawl the second-level pages. Thank you, everyone.
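For a two-level crawl like this, the relative `href` values captured from the index page have to be turned into absolute URLs before the second-level pages can be fetched. A minimal sketch using `java.net.URI` from the standard library; the class `LinkResolver`, the method `absoluteLinks`, and the shortened sample text are hypothetical, and the regex assumes the backslash-escaped quotes seen in the output earlier in the thread.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkResolver {
    // Hypothetical helper: finds every href=\'....html in the text and resolves it against `base`.
    static List<String> absoluteLinks(String base, String text) {
        List<String> out = new ArrayList<>();
        URI baseUri = URI.create(base);
        // Regex href=\\'(.*?\.html): a literal backslash, a quote, then the shortest run ending in .html
        Matcher m = Pattern.compile("href=\\\\'(.*?\\.html)").matcher(text);
        while (m.find()) {
            out.add(baseUri.resolve(m.group(1)).toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String base = "http://zj.jsds.gov.cn/col/col39115/index.html";
        // Two shortened rows in the same escaped form as the thread's output.
        String text = "<a href=\\'/art/2011/12/31/art_39115_408734.html\\'>a</a>"
                + "<a href=\\'/art/2011/12/31/art_39115_408733.html\\'>b</a>";
        for (String link : absoluteLinks(base, text)) {
            System.out.println(link);
        }
    }
}
```

Each resulting URL could then be passed to `new Parser(url)` in the same way as the article page earlier in the thread.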
------ For reference only ---------------------------------------
What does this mean? For some pages, matching on key characters does not retrieve the content, for example: http://www.jsds.gov.cn/art/2013/5/24/art_50308_585582.html
------ For reference only ---------------------------------------
What do you mean? Do you want to get the content of http://www.jsds.gov.cn/art/2013/5/24/art_50308_585582.html?
Just grab it directly. What do you mean by "key characters"?