Sunday, May 18, 2014

Parsing a returned string with a Java regex

Post last edited by s13219091 on 2014-05-16 09:33:01
Hello, I use HttpURLConnection to visit a site and I am trying to parse the data it returns, but I don't know how to write the regex in Java. I already have it working in JS.

This is the data returned:

<td><ahref="javascript:detatilssc('2014051596');">2014051596</a></td><td>02:00</td><tdclass="red"><p>33440</p></td></tr><tr><td><ahref="javascript:detatilssc('2014051595');">2014051595</a></td><td>01:50</td><tdclass="red"><p>75119</p></td></tr><tr><td><ahref="javascript:detatilssc('2014051594');">2014051594</a></td><td>01:40</td><tdclass="red"><p>23254</p></td></tr><tr><td><ahref="javascript:detatilssc('2014051593');">2014051593</a></td><td>01:30</td><tdclass="red"><p>21459</p></td></tr><tr><td><ahref="javascript:detatilssc('2014051592');">2014051592</a></td><td>01:20</td><tdclass="red"><p>50531</p></td></tr><tr><td><ahref="javascript:detatilssc('2014051591');">2014051591</a></td><td>01:10</td><tdclass="red"><p>30877</p></td></tr><tr><td><ahref="javascript:detatilssc('2014051590');">2014051590</a></td><td>01:00</td><tdclass="red"><p>88752</p></td></tr><tr><td><ahref="javascript:detatilssc('2014051589');">2014051589</a></td><td>00:50</td><tdclass="red"><p>82482</p></td></tr><tr><td><ahref="javascript:detatilssc('2014051588');">2014051588</a></td><td>00:40</td><tdclass="red"><p>28531</p></td></tr><tr>


I just want to get the three values in each row (write one regex that matches every row, then loop over the matches to read the three values from each).

The three values to get: 2014051596   02:00  33440 (every row has these three)


JS code attached:

var reg=/(\d{10}).+(\d{2}\:\d{2}).+<p>([\d ]{9})<\/p>/,
match=str.match(reg);

match[1], match[2], and match[3] are the values they matched.



It would be best if someone could write a demo. Thanks, experts.
------ Solution ------------------------- -------------------
String s="<td><ahref=\"javascript:detatilssc('2014051596');\">2014051596</a></td><td>02:00</td><tdclass=\"red\"><p>33440</p></td></tr><tr><td><ahref=\"javascript:detatilssc('2014051595');\">2014051595</a></td><td>01:50</td><tdclass=\"red\"><p>75119</p></td></tr><tr><td><ahref=\"javascript:detatilssc('2014051594');\">2014051594</a></td><td>01:40</td><tdclass=\"red\"><p>23254</p></td></tr><tr><td><ahref=\"javascript:detatilssc('2014051593');\">2014051593</a></td><td>01:30</td><tdclass=\"red\"><p>21459</p></td></tr><tr><td><ahref=\"javascript:detatilssc('2014051592');\">2014051592</a></td><td>01:20</td><tdclass=\"red\"><p>50531</p></td></tr><tr><td><ahref=\"javascript:detatilssc('2014051591');\">2014051591</a></td><td>01:10</td><tdclass=\"red\"><p>30877</p></td></tr><tr><td><ahref=\"javascript:detatilssc('2014051590');\">2014051590</a></td><td>01:00</td><tdclass=\"red\"><p>88752</p></td></tr><tr><td><ahref=\"javascript:detatilssc('2014051589');\">2014051589</a></td><td>00:50</td><tdclass=\"red\"><p>82482</p></td></tr><tr><td><ahref=\"javascript:detatilssc('2014051588');\">2014051588</a></td><td>00:40</td><tdclass=\"red\"><p>28531</p></td></tr><tr>";
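// Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
// The lazy .*? keeps each match inside a single row.
Matcher m = Pattern.compile("(\\d{10}).*?(\\d{2}:\\d{2}).*?<p>(\\d{5})</p>").matcher(s);
while (m.find()) {
    System.out.println(m.group(1) + "-->" + m.group(2) + "-->" + m.group(3) + "-->");
}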
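A note on why the answer above uses `.*?` instead of the `.+` from the JS version: the whole response is one line, so a greedy quantifier runs to the end of the string and backtracks to the last time and the last `<p>` block. The sketch below compares the two on two rows copied from the question. The class name `LazyVsGreedyDemo`, the `extract` helper, and the `|` separator are made up for this illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class LazyVsGreedyDemo {
    // Same three capture groups; only the quantifier between them differs.
    static final Pattern LAZY =
        Pattern.compile("(\\d{10}).*?(\\d{2}:\\d{2}).*?<p>(\\d{5})</p>");
    static final Pattern GREEDY =
        Pattern.compile("(\\d{10}).*(\\d{2}:\\d{2}).*<p>(\\d{5})</p>");

    // Two rows in the whitespace-stripped form shown in the question.
    static final String SAMPLE =
        "<td><ahref=\"javascript:detatilssc('2014051596');\">2014051596</a></td>"
        + "<td>02:00</td><tdclass=\"red\"><p>33440</p></td></tr><tr>"
        + "<td><ahref=\"javascript:detatilssc('2014051595');\">2014051595</a></td>"
        + "<td>01:50</td><tdclass=\"red\"><p>75119</p></td></tr><tr>";

    // Collects every match as "id|time|number".
    static List<String> extract(Pattern p, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(s);
        while (m.find()) {
            out.add(m.group(1) + "|" + m.group(2) + "|" + m.group(3));
        }
        return out;
    }

    public static void main(String[] args) {
        // Lazy: one match per row, values paired correctly.
        System.out.println(extract(LAZY, SAMPLE));
        // Greedy: a single match that pairs the first id with the last time and number.
        System.out.println(extract(GREEDY, SAMPLE));
    }
}
```

With these two rows the lazy pattern yields both rows, while the greedy one yields a single, mispaired result (first id, last time, last number).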

------ Solution --------------- -----------------------------

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class CaipiaoTest {
    public static void main(String[] args) {
        String html = "<table><tr><td><a href=\"javascript:detatilssc('2014051596');\">2014051596</a></td><td>02:00</td><td class=\"red\"><p>33440</p></td></tr><tr><td><a href=\"javascript:detatilssc('2014051595');\">2014051595</a></td><td>01:50</td><td class=\"red\"><p>75119</p></td></tr><tr><td><a href=\"javascript:detatilssc('2014051594');\">2014051594</a></td><td>01:40</td><td class=\"red\"><p>23254</p></td></tr><tr><td><a href=\"javascript:detatilssc('2014051593');\">2014051593</a></td><td>01:30</td><td class=\"red\"><p>21459</p></td></tr></table>";
        Document doc = Jsoup.parse(html); // parse the HTML string into a Document
        Elements tds = doc.getElementsByTag("td"); // find all td elements
        System.out.println(tds.size());
        for (Element e : tds) {
            System.out.println(e.text());
        }
    }
}


------ For reference only --- ------------------------------------
For HTML parsing, use jsoup.
------ For reference only -------------------------------------- -
Is a regex even needed? jsoup can parse this on its own.

------ For reference only ---------------------------------- -----
12
2014051596
02:00
33440
2014051595
01:50
75119
2014051594
01:40
23254
2014051593
01:30
21459
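The output above is a flat list: the cell count (12), then one line per cell. To get back to "three values per row", the cell texts can be regrouped in threes. The sketch below does this with plain Java lists, assuming every row has exactly three td cells; in the jsoup program the input list would be filled from `e.text()` for each element of `tds`. The class name `GroupCellsDemo` and the helper `groupRows` are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; assumes a fixed number of cells per row.
class GroupCellsDemo {
    // Splits a flat list of cell texts into rows of `width` cells each.
    static List<List<String>> groupRows(List<String> cells, int width) {
        if (width <= 0 || cells.size() % width != 0) {
            throw new IllegalArgumentException(
                "cell count " + cells.size() + " does not split into rows of " + width);
        }
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < cells.size(); i += width) {
            rows.add(new ArrayList<>(cells.subList(i, i + width)));
        }
        return rows;
    }

    public static void main(String[] args) {
        // First six cell texts from the output above.
        List<String> cells = Arrays.asList(
            "2014051596", "02:00", "33440",
            "2014051595", "01:50", "75119");
        for (List<String> row : groupRows(cells, 3)) {
            System.out.println(row.get(0) + "---" + row.get(1) + "---" + row.get(2));
        }
    }
}
```

If the real table ever has rows with a different number of cells, grouping by position like this would misalign the values, so the helper rejects cell counts that are not a multiple of the width.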

------ For reference only ---------------------------------- -----
Thank you very much. I have already written it with a regex, but yours is another way to approach it. Thanks!
------ For reference only ---------------------------------------


Thanks, guys!
I also wrote something similar to yours:

Pattern pattern = Pattern.compile("(\\d{10})</a></td><td>([0-9]{2}:[0-9]{2})</td><tdclass=\"red\"><p>([0-9]{5})</p>");
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
    System.out.println(matcher.group(1) + "---" + matcher.group(2) + "---" + matcher.group(3));
}
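One caveat about the pattern above: it hard-codes `<tdclass="red">`, which matches the whitespace-stripped data posted in the question but not the `<td class="red">` form used in the jsoup answer. It is not clear from the thread which form the server actually sends, so the sketch below is only an assumption-driven variant that allows optional whitespace with `\s*` and accepts both. The class name `TolerantPatternDemo` and the constants `STRIPPED` and `SPACED` are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the pattern above with optional whitespace.
class TolerantPatternDemo {
    static final Pattern ROW = Pattern.compile(
        "(\\d{10})</a>\\s*</td>\\s*<td>\\s*(\\d{2}:\\d{2})\\s*</td>"
        + "\\s*<td\\s*class=\"red\">\\s*<p>\\s*(\\d{5})\\s*</p>");

    // One row as posted in the question (whitespace stripped).
    static final String STRIPPED =
        "<td><ahref=\"javascript:detatilssc('2014051596');\">2014051596</a></td>"
        + "<td>02:00</td><tdclass=\"red\"><p>33440</p></td></tr>";

    // The same row as written in the jsoup answer (normal spacing).
    static final String SPACED =
        "<td><a href=\"javascript:detatilssc('2014051596');\">2014051596</a></td>"
        + "<td>02:00</td><td class=\"red\"><p>33440</p></td></tr>";

    // Collects every match as "id---time---number".
    static List<String> extract(String data) {
        List<String> out = new ArrayList<>();
        Matcher matcher = ROW.matcher(data);
        while (matcher.find()) {
            out.add(matcher.group(1) + "---" + matcher.group(2) + "---" + matcher.group(3));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extract(STRIPPED));
        System.out.println(extract(SPACED));
    }
}
```

Both inputs give the same single row, so the loop does not need to change if the spacing in the response turns out to differ from the posted sample.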

------ For reference only ----------------------------------- ----
I think regex matching is a bit more convenient here. With jsoup, after you get the table and the td elements, you still have to find the corresponding elements to get those three values.
