<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
so the page is UTF-8 encoded. In the HTML source, the element looks like this:
<li>
Sergio Agüero
</li>
For example, I want to extract the string Sergio Agüero with
String info = score.select("li").toString();
(where score is the element one level above the li),
but all I get is Sergio Ag&uuml;ero, which looks garbled. Is the character ü simply not being read?
I tried some other methods mentioned on the forum, such as
String info = new String(score.select("li").toString().getBytes("UTF-8"), "UTF-8");
but I still get Sergio Ag&uuml;ero.
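The getBytes("UTF-8") round trip cannot fix this: encoding a Java String to UTF-8 and decoding it again gives back an equal String, so if the extracted text contains the literal characters &uuml;, they are still there afterwards. A minimal stdlib check, assuming the extracted text is the entity form shown above (the class name is illustrative):

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Encode a String to UTF-8 bytes and decode it back; for any String
    // without unpaired surrogates this returns an equal String.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumption: the extracted text holds the entity as plain characters.
        String extracted = "Sergio Ag&uuml;ero";
        System.out.println(roundTrip(extracted)); // still prints "Sergio Ag&uuml;ero"
    }
}
```

So the problem is not the encoding but the escaped entity itself. Since score.select(...) looks like jsoup, calling .text() instead of .toString() may also help, because jsoup's text() returns the text content with entities decoded rather than the element's HTML.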
I have also seen this approach:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
try (BufferedReader bufReader = new BufferedReader(
        new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8))) {
    String line;
    while ((line = bufReader.readLine()) != null) {
        System.out.println(line);
    }
}
but the HTML I want to read is fairly large and I need to extract quite a lot from it, so reading it line by line does not seem practical.
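Reading line by line is not required to get correct UTF-8 decoding; the whole file can be decoded into one String at once. A minimal stdlib sketch, where "page.html" is a hypothetical file name standing in for your own path:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadWholeFile {
    // Read the entire file into a single String, decoding the bytes as UTF-8
    // (the charset declared in the page's meta tag).
    static String readUtf8(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "page.html" is a placeholder; pass the real path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "page.html");
        String html = readUtf8(file);
        System.out.println(html.length() + " characters read");
    }
}
```

The resulting String can then be handed to a parser such as jsoup's Jsoup.parse(String) in one call.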
I'm new to HTML and don't know many other methods. How can I extract the string Sergio Agüero from a UTF-8 encoded HTML page?
------ Solution ------------------------------------------------
These are character entity references; just convert each one into the corresponding character, for example with replaceAll("&uuml;", "ü").
http://zh.wikipedia.org/wiki/XML%E4%B8%8EHTML%E5%AD%97%E7%AC%A6%E5%AE%9E%E4%BD%93%E5%BC%95%E7%94%A8%E5%88%97%E8%A1%A8
This is the list of XML and HTML character entity references.
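With dozens of entities, one replaceAll call per entity gets tedious. The same idea can be written as a lookup table applied in a single pass. A minimal stdlib sketch; the class name is illustrative and the table is a hypothetical hand-picked subset that you would extend from the entity list linked above:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedEntities {
    // Hypothetical subset of HTML named entities; extend it from the full list.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", "&");
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
        ENTITIES.put("quot", "\"");
        ENTITIES.put("nbsp", "\u00a0");   // non-breaking space
        ENTITIES.put("uuml", "\u00fc");   // u with diaeresis
        ENTITIES.put("eacute", "\u00e9"); // e with acute accent
        ENTITIES.put("ntilde", "\u00f1"); // n with tilde
    }

    private static final Pattern NAMED = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Replace every known &name; with its character; unknown names are kept as-is.
    static String decode(String s) {
        Matcher m = NAMED.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = ENTITIES.get(m.group(1));
            if (replacement == null) {
                replacement = m.group(0); // leave unknown entities untouched
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The &uuml; entity is replaced by the u-umlaut character.
        System.out.println(decode("Sergio Ag&uuml;ero"));
    }
}
```

Decoding in one pass avoids a subtle bug of chained replaceAll calls: replacing &amp; first and then &uuml; would turn the literal text &amp;uuml; into ü instead of &uuml;.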
------ Solution ------------------------------------------------
This is the only solution I know.
It isn't garbled; it is how the site represents special characters. Chinese text won't be encoded this way.
If you want to learn more, read up on XML character references.
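XML character references also come in numeric form: &#NNNN; (decimal) and &#xHHHH; (hexadecimal) name a Unicode code point directly, so one rule covers every character without a lookup list, Chinese included (&#20013;&#25991; is 中文, and &#252; is ü). A minimal stdlib sketch; the class name is illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericReferences {
    // Matches &#NNNN; (decimal) and &#xHHHH; or &#XHHHH; (hexadecimal).
    private static final Pattern NUMERIC =
            Pattern.compile("&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));");

    // Replace each numeric character reference with the code point it names.
    // Values that overflow an int or exceed U+10FFFF are left unchanged.
    static String decode(String s) {
        Matcher m = NUMERIC.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(0);
            try {
                int cp = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                if (Character.isValidCodePoint(cp)) {
                    replacement = new String(Character.toChars(cp));
                }
            } catch (NumberFormatException e) {
                // too many digits for an int: keep the original text
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Sergio Ag&#252;ero"));
        System.out.println(decode("&#x4E2D;&#25991;"));
    }
}
```

Character.toChars also handles code points above U+FFFF, which need two Java chars.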
------ For reference only ----------------------------------------
Ah, so that's it! Thank you, it really works.
Can I ask a follow-up? I want to collect all the active players, and judging from this list there are at least dozens of these entities to handle.
So far I haven't run into Chinese, but I've seen posts elsewhere saying Chinese gets garbled. What is that about? If Chinese is encoded like this, there is no list like this one to look it up in, so what should I do?
------ For reference only ----------------------------------------
Thank you!