Tuesday, October 1, 2013

Why are some Chinese characters and punctuation garbled in a page requested with URLConnection?

I read the response into a String with UTF-8 decoding, but some Chinese characters and punctuation come out garbled. Why?
For example:
large car
embassy car
consulate car
foreign cars
foreign cars
two ? ? motorized ?? </option>
lightweight motorcycle ?? </option>
embassy motorcycle ?? </option>
consulate motorcycle ?? </option>
------ Solution --------------------------------------------
http://jjzx.lywww.com/index.php?m=Index&a=jdwfcx&useUnicode=true&characterEncoding=utf-8
That doesn't seem to be where your problem is. You write the output as GBK, but when you read the InputStream you never decode it with the charset it actually needs.
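A minimal sketch of what "decode it with the charset it needs" can look like: take the charset from the response's Content-Type header (the OP's code prints `connection.getContentType()`) and fall back to a default when none is given. The class `ContentTypeCharset`, the method `charsetFromContentType` and the choice of fallback are illustrative assumptions, not something from this thread, and the parsing is deliberately simple (it does not handle every header form, e.g. spaces around `=`).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    /**
     * Returns the charset named in a Content-Type header such as
     * "text/html; charset=utf-8", or the fallback if none is present.
     */
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // strip optional quotes, e.g. charset="UTF-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType("text/html; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetFromContentType("text/html", StandardCharsets.UTF_8));
    }
}
```

The returned `Charset` can be passed directly to `new InputStreamReader(connection.getInputStream(), charset)`, so the bytes are decoded once with the server's declared encoding instead of the platform default.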
------ Solution --------------------------------------------
Whatever the OP thinks, it really is a decoding problem, as I said. This line:
sTotalString += sCurrentLine + "\n";
is where it goes wrong.
At that point sTotalString is already garbled (garbled as GBK, even though the original UTF-8 bytes were fine), and you then append a newline to the GBK text. When it is later converted back to UTF-8, the characters around the line breaks naturally come out broken.
Read the stream this way instead:

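import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

/**
 * Reads an input stream using the specified character encoding.
 */
public static String readStrByCode(InputStream is, String code) {
    StringBuilder builder = new StringBuilder();
    // try-with-resources closes the reader even if an exception occurs, and
    // avoids the NullPointerException the old finally block hit when the
    // InputStreamReader could not be created (e.g. unknown encoding name)
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(is, code))) {
        String line;
        while ((line = reader.readLine()) != null) {
            builder.append(line).append('\n');
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return builder.toString();
}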
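To see why the GBK round trip in the question cannot be repaired afterwards, here is a small, hypothetical simulation (`GbkRoundTrip` and `roundTrip` are illustrative names). Each Chinese character is 3 bytes in UTF-8, while GBK decodes non-ASCII bytes in pairs. With an odd number of such characters before `</option>`, the last leftover byte gets paired with `<`, which is not a valid GBK trail byte, so it is replaced, and `getBytes("GBK")` cannot bring it back. The sample text is the "lightweight motorcycle" label from the question, written with Unicode escapes so the file compiles regardless of source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GbkRoundTrip {

    static final Charset GBK = Charset.forName("GBK");

    /** Simulates the question's path: UTF-8 bytes read as GBK, then "repaired". */
    static String roundTrip(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        // Wrong charset: bytes GBK cannot decode are replaced, i.e. lost
        String misread = new String(utf8, GBK);
        // The question's repair: re-encode as GBK, then decode as UTF-8
        byte[] back = misread.getBytes(GBK);
        return new String(back, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // the "lightweight motorcycle" option label, as Unicode escapes
        String text = "\u8f7b\u4fbf\u6469\u6258\u8f66</option>";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("decoded as UTF-8 once: "
                + text.equals(new String(utf8, StandardCharsets.UTF_8)));
        System.out.println("GBK round trip:        " + text.equals(roundTrip(text)));
    }
}
```

On a standard JDK the first line prints `true` and the second `false`: decoding once with the right charset is exact, while the detour through GBK loses bytes, which matches the `??` just before `</option>` in the question.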



------ For reference only ---------------------------------------
It's the way you decode.
------ For reference only ---------------------------------------
Is this running on Tomcat?
If so, open server.xml and find

 <Connector connectionTimeout="20000" port="8080" protocol="HTTP/1.1" redirectPort="8443" />

and change it to
 <Connector connectionTimeout="20000" port="8080" protocol="HTTP/1.1" redirectPort="8443" URIEncoding="UTF-8"/>

------ For reference only ---------------------------------------
It's not Tomcat; I'm simulating the request. Here is the code:
package com.test;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.URL;
import java.net.URLConnection;

public class TestPost {

    public static void testPost() throws IOException {

        /**
         * First, talk to the URLConnection behind the URL. A URLConnection is
         * easily obtained from a URL (using java.net.URL and java.net.URLConnection).
         */
        URL url = new URL("http://jjzx.lywww.com/index.php?m=Index&a=jdwfcx");
        URLConnection connection = url.openConnection();

        /**
         * Then put the connection into output mode. A URLConnection is usually
         * used for input, e.g. downloading a web page; setting it to output
         * lets you send data to the page.
         */
        connection.setDoOutput(true);
        /**
         * Finally, get the OutputStream, wrap it in a Writer for simplicity,
         * and write the POST data:
         */
        OutputStreamWriter out = new OutputStreamWriter(connection
                .getOutputStream(), "GBK");  //8859_1
        out.write("hpzl=02&fzjq=Q&hphm=55555&clsbdh=1111"); // the key part of the POST!
        // remember to clean up
        out.flush();
        out.close();

        /**
         * This sends a POST that looks something like:
         * POST /jobsearch/jobsearch.cgi HTTP 1.0 ACCEPT:
         * text/plain Content-type: application/x-www-form-urlencoded
         * Content-length: 99 username=bob password=someword
         */
        // Once the request has been sent, read the server's response like this:
        String sCurrentLine;
        String sTotalString;
        sCurrentLine = "";
        sTotalString = "";
        InputStream l_urlStream = connection.getInputStream();

        // The legendary three layers of wrapping!
        // No charset is given here, so InputStreamReader uses the platform default.
        BufferedReader l_reader = new BufferedReader(new InputStreamReader(
                l_urlStream));
        while ((sCurrentLine = l_reader.readLine()) != null) {
            sTotalString += sCurrentLine + "\n";
        }
        String encode = connection.getContentEncoding();
        System.out.println("the ContentEncoding is " + encode);
        String contentType = connection.getContentType();
        System.out.println("the contentType is " + contentType);
        // attempt to undo the default-charset decoding
        System.out.println(new String(sTotalString.getBytes("GBK"), "UTF-8"));
    }

    public static void main(String[] args) throws IOException {
        testPost();
    }

}

------ For reference only ---------------------------------------
Try changing GBK to UTF-8. If that doesn't work, add the UTF-8 encoding to the URL.
------ For reference only ---------------------------------------

How do I write that in code?
------ For reference only ---------------------------------------

I tried it and it doesn't work. Isn't new String(sTotalString.getBytes("GBK"), "UTF-8") already decoding it? Do you mean decoding the InputStream? How do I do that?
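One way to decode the InputStream itself, sketched here under the assumption that the response is UTF-8 as discussed in this thread: collect the raw bytes first and decode them exactly once with the right charset, so no String ever passes through the wrong encoding. `DecodeOnce` and `readAll` are hypothetical names, and a `ByteArrayInputStream` stands in for `connection.getInputStream()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeOnce {

    /** Reads every byte first, then decodes the whole body with one charset. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n); // keep raw bytes; nothing is decoded yet
        }
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // stand-in for a UTF-8 response line (option label as Unicode escapes)
        String expected = "<option>\u8f7b\u4fbf\u6469\u6258\u8f66</option>";
        byte[] body = expected.getBytes(StandardCharsets.UTF_8);

        String html = readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        System.out.println("decoded correctly: " + html.equals(expected));
    }
}
```

The `readStrByCode` method from the Solution reply reaches the same result through a Reader; this variant only makes the "bytes first, decode once" order explicit, which also sidesteps any question about where line breaks fall.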
------ For reference only ---------------------------------------

I tested several ways of decoding; only new String(sTotalString.getBytes("GBK"), "UTF-8") gets the characters to decode at all.
------ For reference only ---------------------------------------

Hello, your approach solved the problem:

String sTotalString = readStrByCode(l_urlStream, "UTF-8");
System.out.println(sTotalString); // displays correctly

To test, I ran:

InputStreamReader isr = new InputStreamReader(is);
System.out.println("new InputStreamReader(is).getEncoding() is "+isr.getEncoding());

Because this read does not specify an encoding, the default GBK encoding is used, while the server's response page is encoded in UTF-8; that mismatch caused the garbling, as the printout showed.

I also tested the sTotalString += sCurrentLine + "\n"; theory: I removed the newline so the line became sTotalString += sCurrentLine;, and a few characters were still garbled, in exactly the same places. So the garbling has nothing to do with + "\n".
Next I suspected the line breaks consumed by readLine, so I changed how the stream is read to see whether the garbled positions would move. I changed the code as follows:

 InputStreamReader isr = new InputStreamReader(is);
 System.out.println("new InputStreamReader(is).getEncoding() is " + isr.getEncoding());
 reader = new BufferedReader(isr);
 char[] cbuf = new char[1024];
 int n;
 // append only the n chars actually read; new String(cbuf) would also
 // append stale chars left over from a previous, longer read
 while ((n = reader.read(cbuf)) != -1) {
     sTotalString += new String(cbuf, 0, n);
 }

Some characters are still garbled, in the same places.
So my problem must be the UTF-8 → GBK → UTF-8 round trip. I've read online about Java GBK-to-UTF-8 conversions producing garbled text, and I think it has to do with how UTF-8 and GBK convert into each other.
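For the round-trip question above, a strict decoder makes the UTF-8/GBK relationship visible. A `CharsetDecoder` configured with `CodingErrorAction.REPORT` throws instead of silently substituting replacement characters, so it shows that the UTF-8 response bytes are simply not valid GBK: anything decoded that way has already lost information before `getBytes("GBK")` is ever called. `StrictDecode` and `decodesCleanly` are illustrative names; the sample bytes are the "lightweight motorcycle" label from the question, written with Unicode escapes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {

    /** True if the bytes decode without any malformed or unmappable input. */
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // at least one byte sequence is invalid in this charset
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u8f7b\u4fbf\u6469\u6258\u8f66</option>".getBytes(StandardCharsets.UTF_8);
        System.out.println("as UTF-8: " + decodesCleanly(utf8, StandardCharsets.UTF_8));
        System.out.println("as GBK:   " + decodesCleanly(utf8, Charset.forName("GBK")));
    }
}
```

This prints `true` for UTF-8 and `false` for GBK. The usual readers and `new String(bytes, charset)` use the replace action instead, which is why the damage shows up quietly as `?` rather than as an exception.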

------ For reference only ---------------------------------------

Of course reading it that way goes wrong...
As for the reason, take a look at
http://bbs.csdn.net/topics/390578551
I made the same mistake as you.
