Java lovers: Converting a String to a byte array char by char

This post was last edited by skmbw on 2013-09-22 15:58:36

For example, take String text = "阿是";
Converting it with byte[] bytes = text.getBytes(); gives the byte array
[-80, -94, -54, -57]
If I want to do the conversion through char instead, that is, traverse each character of the String and then
convert it to a byte array, how do I get the same array that text.getBytes() returns?
The encoding is GBK.
Example:


  String text = "阿是";
  // The platform default is GBK here; naming the charset (java.nio.charset.Charset) makes that explicit
  byte[] bytes = text.getBytes(Charset.forName("GBK")); // [-80, -94, -54, -57]
  byte[] abytes = new byte[text.length() * 2];
  for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      // Using c here, what should I do to turn the String into a byte[]?

      //System.out.println((byte)((c + 0xA0)) );
      //System.out.println((byte)(0x96));
      //System.out.println((byte)(0xA0 + (c >> 6)));
      //System.out.println((byte)(0xa0 + (c & 0x3F)));
  }

Could one of the experts please tell me?
------ Solution ------------------------------------------------

Quote:
What is the problem here?
Waiting for the OP to answer the question.
If you want to know right away, open a thread with a 40-point bounty for me.
There is no problem with it.... You've got it wrong, haven't you?....

Well, what I said wasn't accurate; writing it this way does actually work.
------ Solution ------------------------------------------------
Java already has an API for character set conversion, and it supports GBK. Go and study getBytes().
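A minimal sketch of that advice, assuming Java 7 or later: wrap each char in a one-character String, let getBytes(Charset) do the GBK lookup, and concatenate the pieces. The class and method names are made up for illustration. For the characters in the question, the per-char result matches encoding the whole string at once.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Arrays;

public class PerCharGbk {
    static final Charset GBK = Charset.forName("GBK");

    // Hypothetical helper: encode each char on its own and concatenate the results.
    static byte[] encodeCharByChar(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            byte[] b = String.valueOf(text.charAt(i)).getBytes(GBK);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "阿是";
        byte[] perChar = encodeCharByChar(text);
        System.out.println(Arrays.toString(perChar));                   // [-80, -94, -54, -57]
        System.out.println(Arrays.equals(perChar, text.getBytes(GBK))); // true
    }
}
```

This leans entirely on the JDK's GBK table, so it is correct but not the bit-twiddling the asker hopes for.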
------ Solution ------------------------------------------------
I don't know why you have to convert through char, but if you must, see whether this approach meets your requirements:


  
  package study.string.length;
  
  import java.io.UnsupportedEncodingException;
  
  // sun.io is a JDK-internal package that was removed in JDK 8,
  // so this class only compiles on JDK 7 and earlier.
  import sun.io.CharToByteConverter;
  import sun.io.MalformedInputException;
  
  public class StrLenght {
  
      public static void main(String[] args) throws UnsupportedEncodingException, MalformedInputException {
          String str = "a中";
          // Reference: the whole string encoded as GBK in one call
          byte[] chars = str.getBytes("GBK");
          for (int x = 0; x < chars.length; x++) {
              System.out.println(chars[x]);
          }
  
          print(str);
      }
  
      // Encodes str one char at a time; the printed bytes should match main's output.
      public static void print(String str) throws UnsupportedEncodingException, MalformedInputException {
          byte[] result = new byte[str.getBytes("GBK").length];
          int p = 0;
          for (int i = 0; i < str.length(); i++) {
              char c = str.charAt(i);
  
              if (c < 0x80) {
                  // ASCII is the same single byte in GBK. Testing only for a zero
                  // high byte would also let U+0080..U+00FF through, and GBK does
                  // not map those to that same single byte.
                  result[p++] = (byte) c;
              } else {
                  // Other chars: ask the GBK converter for their bytes
                  CharToByteConverter converter = CharToByteConverter.getConverter("GBK");
                  byte[] br = converter.convertAll(new char[] { c });
                  for (byte b : br) {
                      result[p++] = b;
                  }
              }
          }
          for (int x = 0; x < result.length; x++) {
              System.out.println(result[x]);
          }
      }
  
  }
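Because sun.io.CharToByteConverter is gone from current JDKs, here is a sketch of the same per-char loop on the public java.nio.charset.CharsetEncoder API. The class and method names are hypothetical. Unlike String.getBytes, an encoder from newEncoder() reports unmappable or malformed input instead of substituting a replacement byte, so this version fails loudly on such input; it does not handle surrogate pairs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.Arrays;

public class EncoderPerChar {
    // Hypothetical helper: encodes str one char at a time with a public CharsetEncoder.
    static byte[] encode(String str, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder(); // reports coding errors by default
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < str.length(); i++) {
            try {
                // encode(CharBuffer) resets the encoder, encodes the input, and flushes
                ByteBuffer bb = encoder.encode(CharBuffer.wrap(new char[] { str.charAt(i) }));
                byte[] bytes = new byte[bb.remaining()];
                bb.get(bytes);
                out.write(bytes, 0, bytes.length);
            } catch (CharacterCodingException e) {
                throw new IllegalArgumentException("cannot encode char at index " + i, e);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String str = "a中";
        System.out.println(Arrays.toString(str.getBytes(gbk)));
        System.out.println(Arrays.toString(encode(str, gbk)));
    }
}
```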

------ Solution ------------------------------------------------
If you really have to process the text one character at a time, you can use CharBuffer.
But working with the String as a whole is still the simplest, and it rarely goes wrong.
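For the one-character-at-a-time CharBuffer route, a sketch could reuse a single one-char CharBuffer and one output ByteBuffer, driving CharsetEncoder.encode(in, out, endOfInput) by hand. Assumptions, all made here rather than taken from the thread: the class name is hypothetical, the text contains no surrogate pairs (a lone high surrogate would be left unconsumed in the input buffer, so the sketch rejects it), and the output buffer is sized with maxBytesPerChar() so it should never overflow.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

public class CharBufferEncoding {
    static byte[] encode(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        CharBuffer in = CharBuffer.allocate(1); // reused for every char
        ByteBuffer out = ByteBuffer.allocate(
                (int) Math.ceil(text.length() * encoder.maxBytesPerChar()));
        for (int i = 0; i < text.length(); i++) {
            in.clear();
            in.put(text.charAt(i));
            in.flip();
            CoderResult cr = encoder.encode(in, out, false);
            if (cr.isError() || cr.isOverflow()) {
                throw new IllegalArgumentException("cannot encode char at index " + i + ": " + cr);
            }
            if (in.hasRemaining()) { // e.g. a high surrogate waiting for its pair
                throw new IllegalArgumentException("surrogate pairs are not handled by this sketch");
            }
        }
        in.clear();
        in.flip(); // empty input, just to signal end of input
        encoder.encode(in, out, true);
        encoder.flush(out);
        out.flip();
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }
}
```

Reusing the buffers avoids allocating a String per char, though whether that is measurably faster than String.getBytes would need benchmarking.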
------ For reference only ---------------------------------------
A char is two bytes (16 bits), so the result will be different.
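To see why the char's own two bytes differ from the GBK bytes, a small check on '阿' (U+963F) helps: splitting the UTF-16 code unit with shifts gives 0x96 0x3F, while GBK gives 0xB0 0xA2, the first two bytes in the question. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class CharVsGbk {
    // The char's own 16 bits, split into high and low byte (UTF-16BE order).
    static byte[] rawBytes(char c) {
        return new byte[] { (byte) (c >> 8), (byte) c };
    }

    // The GBK bytes, which come from the charset's mapping table.
    static byte[] gbkBytes(char c) {
        return String.valueOf(c).getBytes(Charset.forName("GBK"));
    }

    public static void main(String[] args) {
        char c = '阿';
        System.out.println(Integer.toHexString(c));       // 963f
        System.out.println(Arrays.toString(rawBytes(c))); // [-106, 63]
        System.out.println(Arrays.toString(gbkBytes(c))); // [-80, -94]
    }
}
```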
------ For reference only ---------------------------------------
Quote:
  for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);

If you are processing Chinese characters, text.length() could be a problem.
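For the common Chinese characters, which are all in the Basic Multilingual Plane, one char is one character, so text.length() works. It becomes a problem with supplementary characters such as U+20000, which take two chars (a surrogate pair); encoding the two halves separately does not give the character's bytes. A sketch that walks the string by code point instead (the class name is hypothetical; GBK itself cannot represent supplementary characters, so the demo uses UTF-8 for that case):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CodePointEncoding {
    // Encode one code point at a time so a surrogate pair is never split.
    static byte[] encodeByCodePoint(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            byte[] b = new String(Character.toChars(cp)).getBytes(charset);
            out.write(b, 0, b.length);
            i += Character.charCount(cp); // 2 for a supplementary character, else 1
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String s = "a\uD840\uDC00"; // 'a' followed by U+20000
        System.out.println(s.length());                     // 3
        System.out.println(s.codePointCount(0, s.length())); // 2
        System.out.println(Arrays.toString(encodeByCodePoint(s, StandardCharsets.UTF_8)));
        // [97, -16, -96, -128, -128]
    }
}
```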
------ For reference only ---------------------------------------

What is the problem here?
------ For reference only ---------------------------------------

Quote:
What is the problem here?

Waiting for the OP to answer the question.

If you want to know right away, open a thread with a 40-point bounty for me.

------ For reference only ---------------------------------------

Quote:
What is the problem here?
Waiting for the OP to answer the question.
If you want to know right away, open a thread with a 40-point bounty for me.

There is no problem with it.... You've got it wrong, haven't you?....
------ For reference only ---------------------------------------

I really am dealing with Chinese, and a char can represent a Chinese character. A Chinese character I convert into byte[] b = new byte[2]; a letter converts into one byte. What is the problem with that? Please enlighten me.
------ For reference only ---------------------------------------

Quote:
I really am dealing with Chinese, and a char can represent a Chinese character. A Chinese character I convert into byte[] b = new byte[2]; a letter converts into one byte. What is the problem with that? Please enlighten me.

There is no problem with that,
but you can't just new byte[2]; byte and char are different.
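The asker's rule of one byte for a letter and two for a Chinese character can be turned into a sizing pass, which also shows why new byte[text.length() * 2] from the question over-allocates for mixed text. A sketch under the assumption that every non-ASCII char is one GBK maps to two bytes (unmappable characters, for example, come out of String.getBytes as a single replacement byte and would break it); the names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class GbkSizing {
    static final Charset GBK = Charset.forName("GBK");

    // Assumed rule from the thread: ASCII -> 1 byte, anything else -> 2 bytes.
    static int gbkLength(char c) {
        return c < 0x80 ? 1 : 2;
    }

    static byte[] encode(String text) {
        int size = 0;
        for (int i = 0; i < text.length(); i++) {
            size += gbkLength(text.charAt(i));
        }
        byte[] result = new byte[size]; // exact size, not length() * 2
        int p = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                result[p++] = (byte) c; // letter: one byte
            } else {
                byte[] two = String.valueOf(c).getBytes(GBK);
                result[p++] = two[0]; // Chinese character: two bytes from the table
                result[p++] = two[1];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "a阿是";
        System.out.println(text.length() * 2);                               // 6
        System.out.println(encode(text).length);                             // 5
        System.out.println(Arrays.equals(encode(text), text.getBytes(GBK))); // true
    }
}
```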
------ For reference only ---------------------------------------
I want to use bit operations so that the performance is better. I already know how to do it with the nio package.
For conversion to UTF-8, the following operations work:


  (byte)(0xE0 + (chr >> 12));
  (byte)(0x80 + ((chr >> 6) & 0x3F));
  (byte)(0x80 + (chr & 0x3F));

In the code above, chr is one character.
If it is a letter, simply


  (byte)chr;

is enough.
What confuses me now is GBK: how do I write that with bit operations?
Could one of the experts please teach us?
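The three expressions above are the three-byte case of UTF-8, which applies only to U+0800 through U+FFFF, and (byte)chr applies only below U+0080. A sketch that covers every BMP char in the same shift-and-mask style (the class name is hypothetical; surrogate halves are rejected, because a supplementary character needs the four-byte form computed from the full code point):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Bits {
    // Encodes one BMP char (not a surrogate half) to UTF-8 with shifts and masks.
    static byte[] encode(char chr) {
        if (chr < 0x80) {                 // U+0000..U+007F: 0xxxxxxx
            return new byte[] { (byte) chr };
        } else if (chr < 0x800) {         // U+0080..U+07FF: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 + (chr >> 6)),
                (byte) (0x80 + (chr & 0x3F)) };
        } else if (Character.isSurrogate(chr)) {
            throw new IllegalArgumentException("surrogate halves need the 4-byte form");
        } else {                          // U+0800..U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 + (chr >> 12)),
                (byte) (0x80 + ((chr >> 6) & 0x3F)),
                (byte) (0x80 + (chr & 0x3F)) };
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode('阿'))); // [-23, -104, -65]
        System.out.println(Arrays.equals(encode('阿'),
                "阿".getBytes(StandardCharsets.UTF_8)));    // true
    }
}
```

This works for UTF-8 only because UTF-8 is defined as a bit layout of the code point; GBK has no such layout.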
------ For reference only ---------------------------------------

Quote:
I really am dealing with Chinese, and a char can represent a Chinese character. A Chinese character I convert into byte[] b = new byte[2]; a letter converts into one byte. What is the problem with that? Please enlighten me.
There is no problem with that,
but you can't just new byte[2]; byte and char are different.

They are different, and that is exactly why I want to use bit operations to turn the char into a byte[].
Please let us know.
------ For reference only ---------------------------------------

Ah, I've already looked at that; in the end it calls a native method. I only know Java.
------ For reference only ---------------------------------------

Quote:
This is not a problem.

Please point me in the right direction, thanks.

------ For reference only ---------------------------------------

This uses Sun's private classes and methods. It's another way to do it.
------ For reference only ---------------------------------------

Quote:
This uses Sun's private classes and methods. It's another way to do it.

Yes. The source code is quite complicated; take a look at it when you have time.
------ For reference only ---------------------------------------
I looked at the source of sun.nio.cs.ext.GB18030 (GB18030.java) from the JDK's charsets.jar, and of sun.nio.cs.UTF_8 (UTF_8.java) from rt.jar. UTF-8 can be converted to bytes directly with bit arithmetic because Java uses Unicode internally, and UTF-8 is defined by a fixed rule that maps straight onto Unicode. GBK has no such fixed relationship with Unicode, so bit operations alone mostly cannot do the conversion; GB18030.java does it entirely through hand-built mapping tables.
I also extracted the code from GB18030.java into a utility class, but its performance was not as good as using NIO directly.
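A quick way to see that the GBK side is a lookup table rather than a formula: 啊 (U+554A) and 阿 (U+963F) sit far apart in Unicode, but their GBK codes are neighbours, 0xB0A1 and 0xB0A2, because GB2312, which GBK extends, orders its common characters by pinyin. A sketch with a hypothetical class name that prints both:

```java
import java.nio.charset.Charset;

public class GbkIsATable {
    // Returns the two GBK bytes of c as one int, e.g. 0xB0A1 (assumes a two-byte result).
    static int gbkCode(char c) {
        byte[] b = String.valueOf(c).getBytes(Charset.forName("GBK"));
        return ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
    }

    public static void main(String[] args) {
        for (char c : new char[] { '啊', '阿' }) {
            System.out.printf("%c U+%04X -> GBK 0x%04X%n", c, (int) c, gbkCode(c));
        }
        // 啊 U+554A -> GBK 0xB0A1
        // 阿 U+963F -> GBK 0xB0A2
    }
}
```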
This is the NIO way:


  import java.nio.ByteBuffer;
  import java.nio.CharBuffer;
  import java.nio.charset.Charset;
  import java.util.Arrays;
  
  public class GBKCharUtils {
  	public static final Charset charset = Charset.forName("GBK");
  	
  	public static byte[] getBytes(char c) {
  		CharBuffer charBuffer = CharBuffer.allocate(1);
  		charBuffer.put(c);
  		charBuffer.flip();
  		return toArray(charset.encode(charBuffer));
  	}
  	
  	public static byte[] getBytes(char[] chars) {
  		CharBuffer charBuffer = CharBuffer.wrap(chars);
  		return toArray(charset.encode(charBuffer));
  	}
  	
  	// Copies only the encoded bytes. byteBuffer.array() returns the whole
  	// backing array, which can be longer than the encoded output.
  	private static byte[] toArray(ByteBuffer byteBuffer) {
  		byte[] bytes = new byte[byteBuffer.remaining()];
  		byteBuffer.get(bytes);
  		return bytes;
  	}
  	
  	public static void main(String[] args) {
  		CharBuffer charBuffer = CharBuffer.allocate(3);
  		charBuffer.put('c');
  		charBuffer.put('2');
  		charBuffer.put('a');
  		charBuffer.flip(); // switch the buffer from writing to reading
  		System.out.println(Arrays.toString(toArray(charset.encode(charBuffer))));
  		
  		System.out.println(Arrays.toString(getBytes('雷')));
  		
  		System.out.println(Arrays.toString(getBytes(new char[]{'雷'})));
  	}
  	
  }

So just use NIO directly.
Here is the approach extracted from GB18030.java:


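  // Extracted from the JDK's GB18030 encoder. encoderIndex1, encoderIndex2 and
  // getGB18030 are tables and a lookup helper defined elsewhere in that source
  // file, so this fragment does not compile on its own.
  public void encode(CharBuffer src, ByteBuffer dst) {
      while (src.hasRemaining()) {
          char c = src.get();
          if (c <= 0x007F) {
              // ASCII: one byte with the same value
              dst.put((byte) c);
          } else if (c <= 0xA4C6 || c >= 0xE000) {
              // Two-byte code found by table lookup, not by arithmetic on c
              int outByteVal = getGB18030(encoderIndex1, encoderIndex2, c);
              dst.put((byte) ((outByteVal & 0xFF00) >> 8)); // high byte
              dst.put((byte) (outByteVal & 0xFF));          // low byte
          }
          // Chars in neither range are silently skipped by this extract.
      }
  }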

Here getGB18030 performs the mapping between GBK and Unicode.
Its source does not seem to ship with the JDK; you can download the OpenJDK source to read it.
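If the goal is still "table lookup plus shifts" without the JDK's internal index tables, one option is to build the table once from the public GBK charset and then encode with array lookups only. This is a sketch, not the JDK's implementation: the class name and the (length << 16) | code packing are made up here, the table costs about 256 KB of ints, and surrogate pairs are not handled (each half maps to '?', so supplementary characters will not match String.getBytes). Whether it beats the NIO path would need measuring.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class GbkTableEncoder {
    private static final Charset GBK = Charset.forName("GBK");
    // TABLE[c] = (byteLength << 16) | code, built once from the JDK's own GBK charset.
    private static final int[] TABLE = new int[0x10000];

    static {
        for (int i = 0; i <= 0xFFFF; i++) {
            char c = (char) i;
            if (Character.isSurrogate(c)) {
                TABLE[i] = (1 << 16) | '?'; // lone surrogate halves become '?'
                continue;
            }
            // Unmappable chars come back as the charset's replacement byte(s).
            byte[] b = String.valueOf(c).getBytes(GBK);
            if (b.length == 1) {
                TABLE[i] = (1 << 16) | (b[0] & 0xFF);
            } else if (b.length == 2) {
                TABLE[i] = (2 << 16) | ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
            } else {
                throw new IllegalStateException("unexpected GBK length " + b.length);
            }
        }
    }

    // Encodes a BMP-only string with one array lookup and a few shifts per char.
    static byte[] encode(String text) {
        byte[] out = new byte[text.length() * 2]; // at most two bytes per char
        int p = 0;
        for (int i = 0; i < text.length(); i++) {
            int entry = TABLE[text.charAt(i)];
            int code = entry & 0xFFFF;
            if ((entry >>> 16) == 1) {
                out[p++] = (byte) code;
            } else {
                out[p++] = (byte) (code >> 8); // high byte
                out[p++] = (byte) code;        // low byte
            }
        }
        return Arrays.copyOf(out, p);
    }

    public static void main(String[] args) {
        String text = "a阿是";
        System.out.println(Arrays.equals(encode(text), text.getBytes(GBK))); // true
    }
}
```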

Monday, September 23, 2013
