Character Encoding Operations in Java

Author: reposted from the web  Published: [ 2013/12/24 16:42:05 ]  Tags: JAVA, character encoding

How to get the system's default encoding:
System.out.println(System.getProperty("file.encoding"));
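As a small sketch of the same lookup, the default can also be read through java.nio.charset.Charset. The class name DefaultEncoding and the helper defaultCharsetName are illustrative names, not part of the original article. Note that on recent JDKs (18 and later) the default is normally UTF-8 regardless of the operating system locale, so the two values printed below may or may not reflect the platform's native encoding.

```java
import java.nio.charset.Charset;

class DefaultEncoding {
    // Hypothetical helper: name of the charset Java uses when none is specified.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The property the article reads; it may be null on unusual runtimes.
        System.out.println(System.getProperty("file.encoding"));
        // The charset actually used by new String(bytes) and str.getBytes().
        System.out.println(defaultCharsetName());
    }
}
```

Relying on the default is what makes code behave differently from machine to machine, which is why the examples that follow name the encoding explicitly.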
Because str is held as unicode, it can be converted to any other compatible encoding (here str is "中文"):

printBytes(str.getBytes("utf-8")); // -28 -72 -83 -26 -106 -121

printBytes(str.getBytes("unicode")); // -2 -1 78 45 101 -121

printBytes(str.getBytes("gb2312")); // -42 -48 -50 -60

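It cannot be converted to iso8859-1, because iso8859-1 cannot encode Chinese characters. Each character that cannot be encoded is replaced by 63, the byte for '?':
printBytes(str.getBytes("iso8859-1")); // 63 63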
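The article does not show printBytes, so the following is only a sketch of what such a helper might look like, together with a check of whether each charset can represent the string at all. The names EncodeDemo and formatBytes are assumptions made for illustration. The charset names are the ones used above; the loop skips any name the running JDK does not support.

```java
import java.nio.charset.Charset;

class EncodeDemo {
    // Hypothetical helper: signed byte values separated by spaces.
    static String formatBytes(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(b); // appended as a decimal number in the range -128..127
        }
        return sb.toString();
    }

    // Assumed shape of the article's printBytes helper.
    static void printBytes(byte[] bytes) {
        System.out.println(formatBytes(bytes));
    }

    public static void main(String[] args) {
        // The article's two-character example string, written with
        // escapes so the source file encoding does not matter.
        String str = "\u4e2d\u6587";
        String[] names = {"utf-8", "unicode", "gb2312", "iso8859-1"};
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not supported by this runtime");
                continue;
            }
            Charset cs = Charset.forName(name);
            // canEncode reports whether every character is representable.
            System.out.println(name + " canEncode: " + cs.newEncoder().canEncode(str));
            printBytes(str.getBytes(cs));
        }
    }
}
```

Checking canEncode before calling getBytes is one way to detect the silent '?' substitution instead of discovering it later in the data.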
Given the bytes and the correct encoding, the original String can be restored:

byte[] bytes = {-28, -72, -83, -26, -106, -121};

System.out.println(getStringFromBytes(bytes, "utf-8"));

byte[] bytes1 = {-2, -1, 78, 45, 101, -121};

System.out.println(getStringFromBytes(bytes1, "unicode"));

System.out.println(new String(bytes1)); // decoded with the system default encoding

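bytes1 holds the unicode bytes of "中文", but the system default encoding is utf-8, so new String(bytes1) interprets the unicode bytes as utf8 and the restored String is garbled.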
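The decoding side can be sketched the same way. getStringFromBytes is not shown in the article, so the version below is an assumed implementation that simply wraps the String constructor; DecodeDemo is an illustrative class name. To make the garbled case reproducible on any machine, the sketch passes UTF-8 explicitly instead of depending on the system default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Assumed shape of the article's getStringFromBytes helper.
    static String getStringFromBytes(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] bytes = {-28, -72, -83, -26, -106, -121};
        byte[] bytes1 = {-2, -1, 78, 45, 101, -121};
        String expected = "\u4e2d\u6587"; // the article's example string

        // Correct charset for each byte array: both restore the original text.
        System.out.println(getStringFromBytes(bytes, "utf-8").equals(expected));
        System.out.println(getStringFromBytes(bytes1, "unicode").equals(expected));

        // Wrong charset: the UTF-16 bytes are not valid UTF-8, so the
        // decoder substitutes replacement characters and the text is lost.
        String wrong = new String(bytes1, StandardCharsets.UTF_8);
        System.out.println(wrong.equals(expected));
    }
}
```

Passing the charset explicitly on both the encoding and the decoding side is the usual way to avoid this mismatch.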
Now let's look at the byte stream of a text file.
We have a utf8-encoded file whose content is "中文". Viewing the file in hex shows the following:

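readBytesFromFile("C:/D/charset/utf8.txt"); reads the following bytes: -17 -69 -65 -28 -72 -83 -26 -106 -121. Of these, -28 -72 -83 -26 -106 -121 are the utf-8 bytes of "中文", and -17 -69 -65 (ef bb bf in hex) is the file header that marks a utf8-encoded file, known as the byte order mark (BOM). The hex value e4 equals 228 = 256 - 28: getBytes returns -28 because Java bytes are signed, and -28 has the same binary representation as e4.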
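The relationship between the signed values and the hex dump can be checked with a short sketch. readBytesFromFile is not shown in the article, so the version here is an assumption built on Files.readAllBytes; FileBytesDemo, hasUtf8Bom and toHex are illustrative names. Instead of the article's C:/D/charset/utf8.txt, the sketch writes the same nine bytes to a temporary file so that it runs anywhere.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class FileBytesDemo {
    // Assumed shape of the article's readBytesFromFile helper.
    static byte[] readBytesFromFile(String path) throws IOException {
        return Files.readAllBytes(Paths.get(path));
    }

    // True when the data starts with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    // Masking with 0xFF maps a signed byte such as -28 to 228 (hex e4).
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // BOM followed by the UTF-8 bytes of the article's example string.
        byte[] content = {-17, -69, -65, -28, -72, -83, -26, -106, -121};
        Path tmp = Files.createTempFile("utf8", ".txt");
        try {
            Files.write(tmp, content);
            byte[] read = readBytesFromFile(tmp.toString());
            System.out.println(toHex(read));
            System.out.println("BOM present: " + hasUtf8Bom(read));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Java's UTF-8 decoder does not strip the BOM, so code that decodes such a file generally needs to skip the first three bytes itself when hasUtf8Bom returns true.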
We have a gb2312-encoded file whose content is "中文". Viewing the file in hex shows the following: