As a first step I read the PDF file's content directly, and got something in the following format:
%PDF-1.4
%[binary marker bytes, garbled]
8 0 obj
<</Length 150/Filter/FlateDecode>>
stream
x[compressed binary data, displayed as garbled characters]
endstream
endobj
As you can see, the content between stream and endstream is garbled; it is a run of compressed data. The question now is how to decompress it. I tried the java.util.zip classes, but they always throw an exception:
java.util.zip.ZipException: incorrect header check
It looks like the content between stream and endstream needs some processing before it can be decompressed, but I still do not know what to do. Does anyone have similar experience and can point me in the right direction?
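For reference, here is a minimal sketch of one way to approach this in Java; it is not a definitive implementation. It assumes the file is read as raw bytes, that the stream uses only /Filter/FlateDecode (zlib format, which java.util.zip.Inflater accepts by default), and that the first "stream" keyword in the input belongs to the object of interest. The class and method names (PdfStreamInflater, firstStreamBytes, inflate) are made up for illustration. The dump above appears to start with "x" (byte 0x78), which is consistent with a zlib header; "incorrect header check" usually means the bytes handed to the inflater do not start with a valid one, for example because the line break after "stream" was not skipped or the bytes were altered by reading the file as text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class PdfStreamInflater {

    // Index of the first occurrence of needle in data at or after 'from', or -1.
    static int indexOf(byte[] data, byte[] needle, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= data.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (data[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    // Raw bytes between the first "stream" keyword (plus its line break)
    // and the next "endstream". Simplification: a real parser should use
    // the /Length value instead of searching for "endstream".
    static byte[] firstStreamBytes(byte[] pdf) {
        byte[] startKey = "stream".getBytes(StandardCharsets.US_ASCII);
        byte[] endKey = "endstream".getBytes(StandardCharsets.US_ASCII);
        int start = indexOf(pdf, startKey, 0);
        if (start < 0) throw new IllegalArgumentException("no stream keyword found");
        start += startKey.length;
        // The keyword is followed by CRLF or LF; the data begins after it.
        if (start < pdf.length && pdf[start] == '\r') start++;
        if (start < pdf.length && pdf[start] == '\n') start++;
        int end = indexOf(pdf, endKey, start);
        if (end < 0) throw new IllegalArgumentException("no endstream keyword found");
        return Arrays.copyOfRange(pdf, start, end);
    }

    // Inflate zlib-wrapped data. Bytes after the end of the zlib stream
    // (such as the line break before "endstream") are ignored.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported stream");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data: " + e.getMessage(), e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfStreamInflater <file.pdf>");
            return;
        }
        // Read the whole file as bytes, never through a Reader.
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        byte[] content = inflate(firstStreamBytes(pdf));
        // Content streams are mostly ASCII operators; ISO-8859-1 keeps every byte visible.
        System.out.println(new String(content, StandardCharsets.ISO_8859_1));
    }
}
```

The output of inflate is the decompressed content stream, that is, PDF drawing operators, not plain readable text. Streams that use other filters or predictor parameters need additional handling that this sketch does not cover.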
------ Solution ---------------------------------------- ----
Following this thread. I could not get it to work with UTF either; I tried without success.
------ Solution ---- ----------------------------------------
The file must be read in binary mode.
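A small sketch of why binary mode matters, under the assumption that the text-mode read decodes the bytes as UTF-8 (the class and method names are made up for illustration). Compressed data contains byte sequences that are not valid in a text encoding; the decoder replaces them, so the original bytes cannot be recovered afterwards.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class TextModeCorruption {

    // Decode raw bytes as UTF-8 text and encode them back, which is
    // effectively what happens when binary data goes through a Reader.
    static byte[] roundTripThroughUtf8(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0x78 0x9C is a common zlib header; 0x9C on its own is not valid UTF-8.
        byte[] raw = {0x78, (byte) 0x9C, 0x01, 0x02};
        byte[] back = roundTripThroughUtf8(raw);
        // The invalid byte was replaced by U+FFFD, so the arrays differ.
        System.out.println(Arrays.equals(raw, back)); // prints false
    }
}
```

Once the header byte has been replaced like this, the inflater no longer sees a valid zlib header, which matches the "incorrect header check" error in the question.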
------ Solution -------------------------------------- ------
If what you read out is garbled, try UTF-8. You do not know which encoding the code that wrote the file used, so whatever you read will certainly look garbled until you parse it with the right encoding.
------ For reference only ---------------------------------- -----
Where are the experts? Bump!
------ For reference only -------------------------------------- -
Anyone with experience, or anyone who is interested, please take a look at this.
------ For reference only ---------------------- -----------------
That is a must. In fact, the situation now is that I can decompress the data between stream and endstream. If the text is English there is no problem, but Chinese comes out garbled. The encoding problem is too complicated.
------ For reference only -------------------------------------- -
Is the text inside the PDF document Chinese? To display the text, do you not have to convert it to Unicode according to the font information?