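Revisiting Automatic PDF Generation with Python
PDF is, admittedly, hugely popular: whenever people download a file or a report, they want a PDF version. The upside is that it looks essentially the same wherever you open it.
The downside is just as obvious: there is no straightforward built-in way to generate one. Various libraries can convert images straight to PDF, but once you need complex formatting or template-based generation, things get fiddly.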
import os

import fitz  # PyMuPDF


def converImageToPdf(img_list):
    # download_image and random_file_name are project helpers that are not shown here
    pdf_document = fitz.open()  # Creates a new PDF
    # Iterate over every image URL
    for img_url in img_list:
        img_local_file = download_image(img_url, 'confirmd_images')
        img = fitz.open(img_local_file)
        img_rect = img[0].rect  # Get the rectangle of the first page of the image
        # Create a new page with the same dimensions as the image
        pdf_page = pdf_document.new_page(width=img_rect.width, height=img_rect.height)
        # Insert the image into the new page
        pdf_page.insert_image(pdf_page.rect, filename=img_local_file)
        img.close()
    # Save the PDF file
    file_name = random_file_name('pdf')
    os.makedirs('confirmd_receipt', exist_ok=True)
    pdf_document.save(os.path.join('confirmd_receipt', file_name))
    pdf_document.close()
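This depends on PyMuPDF, which is imported under the name fitz (the package called fitz on PyPI is an unrelated project):
pip install PyMuPDF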
I previously wrote about an OSS-based approach:
And without OSS? In that case the simplest option is LibreOffice.
# Install LibreOffice (Ubuntu/Debian)
sudo apt-get install libreoffice
# Verify the installation
libreoffice --version
Code:
import subprocess
import os


def convert_to_pdf(input_docx, output_dir):
    try:
        # Create the output directory if it does not exist
        os.makedirs(output_dir, exist_ok=True)
        # Run the conversion command
        cmd = [
            'libreoffice', '--headless', '--convert-to', 'pdf',
            '--outdir', output_dir,
            input_docx
        ]
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        print(f"Conversion succeeded: {input_docx} → {output_dir}")
        return True
    except subprocess.CalledProcessError as e:
        print(f"Conversion failed: {e.stderr}")
        return False
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return False


# Usage example
convert_to_pdf(
    input_docx="/path/to/document.docx",
    output_dir="/path/to/output"
)
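The function above only reports whether the command exited cleanly. A small sketch of two checks that can sit around it: locating the binary before running, and working out where the PDF should land afterwards. The helper names find_office_binary and expected_pdf_path are made up for illustration, and the naming rule (same base name, .pdf extension, inside the output directory) is my understanding of how --convert-to behaves rather than something verified on every LibreOffice version.

```python
import shutil
from pathlib import Path
from typing import Optional, Sequence


def find_office_binary(candidates: Sequence[str] = ("libreoffice", "soffice")) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def expected_pdf_path(input_doc: str, output_dir: str) -> Path:
    """Assumed output location: <output_dir>/<input base name>.pdf"""
    return Path(output_dir) / (Path(input_doc).stem + ".pdf")


if __name__ == "__main__":
    print("office binary:", find_office_binary())
    print("expected output:", expected_pdf_path("/path/to/document.docx", "/path/to/output"))
```

After convert_to_pdf returns True, checking expected_pdf_path(...).exists() catches the case where the command exits with status 0 but writes nothing.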
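However, at this point the resulting PDF will most likely come out garbled.
That wall of boxes looks very professional. It is most likely caused by missing fonts, so install the font files: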
# Ubuntu/Debian
sudo apt-get install fonts-wqy-zenhei fonts-wqy-microhei fonts-noto-cjk
# CentOS/RHEL
sudo yum install wqy-zenhei-fonts wqy-microhei-fonts google-noto-cjk-fonts
# Refresh the font cache
sudo fc-cache -fv
Verify the installation:
# List the installed Chinese fonts
fc-list :lang=zh | grep -E "WenQuanYi|Noto"
# Example of expected output (paths of the installed fonts):
# /usr/share/fonts/truetype/wqy/wqy-zenhei.ttc: WenQuanYi Zen Hei,文泉驛正黑:style=Regular
# /usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc: Noto Sans CJK JP,Noto Sans CJK JP Regular:style=Regular
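If the conversion runs inside a service, the same font check can be done from Python before calling LibreOffice. This is a rough sketch: parse_fc_list and has_cjk_font are hypothetical helper names, and the parser assumes fc-list lines have the shape "path: family1,family2:style=..." shown in the sample output above.

```python
import subprocess
from typing import Iterable, Set


def parse_fc_list(output: str) -> Set[str]:
    """Extract font family names from `fc-list` output."""
    families = set()
    for line in output.splitlines():
        parts = line.split(":")
        if len(parts) < 2:
            continue  # not a "path: families:style" line
        for family in parts[1].split(","):
            family = family.strip()
            if family:
                families.add(family)
    return families


def has_cjk_font(families: Iterable[str], wanted=("WenQuanYi", "Noto Sans CJK")) -> bool:
    """True if any installed family starts with one of the wanted prefixes."""
    return any(f.startswith(prefix) for f in families for prefix in wanted)


def installed_zh_families() -> Set[str]:
    """Run `fc-list :lang=zh` (requires fontconfig) and parse the result."""
    result = subprocess.run(["fc-list", ":lang=zh"],
                            capture_output=True, text=True, check=True)
    return parse_fc_list(result.stdout)
```

Calling has_cjk_font(installed_zh_families()) before a conversion lets the service fail early with a clear message instead of producing a PDF full of boxes.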
If the problem persists, change the system locale configuration:
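# Check the current locale
locale
# Generate the Chinese locale (if it is not installed)
sudo locale-gen zh_CN.UTF-8
# Set the environment variable temporarily (for testing)
export LANG=zh_CN.UTF-8
# Set it permanently (edit /etc/default/locale)
sudo nano /etc/default/locale
# Add the following:
LANG="zh_CN.UTF-8"
LC_ALL="zh_CN.UTF-8"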
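When changing the system-wide locale is not an option, the same variables can be set only for the LibreOffice child process. A minimal sketch, assuming the zh_CN.UTF-8 locale has already been generated; office_env is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional


def office_env(lang: str = "zh_CN.UTF-8",
               base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the environment and force LANG/LC_ALL for the child process only."""
    env = dict(os.environ if base is None else base)
    env["LANG"] = lang
    env["LC_ALL"] = lang
    return env
```

It would be passed as subprocess.run(cmd, env=office_env(), check=True); the parent process and the rest of the system keep their own locale.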
Alternatively, pass PDF export options (including font embedding) at conversion time:
import subprocess


def convert_with_font(input_docx, output_dir):
    cmd = [
        'libreoffice', '--headless',
        '--env:UserInstallation=file:///tmp/libreoffice-altprofile',  # Use a separate profile
        '--convert-to',
        # Adjacent string literals are joined into a single argument
        'pdf:writer_pdf_Export:{"Watermark":{"type":"string","value":" "},'
        '"SelectPdfVersion":{"type":"long","value":"1"},'
        '"UseTaggedPDF":{"type":"boolean","value":"true"},'
        '"ExportBookmarks":{"type":"boolean","value":"true"},'
        '"EmbedStandardFonts":{"type":"boolean","value":"true"}}',
        '--outdir', output_dir,
        input_docx
    ]
    subprocess.run(cmd, check=True)
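Hand-writing that JSON inside a string literal is easy to get wrong, so here is a sketch that builds the same kind of argument from a plain dict. pdf_export_filter is a hypothetical helper; the type names (string, long, boolean) and the "pdf:writer_pdf_Export:{...}" shape are simply copied from the command above, and as far as I know this JSON form is only understood by fairly recent LibreOffice releases.

```python
import json


def pdf_export_filter(options: dict) -> str:
    """Build the value for --convert-to from {option name: Python value}."""
    encoded = {}
    for key, value in options.items():
        # bool must be tested before int because bool is a subclass of int.
        if isinstance(value, bool):
            encoded[key] = {"type": "boolean", "value": str(value).lower()}
        elif isinstance(value, int):
            encoded[key] = {"type": "long", "value": str(value)}
        elif isinstance(value, str):
            encoded[key] = {"type": "string", "value": value}
        else:
            raise TypeError(f"unsupported option type for {key!r}")
    return "pdf:writer_pdf_Export:" + json.dumps(
        encoded, separators=(",", ":"), ensure_ascii=False)


if __name__ == "__main__":
    print(pdf_export_filter({"SelectPdfVersion": 1, "EmbedStandardFonts": True}))
```

The returned string goes right after '--convert-to' in the cmd list, replacing the concatenated literals.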
At this point the problem is solved in most cases.
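Installing PyMuPDF on CentOS 7
Picking up from the previous post: yesterday I finished the code, tested it, and assumed I was done. Today, deploying to the server, it all fell apart. One problem after another, one pile of errors after another. It completely broke me.
I have no particular bias against any Linux distribution, mostly because I have not used that many of them, but I have to say CentOS is genuinely awful, and I do not understand why so many people like this wretched system.
A plain pip install, and here comes a pile of errors: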
[root@iZbp12k4fwg2euy5kkr9u7Z ~]# pip install PyMuPDF Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/ Collecting PyMuPDF Using cached http://mirrors.cloud.aliyuncs.com/pypi/packages/9f/1d/032d24e0c774e67742395fda163a172c60e4d0f9875785d5199eb2956d5e/PyMuPDF-1.19.6.tar.gz (2.3 MB) Preparing metadata (setup.py) ... done Using legacy 'setup.py install' for PyMuPDF, since package 'wheel' is not installed. Installing collected packages: PyMuPDF Running setup.py install for PyMuPDF ... error ERROR: Command errored out with exit status 1: command: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/setup.py'"'"'; __file__='"'"'/tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-7fhkepkr/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.6m/PyMuPDF cwd: /tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/ Complete output (20 lines): running install running build running build_py creating build creating build/lib.linux-x86_64-3.6 creating build/lib.linux-x86_64-3.6/fitz copying fitz/__init__.py -> build/lib.linux-x86_64-3.6/fitz copying fitz/fitz.py -> build/lib.linux-x86_64-3.6/fitz copying fitz/utils.py -> build/lib.linux-x86_64-3.6/fitz copying fitz/__main__.py -> build/lib.linux-x86_64-3.6/fitz running build_ext building 'fitz._fitz' extension creating build/temp.linux-x86_64-3.6 creating build/temp.linux-x86_64-3.6/fitz gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong 
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/mupdf -I/usr/local/include/mupdf -Imupdf/thirdparty/freetype/include -I/usr/include/freetype2 -I/usr/include/python3.6m -c fitz/fitz_wrap.c -o build/temp.linux-x86_64-3.6/fitz/fitz_wrap.o fitz/fitz_wrap.c:2755:18: fatal error: fitz.h: No such file or directory #include <fitz.h> ^ compilation terminated. error: command 'gcc' failed with exit status 1 ---------------------------------------- ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/setup.py'"'"'; __file__='"'"'/tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-7fhkepkr/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.6m/PyMuPDF Check the logs for full command output.
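Judging by the output, gcc failed because a header file (fitz.h) is missing. After some searching, https://blog.csdn.net/u012140499/article/details/112798704 offered an approach: download the source from https://casper.mupdf.com/releases/ and install it.
I downloaded the latest version and compiled it, and got another pile of errors: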
source/fitz/util.c: In function ‘fz_new_xhtml_document_from_document’:
source/fitz/util.c:866:2: warning: ‘new_doc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  return new_doc;
  ^
CC build/release/source/fitz/warp.o
CC build/release/source/fitz/writer.o
source/fitz/writer.c: In function ‘fz_new_document_writer_with_buffer’:
source/fitz/writer.c:305:2: warning: ‘wri’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  return wri;
  ^
CC build/release/source/fitz/xml.o
CC build/release/source/fitz/xmltext-device.o
CC build/release/source/fitz/zip.o
CXX build/release/source/fitz/tessocr.o
/bin/sh: g++: command not found
make: *** [build/release/source/fitz/tessocr.o] Error 127
It says g++ cannot be found. Fine, let's sort out g++ next:
yum search "gcc-c++"
Just one result:
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.cloud.aliyuncs.com
 * extras: mirrors.cloud.aliyuncs.com
 * updates: mirrors.cloud.aliyuncs.com
======================== N/S matched: gcc-c++ ========================
gcc-c++.x86_64 : C++ support for GCC

  Name and summary matches only, use "search all" for everything.
Install g++:
[root@iZbp12k4fwg2euy5kkr9u7Z mupdf-1.22.0-source]# yum install "gcc-c++.x86_64" -y Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile * base: mirrors.cloud.aliyuncs.com * extras: mirrors.cloud.aliyuncs.com * updates: mirrors.cloud.aliyuncs.com Resolving Dependencies --> Running transaction check ---> Package gcc-c++.x86_64 0:4.8.5-44.el7 will be installed --> Finished Dependency Resolution Dependencies Resolved ==================================================================================================================================================================================================================================== Package Arch Version Repository Size ==================================================================================================================================================================================================================================== Installing: gcc-c++ x86_64 4.8.5-44.el7 base 7.2 M Transaction Summary ==================================================================================================================================================================================================================================== Install 1 Package Total download size: 7.2 M Installed size: 16 M Downloading packages: gcc-c++-4.8.5-44.el7.x86_64.rpm | 7.2 MB 00:00:00 Running transaction check Running transaction test Transaction test succeeded Running transaction Installing : gcc-c++-4.8.5-44.el7.x86_64 1/1 Verifying : gcc-c++-4.8.5-44.el7.x86_64 1/1 Installed: gcc-c++.x86_64 0:4.8.5-44.el7 Complete!
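In hindsight, a pre-flight check before kicking off a long build would have surfaced the missing compiler immediately. A minimal sketch; missing_tools is a hypothetical helper and the default tool list is just what this particular build turned out to need.

```python
import shutil
from typing import List, Sequence


def missing_tools(tools: Sequence[str] = ("gcc", "g++", "make")) -> List[str]:
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing build tools:", ", ".join(missing))
    else:
        print("all build tools found")
```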
Once more:
make HAVE_X11=no HAVE_GLUT=no prefix=/usr/local install
The build and install command comes from this page: https://mupdf.readthedocs.io/en/latest/quick-start-guide.html#linux
Out came several hundred lines of errors:
thirdparty/harfbuzz/src/graph/../hb-meta.hh:76:41: note: in definition of macro ‘HB_AUTO_RETURN’ #define HB_AUTO_RETURN(E) -> decltype ((E)) { return (E); } ^ In file included from thirdparty/harfbuzz/src/graph/pairpos-graph.hh:32:0, from thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:31, from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27: thirdparty/harfbuzz/src/graph/classdef-graph.hh: In constructor ‘graph::class_def_size_estimator_t::class_def_size_estimator_t(It)’: thirdparty/harfbuzz/src/graph/classdef-graph.hh:155:44: error: ‘struct hb_hashmap_t<unsigned int, hb_set_t>’ has no member named ‘keys’ for (unsigned klass : glyphs_per_class.keys ()) ^ thirdparty/harfbuzz/src/graph/classdef-graph.hh: In member function ‘bool graph::class_def_size_estimator_t::in_error()’: thirdparty/harfbuzz/src/graph/classdef-graph.hh:200:47: error: ‘struct hb_hashmap_t<unsigned int, hb_set_t>’ has no member named ‘values’ for (const hb_set_t& s : glyphs_per_class.values ()) ^ In file included from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27:0: thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh: In member function ‘void graph::Lookup::fix_existing_subtable_links(graph::gsubgpos_graph_context_t&, unsigned int, hb_vector_t<hb_pair_t<unsigned int, hb_vector_t<unsigned int> > >&)’: thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:259:28: error: ‘struct hb_serialize_context_t::object_t’ has no member named ‘all_links_writer’ for (auto& l : v.obj.all_links_writer ()) ^ thirdparty/harfbuzz/src/graph/gsubgpos-context.cc: In member function ‘unsigned int graph::gsubgpos_graph_context_t::num_non_ext_subtables()’: thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:62:25: error: ‘struct hb_hashmap_t<unsigned int, graph::Lookup*>’ has no member named ‘values’ for (auto l : lookups.values ()) ^ In file included from thirdparty/harfbuzz/src/graph/../hb.hh:484:0, from thirdparty/harfbuzz/src/graph/../hb-set.hh:31, from thirdparty/harfbuzz/src/graph/graph.hh:27, from 
thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:27, from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27: thirdparty/harfbuzz/src/graph/../hb-vector.hh: In instantiation of ‘Type hb_vector_t<Type, sorted>::pop() [with Type = hb_user_data_array_t::hb_user_data_item_t; bool sorted = false]’: thirdparty/harfbuzz/src/graph/../hb-object.hh:127:7: required from ‘void hb_lockable_set_t<item_t, lock_t>::fini(lock_t&) [with item_t = hb_user_data_array_t::hb_user_data_item_t; lock_t = hb_mutex_t]’ thirdparty/harfbuzz/src/graph/../hb-object.hh:185:34: required from here thirdparty/harfbuzz/src/graph/../hb-vector.hh:398:43: error: cannot convert ‘std::remove_reference<hb_user_data_array_t::hb_user_data_item_t&>::type {aka hb_user_data_array_t::hb_user_data_item_t}’ to ‘hb_user_data_key_t*’ in initialization Type v {std::move (arrayZ[length - 1])}; ^ In file included from thirdparty/harfbuzz/src/graph/../hb.hh:481:0, from thirdparty/harfbuzz/src/graph/../hb-set.hh:31, from thirdparty/harfbuzz/src/graph/graph.hh:27, from thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:27, from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27: thirdparty/harfbuzz/src/graph/../hb-iter.hh: In instantiation of ‘void hb_copy(S&&, D&&) [with S = const hb_hashmap_t<unsigned int, unsigned int, true>&; D = hb_hashmap_t<unsigned int, unsigned int, true>&]’: thirdparty/harfbuzz/src/graph/../hb-map.hh:46:100: required from ‘hb_hashmap_t<K, V, minus_one>::hb_hashmap_t(const hb_hashmap_t<K, V, minus_one>&) [with K = unsigned int; V = unsigned int; bool minus_one = true]’ thirdparty/harfbuzz/src/graph/../hb-map.hh:444:56: required from here thirdparty/harfbuzz/src/graph/../hb-iter.hh:1016:14: error: no match for call to ‘(const<anonymous struct>) (const hb_hashmap_t<unsigned int, unsigned int, true>&)’ hb_iter (is) | hb_sink (id); ^ thirdparty/harfbuzz/src/graph/../hb-iter.hh:156:1: note: candidates are: { ^ thirdparty/harfbuzz/src/graph/../hb-iter.hh:158:3: note: template<class T> 
hb_iter_type<T><anonymous struct>::operator()(T&&) const operator () (T&& c) const ^ thirdparty/harfbuzz/src/graph/../hb-iter.hh:158:3: note: template argument deduction/substitution failed: thirdparty/harfbuzz/src/graph/../hb-iter.hh:164:3: note: template<class Type> hb_array_t<Type><anonymous struct>::operator()(Type*, unsigned int) const operator () (Type *array, unsigned int length) const ^ thirdparty/harfbuzz/src/graph/../hb-iter.hh:164:3: note: template argument deduction/substitution failed: thirdparty/harfbuzz/src/graph/../hb-iter.hh:1016:14: note: mismatched types ‘Type*’ and ‘hb_hashmap_t<unsigned int, unsigned int, true>’ hb_iter (is) | hb_sink (id); ^ thirdparty/harfbuzz/src/graph/../hb-iter.hh:168:3: note: template<class Type, unsigned int length> hb_array_t<Type><anonymous struct>::operator()(Type (&)[length]) const operator () (Type (&array)[length]) const ^ thirdparty/harfbuzz/src/graph/../hb-iter.hh:168:3: note: template argument deduction/substitution failed: thirdparty/harfbuzz/src/graph/../hb-iter.hh:1016:14: note: mismatched types ‘Type [length]’ and ‘const hb_hashmap_t<unsigned int, unsigned int, true>’ hb_iter (is) | hb_sink (id); ^ In file included from thirdparty/harfbuzz/src/graph/../hb-serialize.hh:36:0, from thirdparty/harfbuzz/src/graph/../hb-machinery.hh:37, from thirdparty/harfbuzz/src/graph/../hb-bit-set.hh:33, from thirdparty/harfbuzz/src/graph/../hb-bit-set-invertible.hh:32, from thirdparty/harfbuzz/src/graph/../hb-set.hh:32, from thirdparty/harfbuzz/src/graph/graph.hh:27, from thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:27, from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27: thirdparty/harfbuzz/src/graph/../hb-map.hh: In instantiation of ‘uint32_t hb_hashmap_t<K, V, minus_one>::hash() const [with K = unsigned int; V = unsigned int; bool minus_one = true; uint32_t = unsigned int]’: thirdparty/harfbuzz/src/graph/../hb-algs.hh:237:43: required from ‘constexpr hb_head_t<unsigned int, decltype (hb_deref(v).hash())><anonymous 
struct>::impl(const T&, hb_priority<1u>) const [with T = hb::shared_ptr<hb_map_t>; hb_head_t<unsigned int, decltype (hb_deref(v).hash())> = unsigned int]’ thirdparty/harfbuzz/src/graph/../hb-algs.hh:245:3: required by substitution of ‘template<class T> constexpr hb_head_t<unsigned int, decltype (((const<anonymous struct>*)this)-><anonymous struct>::impl(v, hb_priority<16u>()))><anonymous struct>::operator()(const T&) const [with T = hb::shared_ptr<hb_map_t>]’ thirdparty/harfbuzz/src/graph/../hb-map.hh:257:50: required from ‘bool hb_hashmap_t<K, V, minus_one>::has(K, VV**) const [with VV = unsigned int; K = hb::shared_ptr<hb_map_t>; V = unsigned int; bool minus_one = false]’ thirdparty/harfbuzz/src/graph/../hb-ot-layout-common.hh:3034:36: required from here thirdparty/harfbuzz/src/graph/../hb-map.hh:291:19: error: ‘iter_items’ was not declared in this scope + iter_items () ^ thirdparty/harfbuzz/src/graph/../hb-map.hh: In instantiation of ‘bool hb_hashmap_t<K, V, minus_one>::is_equal(const hb_hashmap_t<K, V, minus_one>&) const [with K = unsigned int; V = unsigned int; bool minus_one = true]’: thirdparty/harfbuzz/src/graph/../hb-map.hh:306:78: required from ‘bool hb_hashmap_t<K, V, minus_one>::operator==(const hb_hashmap_t<K, V, minus_one>&) const [with K = unsigned int; V = unsigned int; bool minus_one = true]’ thirdparty/harfbuzz/src/graph/../hb-map.hh:96:65: required from ‘bool hb_hashmap_t<K, V, minus_one>::item_t::operator==(const K&) const [with K = hb::shared_ptr<hb_map_t>; V = unsigned int; bool minus_one = false]’ thirdparty/harfbuzz/src/graph/../hb-map.hh:258:33: required from ‘bool hb_hashmap_t<K, V, minus_one>::has(K, VV**) const [with VV = unsigned int; K = hb::shared_ptr<hb_map_t>; V = unsigned int; bool minus_one = false]’ thirdparty/harfbuzz/src/graph/../hb-ot-layout-common.hh:3034:36: required from here thirdparty/harfbuzz/src/graph/../hb-map.hh:300:28: error: ‘iter’ was not declared in this scope for (auto pair : iter ()) ^ make: *** 
[build/release/thirdparty/harfbuzz/src/graph/gsubgpos-context.o] Error 1
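Several versions I tried produced the errors above, or complained that the C++17 standard is not supported. Searching for the errors mostly turns up one fix: upgrade the gcc compiler. Great. yum does not offer a newer one, and building gcc from source drags in yet another pile of dependencies. Upgrade, my foot.
So I tried downgrading MuPDF instead, and after many attempts found that version 1.12 installs successfully.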
install -d /usr/local/include/mupdf
install -d /usr/local/include/mupdf/fitz
install -d /usr/local/include/mupdf/pdf
install include/mupdf/*.h /usr/local/include/mupdf
install include/mupdf/fitz/*.h /usr/local/include/mupdf/fitz
install include/mupdf/pdf/*.h /usr/local/include/mupdf/pdf
install -d /usr/local/lib
install build/release/libmupdf.a build/release/libmupdfthird.a /usr/local/lib
install -d /usr/local/bin
install build/release/mutool build/release/muraster build/release/mujstest build/release/mjsgen /usr/local/bin
install -d /usr/local/share/man/man1
install docs/man/*.1 /usr/local/share/man/man1
install -d /usr/local/share/doc/mupdf
install -d /usr/local/share/doc/mupdf/examples
install README COPYING CHANGES /usr/local/share/doc/mupdf
install docs/*.html docs/*.css docs/*.png /usr/local/share/doc/mupdf
install docs/examples/* /usr/local/share/doc/mupdf/examples
Back to pip, and behold several thousand lines of errors. Are you trying to blow up on me?
fitz/fitz_wrap.c: In function ‘JM_rect_from_py’: fitz/fitz_wrap.c:4042:1: warning: control reaches end of non-void function [-Wreturn-type] } ^ fitz/fitz_wrap.c: In function ‘util_include_point_in_rect’: fitz/fitz_wrap.c:3447:1: warning: control reaches end of non-void function [-Wreturn-type] } ^ fitz/fitz_wrap.c: In function ‘util_transform_point’: fitz/fitz_wrap.c:3461:1: warning: control reaches end of non-void function [-Wreturn-type] } ^ fitz/fitz_wrap.c: In function ‘util_union_rect’: fitz/fitz_wrap.c:3468:1: warning: control reaches end of non-void function [-Wreturn-type] } ^ fitz/fitz_wrap.c: In function ‘util_concat_matrix’: fitz/fitz_wrap.c:3475:1: warning: control reaches end of non-void function [-Wreturn-type] } ^ fitz/fitz_wrap.c: In function ‘JM_matrix_from_py’: fitz/fitz_wrap.c:4131:1: warning: control reaches end of non-void function [-Wreturn-type] } ^ fitz/fitz_wrap.c: In function ‘JM_derotate_page_matrix’: fitz/fitz_wrap.c:5193:1: warning: control reaches end of non-void function [-Wreturn-type] } ^ fitz/fitz_wrap.c: In function ‘JM_irect_from_py’: fitz/fitz_wrap.c:4071:1: warning: control reaches end of non-void function [-Wreturn-type] } ^ error: command 'gcc' failed with exit status 1 ---------------------------------------- ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-kfa4_6i0/pymupdf_d444a7b2e89d4aa38ac652587530e9a2/setup.py'"'"'; __file__='"'"'/tmp/pip-install-kfa4_6i0/pymupdf_d444a7b2e89d4aa38ac652587530e9a2/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-b8m2p6nm/install-record.txt --single-version-externally-managed --compile --install-headers 
/usr/local/include/python3.6m/PyMuPDF Check the logs for full command output.
Try an older version:
[root@iZbp12k4fwg2euy5kkr9u7Z mupdf-1.12.0-source]# pip install PyMuPDF==1.12
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
ERROR: Could not find a version that satisfies the requirement PyMuPDF==1.12 (from versions: 1.11.2, 1.12.5, 1.13.20, 1.14.19.post2, 1.14.19.2, 1.14.20, 1.14.21, 1.16.0, 1.16.1, 1.16.2, 1.16.3, 1.16.4, 1.16.5, 1.16.6, 1.16.7, 1.16.8, 1.16.8.1, 1.16.9, 1.16.10, 1.16.11, 1.16.12, 1.16.13, 1.16.14, 1.16.15, 1.16.16, 1.16.17, 1.16.18, 1.17.0, 1.17.1, 1.17.2, 1.17.3, 1.17.4, 1.17.5, 1.17.6, 1.17.7, 1.18.0, 1.18.1, 1.18.2, 1.18.3, 1.18.4, 1.18.5, 1.18.6, 1.18.7, 1.18.8, 1.18.9, 1.18.10, 1.18.11, 1.18.12, 1.18.13, 1.18.14, 1.18.15, 1.18.16, 1.18.17, 1.18.18, 1.18.19, 1.19.0, 1.19.1, 1.19.2, 1.19.3, 1.19.4, 1.19.5, 1.19.6)
ERROR: No matching distribution found for PyMuPDF==1.12
There is no 1.12, so 1.12.5 it is:
[root@iZbp12k4fwg2euy5kkr9u7Z mupdf-1.12.0-source]# pip install PyMuPDF==1.12.5
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Collecting PyMuPDF==1.12.5
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/c1/4a/f6424f019bbc3ac70b55fd589f6b3eb777e13d1a3600dbdb726575d5f5df/PyMuPDF-1.12.5-cp36-cp36m-manylinux1_x86_64.whl (3.4 MB)
     |████████████████████████████████| 3.4 MB 1.2 MB/s
Installing collected packages: PyMuPDF
Successfully installed PyMuPDF-1.12.5
Nice, it finally installed. I started the service and tried merging files, and immediately got this error:
'Document' object has no attribute 'new_page'
WTF, can't a person catch a break?
Try a newer version:
[root@iZbp12k4fwg2euy5kkr9u7Z mupdf-1.12.0-source]# pip install PyMuPDF==1.18.19
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Collecting PyMuPDF==1.18.19
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/d8/b6/59c001fa851ec4ad216232bc256b9aaff67ff9cf1c4bb542f68f1ad5fcd8/PyMuPDF-1.18.19-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.4 MB)
     |████████████████████████████████| 6.4 MB 1.4 MB/s
Installing collected packages: PyMuPDF
  Attempting uninstall: PyMuPDF
    Found existing installation: PyMuPDF 1.12.5
    Uninstalling PyMuPDF-1.12.5:
      Successfully uninstalled PyMuPDF-1.12.5
Successfully installed PyMuPDF-1.18.19
The world is finally at peace.
Summary:
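The 'new_page' error above comes from the method name not existing in the old release. If code has to run against both old and new PyMuPDF versions, one option is to look the method up under several spellings. This is only a sketch: first_available is a hypothetical helper, and the camelCase names newPage and insertImage are my assumption about what older releases called these methods, so check the documentation for the version you are actually stuck with.

```python
def first_available(obj, *names):
    """Return the first callable attribute of obj found under the given names."""
    for name in names:
        method = getattr(obj, name, None)
        if callable(method):
            return method
    raise AttributeError(
        f"{type(obj).__name__!r} object has none of: {', '.join(names)}")


# Hypothetical usage with a PyMuPDF document (not executed here):
#   new_page = first_available(pdf_document, "new_page", "newPage")
#   page = new_page(width=img_rect.width, height=img_rect.height)
#   first_available(page, "insert_image", "insertImage")(page.rect, filename=path)
```

Pinning a single known-good version, as done here, is still the simpler fix when you control the deployment.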
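1. Build MuPDF from source using mupdf-1.12.0: https://mupdf.com/downloads/archive/mupdf-1.20.0-source.tar.gz (note that this link names 1.20.0, whereas the build that succeeded above was 1.12.0)
2. Install PyMuPDF 1.18.19 with pip: pip install PyMuPDF==1.18.19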
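To keep a deployment from silently picking up a version that is too old again, the service can compare the installed version against the one that worked here at startup. A minimal sketch: version_tuple and is_at_least are hypothetical helpers, and the default minimum of 1.18.19 is just the version from this post, not an official requirement.

```python
def version_tuple(version: str) -> tuple:
    """Leading numeric components only, e.g. '1.14.19.post2' gives (1, 14, 19)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'post2'
        parts.append(int(piece))
    return tuple(parts)


def is_at_least(installed: str, minimum: str = "1.18.19") -> bool:
    """Compare two dotted version strings numerically."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    for candidate in ("1.12.5", "1.18.19", "1.24.9"):
        print(candidate, is_at_least(candidate))
```

With PyMuPDF imported, the installed version string should be available as fitz.VersionBind, so the startup check would be is_at_least(fitz.VersionBind).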
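Postscript:
I just tried upgrading Python on CentOS to 3.8.6, after which a recent PyMuPDF appears to install without trouble. So the culprit was the pile of outdated junk that ships with the system: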
Successfully installed Babel-2.14.0 Jinja2-3.1.3 MarkupSafe-2.1.5 PyMuPDF-1.24.9 PyMuPDFb-1.24.9 PyPDF2-3.0.1 Pygments-2.18.0 SecretStorage-3.3.3 SimpleWebSocketServer-0.1.2 aliyun-python-sdk-core-2.14.0 aliyun-python-sdk-imm-1.24.0 aliyun-python-sdk-kms-2.16.2 backports.tarfile-1.2.0 certifi-2024.2.2 cffi-1.17.0 charset-normalizer-3.3.2 ci-info-0.3.0 click-8.1.7 configobj-5.0.8 configparser-7.1.0 contourpy-1.1.1 crcmod-1.7 cryptography-42.0.4 cycler-0.12.1 docutils-0.20.1 docxcompose-1.4.0 docxtpl-0.16.7 etelemetry-0.3.1 filelock-3.15.4 fonttools-4.53.1 fsspec-2024.6.1 httplib2-0.22.0 idna-3.6 importlib-metadata-8.4.0 importlib-resources-6.4.3 isodate-0.6.1 jaraco.classes-3.4.0 jaraco.context-6.0.1 jaraco.functools-4.0.2 jeepney-0.8.0 jmespath-0.10.0 keyring-25.3.0 kiwisolver-1.4.5 looseversion-1.3.0 lxml-5.1.0 markdown-it-py-3.0.0 matplot-0.1.9 matplotlib-3.7.5 mdurl-0.1.2 more-itertools-10.4.0 mpmath-1.3.0 networkx-3.1 nh3-0.2.18 numpy-1.24.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.6.20 nvidia-nvtx-cu12-12.1.105 opencv-python-4.10.0.84 oss2-2.18.4 packaging-24.1 pandas-2.0.3 pathlib-1.0.1 pillow-10.4.0 pkginfo-1.11.1 pycparser-2.21 pycryptodome-3.20.0 pydot-3.0.1 pyloco-0.0.139 pyparsing-3.1.2 python-dateutil-2.9.0.post0 python-docx-1.1.0 pytz-2024.1 pyxnat-1.6.2 rdflib-6.3.2 readme-renderer-43.0 requests-2.32.3 requests-toolbelt-1.0.0 rfc3986-2.0.0 rich-13.7.1 scipy-1.10.1 simplejson-3.19.3 six-1.16.0 sympy-1.13.2 torch-2.4.0 traits-6.3.2 triton-3.0.0 twine-5.1.1 typing-3.7.4.3 typing-extensions-4.9.0 tzdata-2024.1 urllib3-2.2.2 ushlex-0.99.1 websocket-client-1.8.0 zipp-3.20.0
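Merging Multiple Images into a PDF
One of our business flows requires users to download a document, stamp it, and upload the stamped version. The catch is that nearly everything happens on mobile, and uploading a PDF from a phone is awkward. So the current approach is to upload photos of the stamped pages.
However, users find this somewhat painful: some have to upload dozens of images, and managing those stamped images after downloading them again is a problem too. You cannot tell which is which, and organizing stamped files by business number is just as hard.
So the proposed solution is to convert the uploaded images back into a PDF.
Since the images are stored on OSS, and OSS does offer a way to convert images to PDF (https://help.aliyun.com/zh/imm/user-guide/convert-an-image-to-pdf):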
# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import sys
import os
from typing import List

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client(
        access_key_id: str,
        access_key_secret: str,
    ) -> imm20200930Client:
        """
        Initialize the account Client with an AccessKey ID and AccessKey Secret.
        @param access_key_id:
        @param access_key_secret:
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            access_key_id=access_key_id,
            access_key_secret=access_key_secret
        )
        # The IMM endpoint to access.
        config.endpoint = 'imm.cn-zhangjiakou.aliyuncs.com'
        return imm20200930Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
        # An Alibaba Cloud account AccessKey can access every API; using a RAM user for API access and daily operations is recommended.
        # Do not store the AccessKey ID and AccessKey Secret in project code: a leak would endanger every resource under the account.
        # This sample reads the AccessKey from environment variables for authentication. For how to configure them, see https://help.aliyun.com/document_detail/2361894.html.
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        sources_0 = imm_20200930_models.CreateImageToPDFTaskRequestSources(
            uri='oss://test-bucket/test-object.jpg'
        )
        create_image_to_pdftask_request = imm_20200930_models.CreateImageToPDFTaskRequest(
            project_name='test-project',
            target_uri='oss://test-bucket/test-target-object.pdf',
            sources=[
                sources_0
            ]
        )
        runtime = util_models.RuntimeOptions()
        try:
            # When running this code yourself, print the API return value.
            client.create_image_to_pdftask_with_options(create_image_to_pdftask_request, runtime)
        except Exception as error:
            # Print the error message if needed.
            UtilClient.assert_as_string(error.message)

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
        # The same AccessKey advice as in main applies here: prefer a RAM user and read the key from environment variables.
        # For how to configure them, see https://help.aliyun.com/document_detail/2361894.html.
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        sources_0 = imm_20200930_models.CreateImageToPDFTaskRequestSources(
            uri='oss://test-bucket/test-object.jpg'
        )
        create_image_to_pdftask_request = imm_20200930_models.CreateImageToPDFTaskRequest(
            project_name='test-project',
            target_uri='oss://test-bucket/test-target-object.pdf',
            sources=[
                sources_0
            ]
        )
        runtime = util_models.RuntimeOptions()
        try:
            # When running this code yourself, print the API return value.
            await client.create_image_to_pdftask_with_options_async(create_image_to_pdftask_request, runtime)
        except Exception as error:
            # Print the error message if needed.
            UtilClient.assert_as_string(error.message)


if __name__ == '__main__':
    Sample.main(sys.argv[1:])
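The sample passes a single hard-coded source. Since sources is a list, merging several uploads presumably means passing one entry per image; the sketch below only builds the oss:// URIs for that. oss_uri and source_uris are hypothetical helpers, and the bucket and key names are placeholders.

```python
from typing import Iterable, List


def oss_uri(bucket: str, key: str) -> str:
    """Build an oss://bucket/key URI, tolerating a leading slash in the key."""
    return f"oss://{bucket}/{key.lstrip('/')}"


def source_uris(bucket: str, keys: Iterable[str]) -> List[str]:
    """One URI per object key, in the order the pages should appear."""
    return [oss_uri(bucket, key) for key in keys]


if __name__ == "__main__":
    for uri in source_uris("test-bucket", ["stamped/page-1.jpg", "/stamped/page-2.jpg"]):
        print(uri)
```

Each URI would then be wrapped in CreateImageToPDFTaskRequestSources(uri=...) and the resulting list passed as sources, following the shape of the sample above.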
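However, the project already pulls in a fairly old Aliyun SDK. Adding this newer one would mean modifying existing code, which is a pain.
I searched online and found plenty of code, but none of it worked well. Seriously, has nobody written anything reliable?
In the end, PyMuPDF solved the problem: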
import fitz  # PyMuPDF

# Open an existing PDF or create a new one
pdf_document = fitz.open()  # Creates a new PDF

# Define the image file path
image_path = "path/to/your/image.jpg"

# Get the dimensions of the image
img = fitz.open(image_path)
img_rect = img[0].rect  # Get the rectangle of the first page of the image

# Create a new page with the same dimensions as the image
pdf_page = pdf_document.new_page(width=img_rect.width, height=img_rect.height)

# Insert the image into the new page
pdf_page.insert_image(pdf_page.rect, filename=image_path)

# Save the PDF to a file
pdf_document.save("output.pdf")
pdf_document.close()
The actual business code:
import os

import fitz  # PyMuPDF


def converImageToPdf(img_list):
    # download_image and random_file_name are project helpers that are not shown here
    pdf_document = fitz.open()  # Creates a new PDF
    for img_url in img_list:
        img_local_file = download_image(img_url, 'confirmd_images')
        img = fitz.open(img_local_file)
        img_rect = img[0].rect  # Get the rectangle of the first page of the image
        # Create a new page with the same dimensions as the image
        pdf_page = pdf_document.new_page(width=img_rect.width, height=img_rect.height)
        # Insert the image into the new page
        pdf_page.insert_image(pdf_page.rect, filename=img_local_file)
        img.close()
    file_name = random_file_name('pdf')
    os.makedirs('confirmd_receipt', exist_ok=True)
    pdf_document.save(os.path.join('confirmd_receipt', file_name))
    pdf_document.close()
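The business code saves under a random file name, which does not help with the original complaint about finding stamped files by business number. A small sketch of naming the output after the business number instead; receipt_pdf_path is a hypothetical helper, and the rule of replacing anything outside letters, digits, dash, and underscore is my own choice rather than anything the project requires.

```python
import os
import re


def receipt_pdf_path(business_no: str, out_dir: str = "confirmd_receipt") -> str:
    """Return <out_dir>/<sanitized business number>.pdf without touching the disk."""
    safe = re.sub(r"[^A-Za-z0-9_-]", "_", business_no.strip())
    if not safe:
        raise ValueError("business number is empty")
    return os.path.join(out_dir, safe + ".pdf")


if __name__ == "__main__":
    print(receipt_pdf_path("PO-2024/001"))
```

In converImageToPdf this would replace the random_file_name call, with the business number passed in as an extra argument.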
The actual result:
Dependency:
PyMuPDF == 1.24.9 (which installs PyMuPDFb == 1.24.9 alongside it)
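Previewing PDF Files in a WeChat Mini Program with PDF.js
In a WeChat Mini Program you can download and open a PDF with the two APIs wx.downloadFile and wx.openDocument. This approach has several drawbacks:
1. The file must be downloaded before it can be viewed, and every open downloads it again into a temporary file. With many PDFs the temporary files pile up, and large PDFs are slow to open.
2. The navigation bar shows the temporary file name as the title, which does not look polished.
3. Paging through the document is inconvenient.
Can a PDF be previewed directly inside the Mini Program instead? I tried displaying the PDF in a Mini Program web-view: it rendered in the developer tools but not on a real device. On the WeChat open community I saw people using PDF.js to open PDFs in the browser. PDF.js is backed by Mozilla, and its goal is a general-purpose, web-standards-based platform for parsing and rendering PDFs. When a PDF rendered by PDF.js is opened through a web-view, it does not display correctly in the WeChat developer tools, but the good news is that it displays correctly on a real device.
Rendering a PDF with PDF.js works as follows:
1. Download the library from the official PDF.js site: https://mozilla.github.io/pdf.js/getting_started
2. Deploy PDF.js to your website. PDF.js ships two folders, web and build; put both in one directory on the site, for example pdfjs. The web directory contains viewer.html, which can render PDF files online. The PDF link must be on the same domain. The preview URL looks like this: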
https://wwww.domianname.com/pdfjs/web/viewer.html?file=xxx/xxx/xxx.pdf
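If the preview links are generated on the server, the file parameter should be percent-encoded, otherwise PDF names containing spaces or Chinese characters can break the query string. A minimal sketch; viewer_url is a hypothetical helper and the example.com address is a placeholder.

```python
from urllib.parse import quote


def viewer_url(viewer: str, pdf_url: str) -> str:
    """Append the PDF location to viewer.html as an encoded `file` parameter."""
    return f"{viewer}?file={quote(pdf_url, safe='')}"


if __name__ == "__main__":
    print(viewer_url("https://example.com/pdfjs/web/viewer.html",
                     "/wp-content/uploads/微慕小程序专业版.pdf"))
```

The resulting URL is what gets handed to the web-view's src.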
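微慕 Professional Edition already integrates PDF.js and supports previewing PDFs from a link in both the browser and the Mini Program; you can try the feature there.
Preview a PDF file: https://blog.minapper.com/wp-content/uploads/微慕小程序专业版.pdf
Note that with this approach the domain hosting the PDF link must be added to the Mini Program's business domains. Cross-origin links are also supported but need special handling; for details see: https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-xhr
Download and open a PDF file: https://www.watch-life.net/微慕小程序开源版.pdf
The download-and-open approach above requires configuring both the business domain and the downloadFile domain.
Previewing PDFs in a Mini Program through PDF.js supports the usual PDF.js features, such as the sidebar, search, paging, zoom, adding text, drawing, rotation, and presentation mode.
The above uses the official viewer.html to display the PDF. You can also include PDF.js directly to parse and render the file, which lets you customize the behavior. The steps are:
1. Include the pdf.js library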
<script src="./build/pdf.js"></script>
<script src="./build/pdf.worker.js"></script>
2. Use a canvas to receive and display the PDF content that is read
<canvas id="myCanvas"></canvas>
3. Create the PDF object: data can be the Base64 string of the PDF file, a relative or absolute path to the file, or the URL of an online file
var loadingTask = pdfjsLib.getDocument(data)
loadingTask.promise.then(function (pdf) {
for (var i = 1; i
For more information about PDF.js, see the official site: https://mozilla.github.io/pdf.js/
The post Previewing PDF Files in a WeChat Mini Program with PDF.js first appeared on 守望轩.