
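Generating PDF Files Automatically with Python, Revisited

March 5, 2025, 17:25

PDF is, it has to be said, genuinely popular: whenever people download a file or a report, they want a PDF version. The advantage is that the layout looks essentially the same wherever it is viewed.

The drawback is just as obvious: there is no straightforward way to generate a PDF file directly. Various libraries can convert images to PDF, but for complex layouts, or when a PDF has to be produced from a template, things get more troublesome.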

import os

import fitz  # PyMuPDF


def converImageToPdf(img_list):
    pdf_document = fitz.open()  # Creates a new, empty PDF
    # Iterate over all image URLs
    for img_url in img_list:
        # download_image is a helper defined elsewhere; it returns the local file path
        img_local_file = download_image(img_url, 'confirmd_images')
        img = fitz.open(img_local_file)
        img_rect = img[0].rect  # Get the rectangle of the first page of the image
        # Create a new page with the same dimensions as the image
        pdf_page = pdf_document.new_page(width=img_rect.width, height=img_rect.height)
        # Insert the image into the new page
        pdf_page.insert_image(pdf_page.rect, filename=img_local_file)
        img.close()

    # Save the PDF file
    file_name = random_file_name('pdf')  # random_file_name is also defined elsewhere
    os.makedirs('confirmd_receipt', exist_ok=True)
    pdf_document.save(os.path.join('confirmd_receipt', file_name))
    pdf_document.close()

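This depends on PyMuPDF, which is imported as fitz. Note that the package named fitz on PyPI is a different, unrelated project, so install PyMuPDF:

pip install PyMuPDF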

I previously wrote an OSS-based version:

Generating PDF Reports with Python

What if there is no OSS? In that case the simplest approach is to use LibreOffice.

# Install LibreOffice (Ubuntu/Debian)
sudo apt-get install libreoffice

# Verify the installation
libreoffice --version

Code:

import subprocess
import os

def convert_to_pdf(input_docx, output_dir):
    try:
        # Create the output directory if it does not exist
        os.makedirs(output_dir, exist_ok=True)
        
        # Run the conversion command
        cmd = [
            'libreoffice', '--headless', '--convert-to', 'pdf',
            '--outdir', output_dir, input_docx
        ]
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        
        print(f"Conversion succeeded: {input_docx} → {output_dir}")
        return True
    except subprocess.CalledProcessError as e:
        print(f"Conversion failed: {e.stderr}")
        return False
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return False

# Usage example
convert_to_pdf(
    input_docx="/path/to/document.docx",
    output_dir="/path/to/output"
)
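The function above only reports success or failure; callers usually also want the path of the generated file. A minimal sketch of that, assuming LibreOffice names the output after the input file's stem and that the binary is on PATH as either libreoffice or soffice. The names expected_pdf_path and convert_and_locate are made up for this illustration:

```python
import shutil
import subprocess
from pathlib import Path


def expected_pdf_path(input_path, output_dir):
    """Return the path the converter is expected to write: <output_dir>/<input stem>.pdf."""
    return Path(output_dir) / (Path(input_path).stem + ".pdf")


def convert_and_locate(input_path, output_dir, timeout=120):
    """Convert a document and return the resulting PDF path, or None on failure."""
    # Assumption: the binary is on PATH under one of these two names.
    binary = shutil.which("libreoffice") or shutil.which("soffice")
    if binary is None:
        return None
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    cmd = [binary, "--headless", "--convert-to", "pdf",
           "--outdir", str(output_dir), str(input_path)]
    try:
        # The timeout keeps a hung conversion from blocking the caller forever.
        subprocess.run(cmd, check=True, capture_output=True, text=True, timeout=timeout)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return None
    pdf_path = expected_pdf_path(input_path, output_dir)
    # Do not rely on the exit status alone; confirm the file really exists.
    return pdf_path if pdf_path.exists() else None
```

Checking for the file explicitly is a cheap safeguard in case the command returns without having written anything.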

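However, the resulting PDF will most likely be garbled:

That pile of boxes looks very "professional". It is most likely caused by missing fonts, so install the font packages: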

# Ubuntu/Debian
sudo apt-get install fonts-wqy-zenhei fonts-wqy-microhei fonts-noto-cjk

# CentOS/RHEL
sudo yum install wqy-zenhei-fonts wqy-microhei-fonts google-noto-cjk-fonts

# Refresh the font cache
sudo fc-cache -fv

Verify the installation:

# List the installed Chinese fonts
fc-list :lang=zh | grep -E "WenQuanYi|Noto"

# Expected output (paths of the installed fonts):
# /usr/share/fonts/truetype/wqy/wqy-zenhei.ttc: WenQuanYi Zen Hei,文泉驛正黑:style=Regular
# /usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc: Noto Sans CJK JP,Noto Sans CJK JP Regular:style=Regular
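The same check can be run from Python before a conversion, so that missing fonts are caught early instead of showing up as boxes in the PDF. A rough sketch, assuming fc-list is on PATH and prints lines in the "path: family[,family]:style=..." shape shown above; parse_fc_list and installed_chinese_fonts are names invented here:

```python
import subprocess


def parse_fc_list(output):
    """Extract font family names from fc-list output lines."""
    families = set()
    for line in output.splitlines():
        # Expected shape: "<path>: <family>[,<family>...]:style=<style>"
        parts = line.split(":")
        if len(parts) < 2:
            continue
        for name in parts[1].split(","):
            name = name.strip()
            if name:
                families.add(name)
    return families


def installed_chinese_fonts():
    """Run `fc-list :lang=zh` and return the family names it reports."""
    result = subprocess.run(["fc-list", ":lang=zh"],
                            capture_output=True, text=True, check=True)
    return parse_fc_list(result.stdout)
```

An empty result from installed_chinese_fonts() would suggest that the font packages are not installed or that the font cache has not been refreshed. The parser is deliberately simple and would misread a font path that itself contains a colon.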

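If there are still problems, adjust the system configuration:

# Check the current locale
locale

# Generate the Chinese locale (if not installed)
sudo locale-gen zh_CN.UTF-8

# Set the environment variable temporarily (for testing)
export LANG=zh_CN.UTF-8

# Set it permanently (edit /etc/default/locale)
sudo nano /etc/default/locale
# Add the following:
LANG="zh_CN.UTF-8"
LC_ALL="zh_CN.UTF-8"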
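If changing the system-wide locale is not an option, the variables can instead be set only for the conversion process. A small sketch of that idea, assuming the zh_CN.UTF-8 locale has already been generated on the machine; utf8_env and convert_with_locale are hypothetical names:

```python
import os
import subprocess


def utf8_env(lang="zh_CN.UTF-8", base=None):
    """Return a copy of the environment with LANG and LC_ALL set to `lang`."""
    env = dict(os.environ if base is None else base)
    env["LANG"] = lang
    env["LC_ALL"] = lang
    return env


def convert_with_locale(input_docx, output_dir, lang="zh_CN.UTF-8"):
    """Run the headless conversion so that only the child process sees the locale."""
    cmd = ["libreoffice", "--headless", "--convert-to", "pdf",
           "--outdir", output_dir, input_docx]
    subprocess.run(cmd, check=True, env=utf8_env(lang))
```

This leaves /etc/default/locale untouched, which may be preferable on a shared server.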

Or specify font-related export options at conversion time:

def convert_with_font(input_docx, output_dir):
    cmd = [
        'libreoffice', '--headless',
        '--env:UserInstallation=file:///tmp/libreoffice-altprofile', # Use a separate profile
        '--convert-to', 'pdf:writer_pdf_Export:{"Watermark":{"type":"string","value":" "},'
                         '"SelectPdfVersion":{"type":"long","value":"1"},'
                         '"UseTaggedPDF":{"type":"boolean","value":"true"},'
                         '"ExportBookmarks":{"type":"boolean","value":"true"},'
                         '"EmbedStandardFonts":{"type":"boolean","value":"true"}}',
        '--outdir', output_dir,
        input_docx
    ]
    subprocess.run(cmd, check=True)
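The hand-written options string is easy to get wrong, since one missing quote breaks the whole argument. One possible way to build it from a plain dict is sketched below. It assumes the same "type"/"value" layout used in the command above, with every value passed as a string; pdf_export_filter is a name invented for this example:

```python
import json


def pdf_export_filter(options):
    """Build a 'pdf:writer_pdf_Export:{...}' argument from a plain dict."""
    # Only the three value types that appear in the command above are mapped.
    type_names = {bool: "boolean", int: "long", str: "string"}
    encoded = {}
    for key, value in options.items():
        type_name = type_names[type(value)]  # KeyError for unsupported types
        # Booleans become "true"/"false"; everything else is passed through str().
        text = str(value).lower() if isinstance(value, bool) else str(value)
        encoded[key] = {"type": type_name, "value": text}
    return "pdf:writer_pdf_Export:" + json.dumps(encoded, separators=(",", ":"))
```

The result can be dropped in as the argument that follows '--convert-to', for example pdf_export_filter({"EmbedStandardFonts": True, "SelectPdfVersion": 1}).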

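At this point the problem is usually solved:

Installing PyMuPDF on CentOS 7

August 21, 2024, 10:38

Picking up from the previous post: yesterday, once the code was written and tested OK, I thought everything was done. But when I deployed it to the server today, it wrecked me. One problem after another, one pile of errors after another. Enough to make anyone lose it.

I have no particular bias about Linux distributions, mostly because I haven't used that many, but I have to say CentOS is really bad, and I don't know why so many people like this broken system.

A plain pip install, and here comes a pile of errors: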

[root@iZbp12k4fwg2euy5kkr9u7Z ~]# pip install PyMuPDF
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Collecting PyMuPDF
  Using cached http://mirrors.cloud.aliyuncs.com/pypi/packages/9f/1d/032d24e0c774e67742395fda163a172c60e4d0f9875785d5199eb2956d5e/PyMuPDF-1.19.6.tar.gz (2.3 MB)
  Preparing metadata (setup.py) ... done
Using legacy 'setup.py install' for PyMuPDF, since package 'wheel' is not installed.
Installing collected packages: PyMuPDF
    Running setup.py install for PyMuPDF ... error
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/setup.py'"'"'; __file__='"'"'/tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-7fhkepkr/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.6m/PyMuPDF
         cwd: /tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/
    Complete output (20 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.6
    creating build/lib.linux-x86_64-3.6/fitz
    copying fitz/__init__.py -> build/lib.linux-x86_64-3.6/fitz
    copying fitz/fitz.py -> build/lib.linux-x86_64-3.6/fitz
    copying fitz/utils.py -> build/lib.linux-x86_64-3.6/fitz
    copying fitz/__main__.py -> build/lib.linux-x86_64-3.6/fitz
    running build_ext
    building 'fitz._fitz' extension
    creating build/temp.linux-x86_64-3.6
    creating build/temp.linux-x86_64-3.6/fitz
    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/mupdf -I/usr/local/include/mupdf -Imupdf/thirdparty/freetype/include -I/usr/include/freetype2 -I/usr/include/python3.6m -c fitz/fitz_wrap.c -o build/temp.linux-x86_64-3.6/fitz/fitz_wrap.o
    fitz/fitz_wrap.c:2755:18: fatal error: fitz.h: No such file or directory
     #include <fitz.h>
                      ^
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/setup.py'"'"'; __file__='"'"'/tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-7fhkepkr/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.6m/PyMuPDF Check the logs for full command output.

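Judging by the output, gcc failed because a header file is missing. After some searching, https://blog.csdn.net/u012140499/article/details/112798704 offered an approach: download the source from https://casper.mupdf.com/releases/ and install it.

I downloaded the latest version and compiled it, and got another pile of errors: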

source/fitz/util.c: In function ‘fz_new_xhtml_document_from_document’:
source/fitz/util.c:866:2: warning: ‘new_doc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  return new_doc;
  ^
    CC build/release/source/fitz/warp.o
    CC build/release/source/fitz/writer.o
source/fitz/writer.c: In function ‘fz_new_document_writer_with_buffer’:
source/fitz/writer.c:305:2: warning: ‘wri’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  return wri;
  ^
    CC build/release/source/fitz/xml.o
    CC build/release/source/fitz/xmltext-device.o
    CC build/release/source/fitz/zip.o
    CXX build/release/source/fitz/tessocr.o
/bin/sh: g++: command not found
make: *** [build/release/source/fitz/tessocr.o] Error 127

It says g++ cannot be found. OK, let's sort out g++:

yum search "gcc-c++"

Just one result:

Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.cloud.aliyuncs.com
 * extras: mirrors.cloud.aliyuncs.com
 * updates: mirrors.cloud.aliyuncs.com
======================================================================================================= N/S matched: gcc-c++ =======================================================================================================
gcc-c++.x86_64 : C++ support for GCC

  Name and summary matches only, use "search all" for everything.

Install g++:

[root@iZbp12k4fwg2euy5kkr9u7Z mupdf-1.22.0-source]# yum install "gcc-c++.x86_64" -y
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.cloud.aliyuncs.com
 * extras: mirrors.cloud.aliyuncs.com
 * updates: mirrors.cloud.aliyuncs.com
Resolving Dependencies
--> Running transaction check
---> Package gcc-c++.x86_64 0:4.8.5-44.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

====================================================================================================================================================================================================================================
 Package                                                Arch                                                  Version                                                     Repository                                           Size
====================================================================================================================================================================================================================================
Installing:
 gcc-c++                                                x86_64                                                4.8.5-44.el7                                                base                                                7.2 M

Transaction Summary
====================================================================================================================================================================================================================================
Install  1 Package

Total download size: 7.2 M
Installed size: 16 M
Downloading packages:
gcc-c++-4.8.5-44.el7.x86_64.rpm                                                                                                                                                                              | 7.2 MB  00:00:00     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : gcc-c++-4.8.5-44.el7.x86_64                                                                                                                                                                                      1/1 
  Verifying  : gcc-c++-4.8.5-44.el7.x86_64                                                                                                                                                                                      1/1 

Installed:
  gcc-c++.x86_64 0:4.8.5-44.el7                                                                                                                                                                                                     

Complete!

Once more:

make HAVE_X11=no HAVE_GLUT=no prefix=/usr/local install

The build and install command comes from this link: https://mupdf.readthedocs.io/en/latest/quick-start-guide.html#linux

Out came several hundred lines of errors:

thirdparty/harfbuzz/src/graph/../hb-meta.hh:76:41: note: in definition of macro ‘HB_AUTO_RETURN’
 #define HB_AUTO_RETURN(E) -> decltype ((E)) { return (E); }
                                         ^
In file included from thirdparty/harfbuzz/src/graph/pairpos-graph.hh:32:0,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:31,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27:
thirdparty/harfbuzz/src/graph/classdef-graph.hh: In constructor ‘graph::class_def_size_estimator_t::class_def_size_estimator_t(It)’:
thirdparty/harfbuzz/src/graph/classdef-graph.hh:155:44: error: ‘struct hb_hashmap_t<unsigned int, hb_set_t>’ has no member named ‘keys’
     for (unsigned klass : glyphs_per_class.keys ())
                                            ^
thirdparty/harfbuzz/src/graph/classdef-graph.hh: In member function ‘bool graph::class_def_size_estimator_t::in_error()’:
thirdparty/harfbuzz/src/graph/classdef-graph.hh:200:47: error: ‘struct hb_hashmap_t<unsigned int, hb_set_t>’ has no member named ‘values’
     for (const hb_set_t& s : glyphs_per_class.values ())
                                               ^
In file included from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27:0:
thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh: In member function ‘void graph::Lookup::fix_existing_subtable_links(graph::gsubgpos_graph_context_t&, unsigned int, hb_vector_t<hb_pair_t<unsigned int, hb_vector_t<unsigned int> > >&)’:
thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:259:28: error: ‘struct hb_serialize_context_t::object_t’ has no member named ‘all_links_writer’
       for (auto& l : v.obj.all_links_writer ())
                            ^
thirdparty/harfbuzz/src/graph/gsubgpos-context.cc: In member function ‘unsigned int graph::gsubgpos_graph_context_t::num_non_ext_subtables()’:
thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:62:25: error: ‘struct hb_hashmap_t<unsigned int, graph::Lookup*>’ has no member named ‘values’
   for (auto l : lookups.values ())
                         ^
In file included from thirdparty/harfbuzz/src/graph/../hb.hh:484:0,
                 from thirdparty/harfbuzz/src/graph/../hb-set.hh:31,
                 from thirdparty/harfbuzz/src/graph/graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27:
thirdparty/harfbuzz/src/graph/../hb-vector.hh: In instantiation of ‘Type hb_vector_t<Type, sorted>::pop() [with Type = hb_user_data_array_t::hb_user_data_item_t; bool sorted = false]’:
thirdparty/harfbuzz/src/graph/../hb-object.hh:127:7:   required from ‘void hb_lockable_set_t<item_t, lock_t>::fini(lock_t&) [with item_t = hb_user_data_array_t::hb_user_data_item_t; lock_t = hb_mutex_t]’
thirdparty/harfbuzz/src/graph/../hb-object.hh:185:34:   required from here
thirdparty/harfbuzz/src/graph/../hb-vector.hh:398:43: error: cannot convert ‘std::remove_reference<hb_user_data_array_t::hb_user_data_item_t&>::type {aka hb_user_data_array_t::hb_user_data_item_t}’ to ‘hb_user_data_key_t*’ in initialization
     Type v {std::move (arrayZ[length - 1])};
                                           ^
In file included from thirdparty/harfbuzz/src/graph/../hb.hh:481:0,
                 from thirdparty/harfbuzz/src/graph/../hb-set.hh:31,
                 from thirdparty/harfbuzz/src/graph/graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27:
thirdparty/harfbuzz/src/graph/../hb-iter.hh: In instantiation of ‘void hb_copy(S&&, D&&) [with S = const hb_hashmap_t<unsigned int, unsigned int, true>&; D = hb_hashmap_t<unsigned int, unsigned int, true>&]’:
thirdparty/harfbuzz/src/graph/../hb-map.hh:46:100:   required from ‘hb_hashmap_t<K, V, minus_one>::hb_hashmap_t(const hb_hashmap_t<K, V, minus_one>&) [with K = unsigned int; V = unsigned int; bool minus_one = true]’
thirdparty/harfbuzz/src/graph/../hb-map.hh:444:56:   required from here
thirdparty/harfbuzz/src/graph/../hb-iter.hh:1016:14: error: no match for call to ‘(const<anonymous struct>) (const hb_hashmap_t<unsigned int, unsigned int, true>&)’
   hb_iter (is) | hb_sink (id);
              ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:156:1: note: candidates are:
 {
 ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:158:3: note: template<class T> hb_iter_type<T><anonymous struct>::operator()(T&&) const
   operator () (T&& c) const
   ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:158:3: note:   template argument deduction/substitution failed:
thirdparty/harfbuzz/src/graph/../hb-iter.hh:164:3: note: template<class Type> hb_array_t<Type><anonymous struct>::operator()(Type*, unsigned int) const
   operator () (Type *array, unsigned int length) const
   ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:164:3: note:   template argument deduction/substitution failed:
thirdparty/harfbuzz/src/graph/../hb-iter.hh:1016:14: note:   mismatched types ‘Type*’ and ‘hb_hashmap_t<unsigned int, unsigned int, true>’
   hb_iter (is) | hb_sink (id);
              ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:168:3: note: template<class Type, unsigned int length> hb_array_t<Type><anonymous struct>::operator()(Type (&)[length]) const
   operator () (Type (&array)[length]) const
   ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:168:3: note:   template argument deduction/substitution failed:
thirdparty/harfbuzz/src/graph/../hb-iter.hh:1016:14: note:   mismatched types ‘Type [length]’ and ‘const hb_hashmap_t<unsigned int, unsigned int, true>’
   hb_iter (is) | hb_sink (id);
              ^
In file included from thirdparty/harfbuzz/src/graph/../hb-serialize.hh:36:0,
                 from thirdparty/harfbuzz/src/graph/../hb-machinery.hh:37,
                 from thirdparty/harfbuzz/src/graph/../hb-bit-set.hh:33,
                 from thirdparty/harfbuzz/src/graph/../hb-bit-set-invertible.hh:32,
                 from thirdparty/harfbuzz/src/graph/../hb-set.hh:32,
                 from thirdparty/harfbuzz/src/graph/graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27:
thirdparty/harfbuzz/src/graph/../hb-map.hh: In instantiation of ‘uint32_t hb_hashmap_t<K, V, minus_one>::hash() const [with K = unsigned int; V = unsigned int; bool minus_one = true; uint32_t = unsigned int]’:
thirdparty/harfbuzz/src/graph/../hb-algs.hh:237:43:   required from ‘constexpr hb_head_t<unsigned int, decltype (hb_deref(v).hash())><anonymous struct>::impl(const T&, hb_priority<1u>) const [with T = hb::shared_ptr<hb_map_t>; hb_head_t<unsigned int, decltype (hb_deref(v).hash())> = unsigned int]’
thirdparty/harfbuzz/src/graph/../hb-algs.hh:245:3:   required by substitution of ‘template<class T> constexpr hb_head_t<unsigned int, decltype (((const<anonymous struct>*)this)-><anonymous struct>::impl(v, hb_priority<16u>()))><anonymous struct>::operator()(const T&) const [with T = hb::shared_ptr<hb_map_t>]’
thirdparty/harfbuzz/src/graph/../hb-map.hh:257:50:   required from ‘bool hb_hashmap_t<K, V, minus_one>::has(K, VV**) const [with VV = unsigned int; K = hb::shared_ptr<hb_map_t>; V = unsigned int; bool minus_one = false]’
thirdparty/harfbuzz/src/graph/../hb-ot-layout-common.hh:3034:36:   required from here
thirdparty/harfbuzz/src/graph/../hb-map.hh:291:19: error: ‘iter_items’ was not declared in this scope
     + iter_items ()
                   ^
thirdparty/harfbuzz/src/graph/../hb-map.hh: In instantiation of ‘bool hb_hashmap_t<K, V, minus_one>::is_equal(const hb_hashmap_t<K, V, minus_one>&) const [with K = unsigned int; V = unsigned int; bool minus_one = true]’:
thirdparty/harfbuzz/src/graph/../hb-map.hh:306:78:   required from ‘bool hb_hashmap_t<K, V, minus_one>::operator==(const hb_hashmap_t<K, V, minus_one>&) const [with K = unsigned int; V = unsigned int; bool minus_one = true]’
thirdparty/harfbuzz/src/graph/../hb-map.hh:96:65:   required from ‘bool hb_hashmap_t<K, V, minus_one>::item_t::operator==(const K&) const [with K = hb::shared_ptr<hb_map_t>; V = unsigned int; bool minus_one = false]’
thirdparty/harfbuzz/src/graph/../hb-map.hh:258:33:   required from ‘bool hb_hashmap_t<K, V, minus_one>::has(K, VV**) const [with VV = unsigned int; K = hb::shared_ptr<hb_map_t>; V = unsigned int; bool minus_one = false]’
thirdparty/harfbuzz/src/graph/../hb-ot-layout-common.hh:3034:36:   required from here
thirdparty/harfbuzz/src/graph/../hb-map.hh:300:28: error: ‘iter’ was not declared in this scope
     for (auto pair : iter ())
                            ^
make: *** [build/release/thirdparty/harfbuzz/src/graph/gsubgpos-context.o] Error 1

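I tried several versions, and each one either produced the errors above or complained that the C++17 standard is not supported. Searching for the errors mostly turns up one fix: upgrade the gcc compiler. Damn it: yum doesn't offer a newer one, and building from source drags in another pile of dependencies. Upgrade? Like hell I will.

So I tried downgrading mupdf instead, and after many attempts finally found that version 1.12 can be installed.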

install -d /usr/local/include/mupdf
install -d /usr/local/include/mupdf/fitz
install -d /usr/local/include/mupdf/pdf
install include/mupdf/*.h /usr/local/include/mupdf
install include/mupdf/fitz/*.h /usr/local/include/mupdf/fitz
install include/mupdf/pdf/*.h /usr/local/include/mupdf/pdf
install -d /usr/local/lib
install build/release/libmupdf.a build/release/libmupdfthird.a /usr/local/lib
install -d /usr/local/bin
install build/release/mutool    build/release/muraster   build/release/mujstest build/release/mjsgen /usr/local/bin
install -d /usr/local/share/man/man1
install docs/man/*.1 /usr/local/share/man/man1
install -d /usr/local/share/doc/mupdf
install -d /usr/local/share/doc/mupdf/examples
install README COPYING CHANGES /usr/local/share/doc/mupdf
install docs/*.html docs/*.css docs/*.png /usr/local/share/doc/mupdf
install docs/examples/* /usr/local/share/doc/mupdf/examples

Back to pip. Look at these thousands of lines of errors. Damn, are you trying to blow up?

    fitz/fitz_wrap.c: In function ‘JM_rect_from_py’:
    fitz/fitz_wrap.c:4042:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘util_include_point_in_rect’:
    fitz/fitz_wrap.c:3447:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘util_transform_point’:
    fitz/fitz_wrap.c:3461:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘util_union_rect’:
    fitz/fitz_wrap.c:3468:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘util_concat_matrix’:
    fitz/fitz_wrap.c:3475:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘JM_matrix_from_py’:
    fitz/fitz_wrap.c:4131:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘JM_derotate_page_matrix’:
    fitz/fitz_wrap.c:5193:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘JM_irect_from_py’:
    fitz/fitz_wrap.c:4071:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-kfa4_6i0/pymupdf_d444a7b2e89d4aa38ac652587530e9a2/setup.py'"'"'; __file__='"'"'/tmp/pip-install-kfa4_6i0/pymupdf_d444a7b2e89d4aa38ac652587530e9a2/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-b8m2p6nm/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.6m/PyMuPDF Check the logs for full command output.

Try a lower version:

[root@iZbp12k4fwg2euy5kkr9u7Z mupdf-1.12.0-source]# pip install PyMuPDF==1.12
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
ERROR: Could not find a version that satisfies the requirement PyMuPDF==1.12 (from versions: 1.11.2, 1.12.5, 1.13.20, 1.14.19.post2, 1.14.19.2, 1.14.20, 1.14.21, 1.16.0, 1.16.1, 1.16.2, 1.16.3, 1.16.4, 1.16.5, 1.16.6, 1.16.7, 1.16.8, 1.16.8.1, 1.16.9, 1.16.10, 1.16.11, 1.16.12, 1.16.13, 1.16.14, 1.16.15, 1.16.16, 1.16.17, 1.16.18, 1.17.0, 1.17.1, 1.17.2, 1.17.3, 1.17.4, 1.17.5, 1.17.6, 1.17.7, 1.18.0, 1.18.1, 1.18.2, 1.18.3, 1.18.4, 1.18.5, 1.18.6, 1.18.7, 1.18.8, 1.18.9, 1.18.10, 1.18.11, 1.18.12, 1.18.13, 1.18.14, 1.18.15, 1.18.16, 1.18.17, 1.18.18, 1.18.19, 1.19.0, 1.19.1, 1.19.2, 1.19.3, 1.19.4, 1.19.5, 1.19.6)
ERROR: No matching distribution found for PyMuPDF==1.12

It says there is no 1.12, so 1.12.5 it is:

[root@iZbp12k4fwg2euy5kkr9u7Z mupdf-1.12.0-source]# pip install PyMuPDF==1.12.5
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Collecting PyMuPDF==1.12.5
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/c1/4a/f6424f019bbc3ac70b55fd589f6b3eb777e13d1a3600dbdb726575d5f5df/PyMuPDF-1.12.5-cp36-cp36m-manylinux1_x86_64.whl (3.4 MB)
     |████████████████████████████████| 3.4 MB 1.2 MB/s            
Installing collected packages: PyMuPDF
Successfully installed PyMuPDF-1.12.5

Nice, finally installed. I started the service and tried merging files, and it immediately threw this error:

'Document' object has no attribute 'new_page'

wtf, can a person not catch a break?

Try upgrading the version:

[root@iZbp12k4fwg2euy5kkr9u7Z mupdf-1.12.0-source]# pip install PyMuPDF==1.18.19
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Collecting PyMuPDF==1.18.19
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/d8/b6/59c001fa851ec4ad216232bc256b9aaff67ff9cf1c4bb542f68f1ad5fcd8/PyMuPDF-1.18.19-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.4 MB)
     |████████████████████████████████| 6.4 MB 1.4 MB/s            
Installing collected packages: PyMuPDF
  Attempting uninstall: PyMuPDF
    Found existing installation: PyMuPDF 1.12.5
    Uninstalling PyMuPDF-1.12.5:
      Successfully uninstalled PyMuPDF-1.12.5
Successfully installed PyMuPDF-1.18.19
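The 'new_page' error fits the fact that PyMuPDF's method names changed between releases. If code has to run against whichever version happens to be installed, one option is to look methods up by name. The sketch below assumes that older releases expose camelCase names such as newPage and insertImage with the same keyword arguments; first_available and add_image_page are illustrative helpers, not part of PyMuPDF:

```python
def first_available(obj, *names):
    """Return the first attribute of `obj` that exists among `names`."""
    for name in names:
        attr = getattr(obj, name, None)
        if attr is not None:
            return attr
    raise AttributeError(f"{type(obj).__name__} has none of: {', '.join(names)}")


def add_image_page(pdf_document, image_path, width, height):
    """Append a page of the given size and draw the image on it."""
    # Assumption: older PyMuPDF releases provide newPage/insertImage instead.
    new_page = first_available(pdf_document, "new_page", "newPage")
    page = new_page(width=width, height=height)
    insert_image = first_available(page, "insert_image", "insertImage")
    insert_image(page.rect, filename=image_path)
    return page
```

Pinning a single known-good version, as done here with 1.18.19, is still the simpler route when the deployment environment is under your control.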

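The world is finally quiet:

Summary:

1. For the mupdf source install, choose mupdf-1.12.0: https://mupdf.com/downloads/archive/mupdf-1.12.0-source.tar.gz
2. For the pip install, choose 1.18.19: pip install PyMuPDF==1.18.19

Postscript:

I just tried upgrading Python on CentOS to 3.8.6, after which pymupdf seems to install newer versions without trouble. Damn, all this outdated junk that ships with the system: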

Successfully installed Babel-2.14.0 Jinja2-3.1.3 MarkupSafe-2.1.5 PyMuPDF-1.24.9 PyMuPDFb-1.24.9 PyPDF2-3.0.1 Pygments-2.18.0 SecretStorage-3.3.3 SimpleWebSocketServer-0.1.2 aliyun-python-sdk-core-2.14.0 aliyun-python-sdk-imm-1.24.0 aliyun-python-sdk-kms-2.16.2 backports.tarfile-1.2.0 certifi-2024.2.2 cffi-1.17.0 charset-normalizer-3.3.2 ci-info-0.3.0 click-8.1.7 configobj-5.0.8 configparser-7.1.0 contourpy-1.1.1 crcmod-1.7 cryptography-42.0.4 cycler-0.12.1 docutils-0.20.1 docxcompose-1.4.0 docxtpl-0.16.7 etelemetry-0.3.1 filelock-3.15.4 fonttools-4.53.1 fsspec-2024.6.1 httplib2-0.22.0 idna-3.6 importlib-metadata-8.4.0 importlib-resources-6.4.3 isodate-0.6.1 jaraco.classes-3.4.0 jaraco.context-6.0.1 jaraco.functools-4.0.2 jeepney-0.8.0 jmespath-0.10.0 keyring-25.3.0 kiwisolver-1.4.5 looseversion-1.3.0 lxml-5.1.0 markdown-it-py-3.0.0 matplot-0.1.9 matplotlib-3.7.5 mdurl-0.1.2 more-itertools-10.4.0 mpmath-1.3.0 networkx-3.1 nh3-0.2.18 numpy-1.24.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.6.20 nvidia-nvtx-cu12-12.1.105 opencv-python-4.10.0.84 oss2-2.18.4 packaging-24.1 pandas-2.0.3 pathlib-1.0.1 pillow-10.4.0 pkginfo-1.11.1 pycparser-2.21 pycryptodome-3.20.0 pydot-3.0.1 pyloco-0.0.139 pyparsing-3.1.2 python-dateutil-2.9.0.post0 python-docx-1.1.0 pytz-2024.1 pyxnat-1.6.2 rdflib-6.3.2 readme-renderer-43.0 requests-2.32.3 requests-toolbelt-1.0.0 rfc3986-2.0.0 rich-13.7.1 scipy-1.10.1 simplejson-3.19.3 six-1.16.0 sympy-1.13.2 torch-2.4.0 traits-6.3.2 triton-3.0.0 twine-5.1.1 typing-3.7.4.3 typing-extensions-4.9.0 tzdata-2024.1 urllib3-2.2.2 ushlex-0.99.1 websocket-client-1.8.0 zipp-3.20.0

 

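Merging Multiple Images into a PDF

August 20, 2024, 15:23

One business flow requires users to download a file, stamp it, and upload the stamped version again. The problem is that almost everything happens on mobile, and uploading a PDF from a phone is genuinely awkward. So the current approach is to upload photos of the stamped document.

However, users find this a bit of a pain: some need to upload dozens of images, and managing those stamped images after downloading them again is a problem too. It is impossible to tell which is which, and managing stamped files by business number is also difficult.

So I came up with a plan: convert the uploaded images back into a PDF.

The images are stored on OSS, and OSS itself does provide a way to convert images to PDF (https://help.aliyun.com/zh/imm/user-guide/convert-an-image-to-pdf):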

# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import sys
import os
from typing import List

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client(
        access_key_id: str,
        access_key_secret: str,
    ) -> imm20200930Client:
        """
        Initialize the account Client with an AccessKey ID and AccessKey Secret.
        @param access_key_id:
        @param access_key_secret:
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            access_key_id=access_key_id,
            access_key_secret=access_key_secret
        )
        # Set the IMM endpoint to access.
        config.endpoint = f'imm.cn-zhangjiakou.aliyuncs.com'
        return imm20200930Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
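        # An Alibaba Cloud account AccessKey has access to all APIs; using a RAM user for API access and routine operations is recommended.
        # Storing the AccessKey ID and AccessKey Secret in project code is strongly discouraged, as it may leak the AccessKey and compromise the security of all resources under your account.
        # This example reads the AccessKey from environment variables to authenticate API access. For how to configure environment variables, see https://help.aliyun.com/document_detail/2361894.html.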
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        sources_0 = imm_20200930_models.CreateImageToPDFTaskRequestSources(
            uri='oss://test-bucket/test-object.jpg'
        )
        create_image_to_pdftask_request = imm_20200930_models.CreateImageToPDFTaskRequest(
            project_name='test-project',
            target_uri='oss://test-bucket/test-target-object.pdf',
            sources=[
                sources_0
            ]
        )
        runtime = util_models.RuntimeOptions()
        try:
            # If you copy and run this code, print the API return value yourself.
            client.create_image_to_pdftask_with_options(create_image_to_pdftask_request, runtime)
        except Exception as error:
            # Print the error message if needed.
            UtilClient.assert_as_string(error.message)

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
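        # An Alibaba Cloud account AccessKey has access to all APIs; using a RAM user for API access and routine operations is recommended.
        # Storing the AccessKey ID and AccessKey Secret in project code is strongly discouraged, as it may leak the AccessKey and compromise the security of all resources under your account.
        # This example reads the AccessKey from environment variables to authenticate API access. For how to configure environment variables, see https://help.aliyun.com/document_detail/2361894.html.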
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        sources_0 = imm_20200930_models.CreateImageToPDFTaskRequestSources(
            uri='oss://test-bucket/test-object.jpg'
        )
        create_image_to_pdftask_request = imm_20200930_models.CreateImageToPDFTaskRequest(
            project_name='test-project',
            target_uri='oss://test-bucket/test-target-object.pdf',
            sources=[
                sources_0
            ]
        )
        runtime = util_models.RuntimeOptions()
        try:
            # If you copy and run this code, print the API return value yourself.
            await client.create_image_to_pdftask_with_options_async(create_image_to_pdftask_request, runtime)
        except Exception as error:
            # Print the error message if needed.
            UtilClient.assert_as_string(error.message)


if __name__ == '__main__':
    Sample.main(sys.argv[1:])

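However, the project already pulls in a fairly old version of the Aliyun SDK. Bringing in this new one would mean modifying the existing code, which is a pain.

I searched online; there is plenty of code, but none of it works well. Damn, has nobody written anything reliable?

In the end I solved the problem with PyMuPDF: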

import fitz  # PyMuPDF

# Open an existing PDF or create a new one
pdf_document = fitz.open()  # Creates a new PDF

# Define the image file path
image_path = "path/to/your/image.jpg"

# Get the dimensions of the image
img = fitz.open(image_path)
img_rect = img[0].rect  # Get the rectangle of the first page of the image

# Create a new page with the same dimensions as the image
pdf_page = pdf_document.new_page(width=img_rect.width, height=img_rect.height)

# Insert the image into the new page
pdf_page.insert_image(pdf_page.rect, filename=image_path)

# Save the PDF to a file
pdf_document.save("output.pdf")
pdf_document.close()

The actual business code:

def converImageToPdf(img_list):
    # pdf = fitz.open() # PyMuPDF
    pdf_document = fitz.open()  # Creates a new PDF

    for img_url in img_list:
        img_local_file = download_image(img_url,'confirmd_images')
        img = fitz.open(img_local_file)
        img_rect = img[0].rect  # Get the rectangle of the first page of the image

        # Create a new page with the same dimensions as the image
        pdf_page = pdf_document.new_page(width=img_rect.width, height=img_rect.height)

        # Insert the image into the new page
        pdf_page.insert_image(pdf_page.rect, filename=img_local_file)
        img.close()
    file_name = random_file_name('pdf')
    os.makedirs('confirmd_receipt', exist_ok=True)
    pdf_document.save(os.path.join('confirmd_receipt', file_name))
    pdf_document.close()
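The business code calls download_image and random_file_name, which are not shown in the post. Purely as an illustration of what such helpers could look like (these are guesses, not the author's implementations), here is a standard-library-only sketch:

```python
import os
import uuid
import urllib.request
from urllib.parse import urlparse


def random_file_name(extension):
    """Return a random file name made of 32 hex characters plus the extension."""
    return f"{uuid.uuid4().hex}.{extension.lstrip('.')}"


def download_image(img_url, target_dir):
    """Download `img_url` into `target_dir` and return the local file path."""
    os.makedirs(target_dir, exist_ok=True)
    # Keep the original extension; fall back to .jpg, which is an arbitrary choice.
    suffix = os.path.splitext(urlparse(img_url).path)[1] or ".jpg"
    local_path = os.path.join(target_dir, uuid.uuid4().hex + suffix)
    with urllib.request.urlopen(img_url, timeout=30) as response, \
            open(local_path, "wb") as out:
        out.write(response.read())
    return local_path
```

Random names avoid collisions when several users upload files at the same time. A real implementation would probably also clean up the downloaded images once the PDF has been saved.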

Actual result:

Dependencies:

PyMuPDFb      ==      1.24.9

 

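Previewing PDF Files in a WeChat Mini Program with PDF.js

August 21, 2023, 18:15

In a WeChat Mini Program, a PDF file can be downloaded and opened with the two APIs wx.downloadFile and wx.openDocument. This approach has quite a few drawbacks:

1. The file has to be downloaded before it can be viewed, and every open triggers a download that creates a temporary file. With many PDF files the temporary files keep piling up, and a large PDF opens slowly.
2. The navigation bar shows the temporary file name as the title, which does not look elegant.
3. Paging is inconvenient.

So can a PDF be previewed directly inside the Mini Program? I tried displaying a PDF in the Mini Program's web-view: it showed up in the developer tools but not on a real device. On the WeChat open community I saw someone using PDF.js to open PDF files in the browser. PDF.js is supported by Mozilla, and its goal is to create a general-purpose, web-standards-based platform for parsing and rendering PDFs. Opening a PDF parsed by PDF.js through web-view does not display properly in the WeChat developer tools, but the good news is that it displays fine on a real device.

To parse a PDF with PDF.js:

1. Download the framework from the official PDF.js site: https://mozilla.github.io/pdf.js/getting_started

2. Deploy PDF.js to your website. PDF.js has two folders, web and build; put both in a directory on the site, for example pdfjs. The web directory contains viewer.html, which can parse PDF files online. The PDF file's link must of course be on the same domain. The preview URL looks like this: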

https://wwww.domianname.com/pdfjs/web/viewer.html?file=xxx/xxx/xxx.pdf
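If the server builds the preview link, the file parameter should be percent-encoded so that spaces, Chinese characters, or a query string in the PDF address do not break it. A minimal sketch; viewer_url is a made-up helper name and the example.com address is a placeholder:

```python
from urllib.parse import quote


def viewer_url(viewer_base, pdf_url):
    """Return a viewer.html URL whose `file` parameter is percent-encoded."""
    # safe="" also encodes "/" and ":", similar to JavaScript's encodeURIComponent.
    return f"{viewer_base}?file={quote(pdf_url, safe='')}"


print(viewer_url("https://example.com/pdfjs/web/viewer.html", "/docs/a b.pdf"))
```

The resulting link can then be used as the src of the Mini Program's web-view.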

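微慕专业版 already integrates the PDF.js framework and supports previewing PDF files by link in both the browser and the Mini Program; you can try the feature in 微慕专业版.

Preview a PDF file: https://blog.minapper.com/wp-content/uploads/微慕小程序专业版.pdf

Note that with the approach above, the domain hosting the PDF link must be added to the Mini Program's business domains. Cross-origin links are also supported but need special handling; for details see: https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-xhr

Download and open a PDF file: https://www.watch-life.net/微慕小程序开源版.pdf

This download-and-open approach requires configuring both the business domain and the downloadFile domain.

Previewing PDF files in a WeChat Mini Program with PDF.js supports the PDF.js features, such as the sidebar, search, paging, zoom, adding text, drawing, rotation, and presentation mode.

The above displays the PDF through the official viewer.html. You can also parse and display it by including PDF.js yourself, which allows custom functionality. The steps are:

1. Include the pdf.js library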

<script src="./build/pdf.js"></script>
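<script src="./build/pdf.worker.js"></script>

2. Use a canvas to receive and display the PDF content that is read

<canvas id="myCanvas"></canvas>

3. Create the PDF object: data can be the Base64 string of the PDF file, a relative or absolute path to the file, or the URL of an online file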

var loadingTask = pdfjsLib.getDocument(data)
loadingTask.promise.then(function (pdf) {
                for (var i = 1; i 

For more about PDF.js, see the official site: https://mozilla.github.io/pdf.js/

The post 利用PDF.js在微信小程序里预览PDF文件 first appeared on 守望轩.