Posts in the category 代码人生 (Code Life)

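Today, while updating **Table Of Contents for typecho**, Lao Gao ran into a problem.

When the document was parsed, the newline character \n was being stripped for no obvious reason, which left the code inside pre blocks in a complete mess. After reading the source I found the option responsible: stripRN. It defaults to true, meaning line breaks in the input string are removed by default (each one is replaced with a space). Turn it off and the problem goes away!

The core code sits at around line 1147.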

if ($stripRN) {
    $str = str_replace("\r", " ", $str);
    $str = str_replace("\n", " ", $str);
...
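To see why this wrecks pre blocks, here is a minimal Python sketch of the same replacement. The function name prepare and the sample markup are made up for illustration; they are not taken from the parser's source.

```python
def prepare(text, strip_rn=True):
    """Mimic the excerpt above: swap CR and LF for spaces when strip_rn is on."""
    if strip_rn:
        text = text.replace("\r", " ")
        text = text.replace("\n", " ")
    return text


# Hypothetical input: two lines of code inside a pre block
sample = "<pre><code>line 1\nline 2\n</code></pre>"

print(repr(prepare(sample)))                  # line breaks are gone
print(repr(prepare(sample, strip_rn=False)))  # input comes back unchanged
```

With the flag on, both lines of code end up on a single line, which is the kind of mangling described above.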

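I wrote this ages ago and have forgotten what I was originally trying to do...

It is half-finished. Leaving it here as a placeholder, to be filled in later.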

#!/usr/bin/env python3
# encoding: utf-8

import http.cookiejar as cookielib

import requests

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36'}


def http_send(url, post_data=None, **kwargs):
    # Reuse cookies saved by an earlier run, if there are any.
    cookie_jar = cookielib.MozillaCookieJar('cookie.txt')
    try:
        cookie_jar.load(ignore_discard=True)
    except OSError as e:
        # Covers a missing file as well as LoadError (a subclass of OSError).
        print('%s, starting a new cookie file' % e)

    # Send the headers and the stored cookies with both kinds of request.
    if post_data:
        resp = requests.post(url, data=post_data, headers=HEADERS,
                             cookies=cookie_jar, **kwargs)
    else:
        resp = requests.get(url, headers=HEADERS, cookies=cookie_jar, **kwargs)

    print(resp.headers)

    # Merge the cookies set by the server, then write the file once.
    for c in resp.cookies:
        cookie_jar.set_cookie(c)
    cookie_jar.save(ignore_discard=True)
    return resp.content


if __name__ == '__main__':
    # init cookie
    print(http_send('http://localhost/clientarea.php', {"A": 1}))

#!/usr/bin/env python3
# encoding: utf-8

import logging
import os
import re

import requests


logging.basicConfig(filename=os.path.join(os.getcwd(), 'log.txt'), level=logging.DEBUG)

# One shared session, so cookies from the login carry over to later requests.
s = requests.session()


def main():
    do_login()
    scan_list()
    check()


def do_login():
    # Get the token. The live request is still commented out, so a saved
    # copy of the login page is parsed instead.
    # token_html = s.get(login_url, headers=headers).text
    # Raw string: otherwise "\U" in the Windows path is read as an escape.
    with open(r'C:\Users\Administrator\Desktop\index.html') as f:
        token_html = f.read()
    token = find_token(token_html)
    post_data = {'token': token, 'username': username, 'password': password}
    # Session.post() returns the response; the session itself has no .content.
    resp = s.post(affiliates_url, data=post_data, headers=headers)
    print(resp.content)


def find_token(html):
    # Raw string so \s and \w reach the regex engine unchanged.
    g = re.findall(r'name="token"\svalue="(\w+)"\s/>', html)
    if g:
        return g[0]
    else:
        log("Could not find token value!")
        raise Exception('Could not find token value')


def scan_list():
    # Not implemented yet
    print(111)


def check():
    # Not implemented yet
    print(111)


def log(msg):
    logging.debug(msg)


if __name__ == '__main__':
    # Initialize parameters (module-level, read by the functions above)
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36'}
    login_url = "https://bandwagonhost.com/clientarea.php"
    affiliates_url = "https://bandwagonhost.com/affiliates.php"
    # Placeholder credentials
    username = 1111
    password = 2222
    main()

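Lately Lao Gao noticed that a php-fpm process kept eating far too much CPU on the server. Here are some notes on how I tracked it down.

Discovery

How did I notice? With the top command, of course: the system's load average was above 3, which means the system was already under fairly heavy load.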
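Whether a load average of 3 counts as high depends on how many CPU cores the machine has, which the post does not state. As a rough sketch (the helper name load_per_core is made up for illustration), the same check can be done from Python by dividing the 1-minute load by the core count:

```python
import os


def load_per_core(loadavg=None, cores=None):
    """Return the 1-minute load average divided by the number of CPU cores."""
    if loadavg is None:
        loadavg = os.getloadavg()  # (1 min, 5 min, 15 min); Unix only
    if cores is None:
        cores = os.cpu_count() or 1
    return loadavg[0] / cores


if __name__ == "__main__":
    ratio = load_per_core()
    print("1-minute load per core: %.2f" % ratio)
    if ratio > 1.0:
        print("more runnable work than cores; worth a look in top")
```

A ratio that stays above 1 for a while suggests processes are queuing for CPU time.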

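First attempt at a fix

After I restarted php-fpm, it was not long before the CPU went through the roof again! What on earth?

Starting the investigation

First, we enable the error log, the slow log, and the access log in php-fpm.conf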

error_log = /var/log/php/error.log
access.log = /var/log/php/access.$pool.log
access.format = "%R - %u %t \"%m %r%Q%q\" %s %f %{mili}d %{kilo}M %{system}C%%"
slowlog = /var/log/php/slow.$pool.log
request_slowlog_timeout = 3s

After restarting php-fpm, we start watching the logs

# View the slow log

[15-May-2015 12:50:22]  [pool www] pid 1819
script_filename = /home/ftp/phpergao/wwwroot/index.php
[0x00007f2d286c2790] replace() /home/ftp/phpergao/wwwroot/usr/plugins/CdnHelper/Plugin.php:72
[0x00007fff78ab00f0] replace() unknown:0
[0x00007f2d286c2420] call_user_func_array() /home/ftp/phpergao/wwwroot/var/Typecho/Plugin.php:489
[0x00007fff78ab0430] __call() unknown:0
[0x00007f2d286c1f78] contentEx() /home/ftp/phpergao/wwwroot/var/Widget/Abstract/Contents.php:141
[0x00007f2d286c1b78] ___content() /home/ftp/phpergao/wwwroot/var/Typecho/Widget.php:385
[0x00007fff78ab0850] __get() unknown:0
[0x00007f2d286c1870] content() /home/ftp/phpergao/wwwroot/var/Widget/Abstract/Contents.php:783
[0x00007f2d286c1628] content() /home/ftp/phpergao/wwwroot/var/Widget/Archive.php:1401
[0x00007f2d286c14d0] content() /home/ftp/phpergao/wwwroot/usr/themes/just/index.php:32
[0x00007f2d286c10f8] +++ dump failed

[15-May-2015 19:18:48]  [pool www] pid 5597
script_filename = /home/ftp/phpergao/wwwroot/index.php
[0x00007ff17fcf0168] __call() /home/ftp/phpergao/wwwroot/var/Typecho/Plugin.php:483
[0x00007fff915493c0] __call() unknown:0
[0x00007ff17fcefca8] ___title() /home/ftp/phpergao/wwwroot/var/Typecho/Widget.php:387
[0x00007fff915497e0] __get() unknown:0
[0x00007ff17fcef960] title() /home/ftp/phpergao/wwwroot/var/Widget/Abstract/Contents.php:809
[0x00007ff17fcef6d0] title() /home/ftp/phpergao/wwwroot/usr/themes/just/index.php:23
[0x00007ff17fcef2f8] +++ dump failed

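Of these, contentEx caught my attention. This method is a hook that the system runs after it has fetched a post's content, and several of Lao Gao's plugins are attached to it. Suddenly I had an idea.

So I immediately disabled the plugins in question, and after a while the load dropped to load average: 0.39, 0.29, 0.42.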
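Once the slow log holds more than a couple of entries, tallying the frames by hand gets tedious. The sketch below counts the first frame listed in each entry, which I take to be the innermost call at the moment the request was flagged as slow. It assumes entries look like the two shown above; the helper name top_frames is made up, and the sample is a shortened copy of those entries.

```python
import re
from collections import Counter

# One stack frame line: "[0x...] function() /path/to/file.php:123"
FRAME_RE = re.compile(r"^\[0x[0-9a-f]+\]\s+(\w+)\(\)\s+(\S+):(\d+)$")


def top_frames(log_text):
    """Count the first frame listed in every slow-log entry."""
    counts = Counter()
    expect_top = False
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("script_filename"):
            expect_top = True  # the next frame line is the one we want
            continue
        match = FRAME_RE.match(line)
        if match and expect_top:
            func, path, lineno = match.groups()
            counts["%s() %s:%s" % (func, path, lineno)] += 1
            expect_top = False
    return counts


sample = """\
[15-May-2015 12:50:22]  [pool www] pid 1819
script_filename = /home/ftp/phpergao/wwwroot/index.php
[0x00007f2d286c2790] replace() /home/ftp/phpergao/wwwroot/usr/plugins/CdnHelper/Plugin.php:72
[0x00007fff78ab00f0] replace() unknown:0

[15-May-2015 19:18:48]  [pool www] pid 5597
script_filename = /home/ftp/phpergao/wwwroot/index.php
[0x00007ff17fcf0168] __call() /home/ftp/phpergao/wwwroot/var/Typecho/Plugin.php:483
[0x00007fff915493c0] __call() unknown:0
"""

for frame, n in top_frames(sample).most_common():
    print(n, frame)
```

Run against a real log, the frames that dominate the count are the first candidates to disable or optimize.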

Recording program execution details

Recording program execution time


Tracing PHP's system calls

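Lao Gao used strace to watch the system calls made by the php master process and the child processes it forks, writing the output to /tmp/output.txt (-f follows forked children; the older -F spelling is deprecated)

strace -o /tmp/output.txt -T -tt -f -e trace=all -p 31920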

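I copied the output file to my local machine with scp. The analysis showed that concurrency combined with the plugins had all but choked the CPU.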
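The post does not say how the file was analyzed. One possible approach, sketched here under assumptions, is to add up the per-call durations that -T appends to each line. The regex assumes lines of the form "pid timestamp name(args) = result <seconds>"; calls split across "unfinished" and "resumed" lines are skipped for simplicity, and the sample lines are invented for illustration.

```python
import re
from collections import defaultdict

# "<pid> <hh:mm:ss.usec> name(args) = result <seconds>"
CALL_RE = re.compile(r"^\d+\s+[\d:.]+\s+(\w+)\(.*<(\d+\.\d+)>\s*$")


def time_per_syscall(lines):
    """Sum the -T durations per system call name; other lines are skipped."""
    totals = defaultdict(float)
    for line in lines:
        match = CALL_RE.match(line)
        if match:
            totals[match.group(1)] += float(match.group(2))
    return dict(totals)


# Hypothetical lines, not taken from the author's trace
sample = [
    '31920 12:50:22.000001 poll([{fd=5, events=POLLIN}], 1, 5000) = 1 <0.120000>',
    '31921 12:50:22.130000 read(5, "abc", 8192) = 3 <0.000040>',
    '31921 12:50:22.140000 read(5, "", 8192) = 0 <0.000010>',
    '31920 12:50:22.150000 +++ exited with 0 +++',
]

totals = time_per_syscall(sample)
for name, seconds in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print("%-8s %.6f s" % (name, seconds))
```

For a real file, passing open('/tmp/output.txt') as lines would work the same way, since the function only iterates over its argument.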

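Conclusions

  1. When deciding whether to display a piece of data, take its performance cost into account as well
  2. Regular expressions are not very efficient; avoid them where you can
  3. If a theme uses the same variable several times, store it in a temporary variable first
  4. Caching matters
  5. strace is a great tool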
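Points 3 and 4 can be illustrated with a small sketch. It is in Python rather than PHP, and the function render, the sample markup, and the host cdn.example.com are all made up for illustration; the idea is only that an expensive filter should run once per distinct input.

```python
import re
from functools import lru_cache

calls = 0  # counts how often the expensive work really runs


@lru_cache(maxsize=128)
def render(content):
    """Stand-in for an expensive content filter, e.g. a regex rewrite of image URLs."""
    global calls
    calls += 1
    return re.sub(r'src="/', 'src="https://cdn.example.com/', content)


post = '<img src="/a.png"><img src="/b.png">'

# Point 3: compute once, keep the result in a local variable, reuse it.
html = render(post)
excerpt, body = html[:20], html

# Point 4: a repeated call with the same input is served from the cache.
render(post)
print(calls)
```

The counter shows how many times the regex actually ran, regardless of how many times the result was used.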

This article is also worth studying: PHP高效率写法(详解原因) (Efficient PHP coding habits, with the reasons explained in detail)