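Lately I have had enough of our company's internal site: every login means typing a username and password and then entering a CAPTCHA.
If only I could skip the login page altogether.
So I got to work and decided to implement automatic login with a Tampermonkey userscript.
The hardest part was cracking the CAPTCHA. It took me a day to get working, and this post writes up how.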
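1. Analyzing the CAPTCHA
Analyzing the CAPTCHA is the starting point for everything else.
- What features does the CAPTCHA have?
- Is it easy to crack?
- What strategy should be used to crack it?
Feature summary
This only summarizes the features of our company site's CAPTCHA (the CAPTCHA image above).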
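- Only letters (upper and lower case) and digits are used, and hard-to-distinguish characters are excluded: 1, i, I, l, L, 0, o, O.
- A given character always appears with the same size, stroke weight, and slant (so a standard sample library is easy to build).
- The first character always starts at the same position (so the left-hand background is easy to crop).
- There are interference lines and a background color, both noticeably brighter than the characters (so a threshold can decide whether a pixel belongs to a character).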
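The character set described above can be written down directly, which is handy later for checking that the template library is complete. This is only a minimal sketch: the names `AMBIGUOUS_CHARS`, `CAPTCHA_ALPHABET`, and `missingTemplates` are made up for illustration and are not part of the original script.

```javascript
// Characters the CAPTCHA never uses because they are hard to tell apart
const AMBIGUOUS_CHARS = new Set(['1', 'i', 'I', 'l', 'L', '0', 'o', 'O']);

const ALL_CHARS =
  '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Every character that can actually appear in this CAPTCHA
const CAPTCHA_ALPHABET = [...ALL_CHARS].filter(c => !AMBIGUOUS_CHARS.has(c));

// Hypothetical helper: which characters still lack a template?
function missingTemplates(templates) {
  return CAPTCHA_ALPHABET.filter(c => !(c in templates));
}

console.log(CAPTCHA_ALPHABET.length); // 54
```

Calling `missingTemplates` on the template object while collecting samples shows which characters still need a screenshot.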
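Choosing a strategy
The strategy follows from the features identified in the previous step.
- Build a standard sample library
- Compare the CAPTCHA image against the standard samples by convolution (explained below)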
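2. Building the sample library
- Request a CAPTCHA
- Extract the image pixels
- Binarize (reduce each pixel to 0 or 1)
- Draw the binarized CAPTCHA on a canvas (black characters on white; it can also be scaled up proportionally to make viewing and screenshotting easier)
- Crop suitable characters from the drawn, binarized CAPTCHA
- Clean up each character screenshot (trim white edges, remove noise)
- Undo the scaling (if the image was scaled up earlier)
- Save as a template string
Fetching the CAPTCHA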
    // Returns the image as a base64 data URL
    function getVerifyCode() {
      return fetch(VERIFY_CODE_API)
        .then(rsp => rsp.json())
        .then(data => `data:image/png;base64,${data.data}`);
    }
Converting base64 data to pixels
This is done with a canvas.
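    // Accepts a base64 data URL or a local image path
    async function getImageData(imageSrc) {
      const image = new Image();
      // Wait for the image to finish loading (handlers are attached before src is set)
      await new Promise((resolve, reject) => {
        image.onload = resolve;
        image.onerror = reject;
        image.src = imageSrc;
      });
      // Create a canvas sized to the image; the default 300x150 canvas would clip larger images
      const canvas = document.createElement('canvas');
      canvas.width = image.width;
      canvas.height = image.height;
      const context = canvas.getContext('2d');
      context.drawImage(image, 0, 0);
      return context.getImageData(0, 0, image.width, image.height);
    }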
The function returns an ImageData object. Its data property is a Uint8ClampedArray, a typed array in which every 4 entries hold one pixel's RGBA values (0-255).
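To make the flat layout concrete, here is a small sketch of how the pixel at column x, row y maps to an offset in that array. The helper `getPixel` and the hand-made 2x2 array `demoData` are illustrative stand-ins, not part of the original script.

```javascript
// Read the RGBA value of the pixel at column x, row y from a flat RGBA array
function getPixel(data, width, x, y) {
  const offset = (y * width + x) * 4;
  return Array.from(data.slice(offset, offset + 4));
}

// A hand-made 2x2 "image": red, green / blue, white
const demoData = Uint8ClampedArray.from([
  255, 0, 0, 255,   0, 255, 0, 255,
  0, 0, 255, 255,   255, 255, 255, 255,
]);

console.log(getPixel(demoData, 2, 0, 1)); // [ 0, 0, 255, 255 ]
```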
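Binarization
First choose a threshold: pixels brighter than the threshold are treated as background, and darker pixels are provisionally treated as character pixels (they may still be noise or interference lines).
The threshold has to be tuned against real results (expect to adjust it repeatedly).
A reasonable starting threshold is [130, 130, 130] (RGB channel values; alpha is always 255 here, so it is omitted), roughly the midpoint of 0-255.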
    const threshold = [130, 130, 130];

    // Returns a 2D array whose entries are all '0' or '1'
    function binarization(imageData) {
      // A pixel is background ('0') only if every RGB channel is above the threshold
      const pixel2binary = pixel =>
        pixel.every((chValue, index) => chValue > threshold[index]) ? '0' : '1';
      // Every 4 entries of data describe one pixel
      const { data, width, height } = imageData;
      const binaryData = [];
      let x, y, row, rowLoc, pixel, pixelLoc;
      for (y = 0; y < height; y++) {
        row = [];
        // Offset of the first pixel of the current row
        rowLoc = y * width * 4;
        for (x = 0; x < width; x++) {
          pixelLoc = rowLoc + x * 4;
          // Take this pixel's RGB values (alpha is skipped)
          pixel = data.slice(pixelLoc, pixelLoc + 3);
          row.push(pixel2binary(pixel));
        }
        binaryData.push(row);
      }
      return binaryData;
    }
Drawing the binarized data (black characters on white)
    function drawBinaryData(context, data, scale = 1) {
      const binary2pixel = binary =>
        binary === '0' ? [255, 255, 255, 255] : [0, 0, 0, 255];
      // Run an action `scale` times, which repeats each pixel and each row
      const repeatAction = action => {
        for (let i = 0; i < scale; i++) action();
      };
      const h = data.length;
      const w = data[0].length;
      let x, y;
      const pixelData = [];
      for (y = 0; y < h; y++)
        repeatAction(() => {
          for (x = 0; x < w; x++)
            repeatAction(() => {
              pixelData.push(...binary2pixel(data[y][x]));
            });
        });
      // Create the ImageData instance
      const imageData = new ImageData(
        Uint8ClampedArray.from(pixelData),
        w * scale,
        h * scale
      );
      return context.putImageData(imageData, 0, 0);
    }
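Output of a CAPTCHA scaled up 4x in both width and height:
Saving samples from screenshots
Pick suitable CAPTCHAs and crop the characters out as screenshots.
The character 5 in the CAPTCHA above is not a good sample, because the crop would include pixels from another character in its lower right. That said, those pixels could also be removed with a tool or with code.
Save a sample for every character. This means requesting new CAPTCHA images repeatedly.
Trimming white edges from a character screenshot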
    function cutWhiteEdge(data) {
      let edge;
      // An edge is "white" when it exists, is non-empty, and holds only background pixels
      // (the extra checks stop the loops on an all-white input)
      const isWhiteEdge = () =>
        Array.isArray(edge) && edge.length > 0 && edge.every(binary => binary === '0');
      // Keep trimming the same side while its edge is white
      const cutEdgeContinuous = (resetEdge, cutEdge) => {
        const _resetEdge = () => (edge = resetEdge());
        for (_resetEdge(); isWhiteEdge(); cutEdge(), _resetEdge());
      };
      // Trim order: top, bottom, left, right
      // Top
      cutEdgeContinuous(
        () => data[0],
        () => data.shift()
      );
      // Bottom
      cutEdgeContinuous(
        () => data[data.length - 1],
        () => data.pop()
      );
      // Left
      cutEdgeContinuous(
        () => data.map(r => r[0]),
        () => data.forEach(r => r.shift())
      );
      // Right
      cutEdgeContinuous(
        () => data.map(r => r[r.length - 1]),
        () => data.forEach(r => r.pop())
      );
    }
Undoing the scaling of the binarized data
    function restoreDataScale(data, scale) {
      const scaleData = [];
      let x, y, row;
      const h = data.length;
      const w = data[0].length;
      for (y = 0; y < h; y += scale) {
        row = [];
        for (x = 0; x < w; x += scale) {
          row.push(data[y][x]);
        }
        scaleData.push(row);
      }
      return scaleData;
    }
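Saving the template string
This simply converts the processed binary array into a string, which is easier to store (in a database, for example).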
    function binaryData2Template(data) {
      return data.map(r => r.join('')).join(' ');
    }
The console output on the right is the template string, except that there each row is separated by a newline.
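For reference, the encoding is easy to reverse. The sketch below repeats `binaryData2Template` so that it runs on its own and adds a hypothetical inverse, `template2BinaryData`, which is not in the original script.

```javascript
// Same encoding as the article: rows joined with a single space
function binaryData2Template(data) {
  return data.map(r => r.join('')).join(' ');
}

// Hypothetical inverse: template string back to a 2D array of '0'/'1'
function template2BinaryData(template) {
  return template.split(' ').map(row => row.split(''));
}

const sampleGlyph = [
  ['0', '1', '0'],
  ['1', '1', '1'],
];
const sampleTemplate = binaryData2Template(sampleGlyph);
console.log(sampleTemplate); // "010 111"
```

A round trip through both functions should give back the original array, which is a cheap sanity check before a template is stored.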
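Reading the character screenshots
The steps above covered cropping and processing the character screenshots but left out how the screenshots are read in.
You could write code that reads the whole screenshot folder and processes every character screenshot in one go.
When I did this step, I used input[type=file] and picked one character screenshot at a time by hand (I was short on time). Here is that code.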
    fileInput.addEventListener('change', () => {
      // Get the selected file
      if (fileInput.files.length === 0) return;
      const file = fileInput.files[0];
      const reader = new FileReader();
      reader.addEventListener('load', async e => {
        // e.target.result is the image as a base64 data URL
        // getImageData is async, so its result must be awaited
        const imageData = await getImageData(e.target.result);
        const binaryData = binarization(imageData);
        cutWhiteEdge(binaryData);
        // Undo the earlier 4x scaling of the image
        const restoreData = restoreDataScale(binaryData, 4);
        const template = binaryData2Template(restoreData);
        // Write the template to the clipboard
        await navigator.clipboard.writeText(template);
        // It could also be sent to an API and stored in a database...
      });
      reader.readAsDataURL(file);
    });
Reference: the FileReader load event
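Adjusting the binarization threshold
After repeatedly fetching CAPTCHAs, binarizing them, and inspecting the output, I found that in some images a character disappeared entirely or in part after binarization, because that character's color was also fairly bright.
For example, this CAPTCHA prints like this (the character S is relatively bright):
In that case the threshold needs adjusting (raise it a little):
    const threshold = [140, 140, 140];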
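A worked example of why raising the threshold helps. `classifyPixel` applies the same rule as the binarization step to a single pixel; the function name and the pixel value `brightStroke` are made up for illustration.

```javascript
// Classify one RGB pixel: '0' = background, '1' = candidate character pixel
function classifyPixel(rgb, threshold) {
  return rgb.every((ch, i) => ch > threshold[i]) ? '0' : '1';
}

// A made-up pixel from a relatively bright character stroke
const brightStroke = [135, 138, 133];

console.log(classifyPixel(brightStroke, [130, 130, 130])); // '0' (lost as background)
console.log(classifyPixel(brightStroke, [140, 140, 140])); // '1' (kept as character)
```

The trade-off is that a higher threshold also classifies more interference pixels as character pixels, so the value should only be raised as far as needed.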
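3. Convolution matching
The previous section showed how to obtain character templates. Before running the convolution matching, the templates for all characters need to be processed and saved (which is tedious work).
Getting the templates
Here I simply define all character templates as a constant.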
const CODE_TEMPLATES = { 2: '0000001111100 0000111111110 0001110000111 0001100000011 0011100000011 0000000000011 0000000000110 0000000001110 0000000001100 0000000011000 0000000110000 0000011100000 0000111000000 0001110000000 0011100000000 0111000000000 0111111111110 1111111111110', 3: '000001111000 000111111110 001110000110 001100000011 011100000011 000000000011 000000000110 000000001110 000011111000 000011111000 000000001100 000000001110 000000000110 110000000110 110000001100 111000011100 011111111000 001111100000', 4: '0000000000111 0000000001110 0000000011110 0000000111110 0000000110110 0000001101110 0000011001100 0000110001100 0001110001100 0001100001100 0011000001100 0110000011100 1111111111111 1111111111111 0000000011000 0000000011000 0000000111000 0000000111000', 5: '000111111111 000111111111 001100000000 001100000000 001100000000 001100000000 011011110000 011111111000 011100011100 000000001100 000000001110 000000001110 000000001100 110000001100 110000011100 111000111000 011111110000 001111100000', 6: '0000001111 0000111111 0001111000 0011100000 0011000000 0110000000 0110111100 1111111110 1111000111 1110000011 1100000011 1100000011 1100000011 1100000011 1100000111 1110001110 0111111100 0011111000', 7: '111111111111 111111111111 000000000110 000000000110 000000001100 000000011100 000000011000 000000110000 000000110000 000001100000 000011100000 000011000000 000111000000 000110000000 001100000000 011100000000 011000000000 111000000000', 8: '000001111100 000011111110 000111000111 001110000011 001100000011 001100000011 001100000111 001110001110 000111111100 000111111100 011100001100 011000000110 110000000110 110000000110 110000001110 111000011100 011111111000 000111110000', 9: '00001111000 00111111100 01110001110 01100000111 11100000011 11000000011 11000000011 11000000011 11100000111 01100001110 01111111110 00111100110 00000001100 00000001100 00000011000 00001110000 01111100000 01110000000', a: '00001111100 00111111110 01110000110 01100000111 00000000111 
00011111110 01111111110 11100000110 11000000110 11000001110 11000011110 11111111100 01111101110', A: '000000000111000 000000000111000 000000001111000 000000001111000 000000011001100 000000011001100 000000110001100 000000110001100 000001100001100 000001100001110 000011000000110 000011111111110 000111111111110 001110000000110 001100000000111 011100000000011 011000000000011 111000000000011', b: '000110000000 000110000000 001110000000 001100000000 001100000000 001100000000 001101111000 011111111110 011110001110 011100000110 011000000111 011000000111 011000000111 111000000110 111000000110 111000001110 111000011100 111111111000 110111110000', B: '0001111111100 0011111111110 0011100000111 0011000000011 0011000000011 0011000000011 0011000000111 0111000001110 0111111111100 0111111111100 0110000001110 0110000000110 0110000000110 1110000000110 1100000001110 1100000011100 1111111111000 1111111110000', c: '00001111100 00011111110 00111000111 01100000011 01100000011 11100000000 11000000000 11000000000 11000000000 11100000111 01100001110 01111111100 00011110000', C: '000000111110000 000011111111100 000111100001110 000110000000110 001100000000110 001100000000111 011100000000000 011000000000000 011000000000000 011000000000000 011000000000000 111000000000000 011000000001100 011000000001100 011000000011000 001100000111000 001111111110000 000011111000000', d: '0000000000011 0000000000011 0000000000111 0000000000110 0000000000110 0000000000110 0000111100110 0011111111110 0011100011110 0110000001100 0110000001100 1110000001100 1100000001100 1100000001100 1100000011100 1110000011000 0110000111000 0111111111000 0011111011000', D: '00011111110000 00011111111100 00111000011110 00110000000110 00110000000111 00110000000011 00110000000011 00110000000011 01110000000011 01100000000111 01100000000110 01100000000110 01100000001110 11100000001100 11100000011100 11000001111000 11111111110000 11111111000000', e: '00001111100 00011111110 00110000111 01100000011 01100000011 11111111111 11111111111 
11000000000 11000000000 11100000000 01110000110 01111111100 00011111000', E: '00011111111111 00011111111110 00111000000000 00111000000000 00110000000000 00110000000000 00110000000000 00110000000000 01111111111000 01111111111000 01100000000000 01100000000000 01100000000000 11100000000000 11100000000000 11000000000000 11111111111000 11111111111000', f: '000001111 000111110 000111000 001110000 001100000 001100000 111111100 111111100 001100000 011100000 011000000 011000000 011000000 011000000 011000000 111000000 110000000 110000000 110000000', F: '00011111111111 00011111111110 00111000000000 00111000000000 00110000000000 00110000000000 00110000000000 00110000000000 01111111111000 01111111111000 01100000000000 01100000000000 01100000000000 11100000000000 11100000000000 11000000000000 11000000000000 11000000000000', g: '0000011110011 0001111111111 0001110001111 0011100000111 0011000000110 0111000000110 0110000000110 0110000000110 0110000001110 0111000001100 0011000011100 0011111111100 0001111101100 0000000011100 0100000011000 1110000111000 0111111110000 0011111000000', G: '00000111111000 00001111111100 00011100001110 00110000000110 01110000000111 01100000000000 11100000000000 11000000000000 11000000000000 11000001111110 11000001111110 11000000000110 11000000001110 11000000001100 11100000001100 01110000011100 00111111111000 00011111100000', h: '000111000000 000110000000 000110000000 000110000000 000110000000 001110000000 001110111100 001101111110 001111000111 001100000111 001100000011 011100000111 011100000110 011000000110 011000000110 011000000110 011000001110 111000001110 110000001100', H: '0001100000000011 0001100000000011 0011100000000111 0011100000000110 0011000000000110 0011000000000110 0011000000000110 0011000000000110 0111111111111110 0111111111111100 0110000000001100 0110000000001100 0110000000001100 1110000000011100 1110000000011100 1100000000011000 1100000000011000 1100000000011000', j: '000000110 000000111 000000110 000000000 000000000 000000110 000001110 
000001110 000001100 000001100 000001100 000001100 000011100 000011000 000011000 000011000 000011000 000011000 000111000 000110000 000110000 111110000 111100000', J: '0000000000011 0000000000011 0000000000011 0000000000011 0000000000111 0000000000110 0000000000110 0000000000110 0000000000110 0000000001110 0000000001110 0000000001100 0000000001100 1110000001100 1110000011100 0111000111000 0111111110000 0001111100000', k: '0000110000000 0001110000000 0001100000000 0001100000000 0001100000000 0001100000000 0001100001111 0011100011100 0011000111000 0011001110000 0011011100000 0011111000000 0011111000000 0111111100000 0110001100000 0110000110000 0110000111000 0110000011000 1110000011100', K: '0001100000001111 0001100000011100 0011100000111000 0011100001110000 0011000011100000 0011000111000000 0011001110000000 0011011100000000 0111111100000000 0111111100000000 0111101110000000 0111000110000000 0110000111000000 1110000011000000 1110000011100000 1100000001100000 1100000001110000 1100000000111000', m: '00111011110000111100 00111111111011111110 00111000011110000110 00110000011100000111 00110000001100000111 01110000011100000110 01110000011000000110 01100000011000000110 01100000011000000110 01100000011000000110 01100000111000001110 11100000111000001100 11000000110000001100', M: '00011100000000000111 00011100000000001111 00111100000000001111 00111100000000011110 00110110000000111110 00110110000000110110 00110110000001110110 00110110000001100110 01110111000011101110 01100011000011001100 01100011000110001100 01100011000110001100 01100011001100001100 11100011111100011100 11100001111000011000 11000001111000011000 11000001110000011000 11000001110000011000', n: '00110111110 00111111111 01111000111 01110000011 01100000011 01100000011 01100000011 01100000111 11100000110 11000000110 11000000110 11000000110 11000001110', N: '00011100000000111 00011100000000111 00011110000000110 00011110000000110 00011111000000110 00011011000000110 00111011100001110 00111001100001110 00110001110001100 
00110000110001100 00110000111001100 00110000011001100 01110000011011100 01110000011111000 01100000001111000 01100000001111000 01100000000111000 11100000000111000', p: '0001101111000 0001111111110 0011110001110 0011100000110 0011000000111 0011000000111 0011000000110 0111000000110 0110000000110 0110000001110 0111000011100 0111111111000 0110111110000 1110000000000 1100000000000 1100000000000 1100000000000 1100000000000', P: '000111111111000 000111111111110 000110000000110 000110000000111 000110000000011 000110000000011 001110000000111 001110000000111 001100000001110 001111111111100 001111111110000 001100000000000 011100000000000 011100000000000 011000000000000 011000000000000 011000000000000 111000000000000', q: '000011110011 001111111111 001110001111 011100000110 011000000110 111000000110 110000000110 110000001110 110000001110 111000001100 011000011100 011111111100 001111101100 000000001100 000000011000 000000011000 000000011000 000000011000', Q: '00000111110000 00011111111100 00111100001110 00110000000110 01100000000110 01100000000111 11100000000111 11000000000111 11000000000111 11000000000111 11000000000110 11000000000110 11000000001110 11000000001100 11100000011100 01110000111000 01111111110000 00011111110000 00000000111000 00000000011100 00000000010000', r: '001110111 001111111 001110000 001100000 001100000 001100000 011100000 011000000 011000000 011000000 011000000 111000000 111000000', R: '00011111111000 00011111111100 00111000001110 00110000000110 00110000000111 00110000000111 00110000000110 01110000001110 01110000011100 01111111111000 01111111110000 01100000110000 01100000110000 11100000111000 11100000011000 11000000011000 11000000011100 11000000001100', s: '00001111100 00111111110 01110000111 01100000011 01110000000 00111110000 00011111100 00000011110 00000000110 11000000110 11100001110 01111111100 00111110000', S: '00000111111000 00001111111100 00011100001110 00111000000110 00110000000111 00110000000000 00110000000000 00011100000000 00001111000000 
00000111110000 00000000111000 00000000001100 00000000001100 11000000001100 11000000011100 01110000111000 01111111111000 00011111100000', t: '0001100 0001100 0001100 1111111 1111111 0011000 0011000 0011000 0011000 0111000 0110000 0110000 0110000 0110000 0111100 0011100', T: '11111111111111 11111111111110 00000111000000 00000111000000 00000110000000 00000110000000 00000110000000 00000110000000 00001110000000 00001110000000 00001100000000 00001100000000 00001100000000 00001100000000 00011100000000 00011100000000 00011000000000 00011000000000', u: '011100000111 011000000110 011000000110 011000000110 011000000110 011000001110 111000001100 110000001100 110000001100 110000011100 111000111100 011111111100 001111011000', U: '000110000000011 001110000000011 001100000000111 001100000000110 001100000000110 001100000000110 011100000000110 011100000000110 011000000001110 011000000001100 011000000001100 011000000001100 011000000001100 111000000011100 011000000011000 011100001111000 001111111110000 000111111000000', v: '11100000011 01100000111 01100000110 01100001110 01100001100 00100011100 00110011000 00110110000 00110110000 00111100000 00011100000 00011000000 00011000000', V: '111000000000111 011000000000110 011000000001110 011000000001100 011000000011100 011100000011000 001100000111000 001100000110000 001100001110000 001100001100000 001100011100000 000110011000000 000110111000000 000110110000000 000111110000000 000111100000000 000011100000000 000011000000000', w: '111000001100000111 011000011100000110 011000011100001100 011000111100001100 011000110100011000 011001100100011000 011001100110111000 011011000110110000 011011000110110000 011110000111100000 001110000111100000 001100000011000000 001100000011000000', W: '111000000111000000111 111000000111000000110 011000001111000001110 011000001111000001100 011000001111000001100 011000011011000011100 011000011011000011000 011000110011000011000 011000110001000110000 011001110001100110000 011001100001101110000 011001100001101100000 
011011000001101100000 011011000001111000000 011110000001111000000 001110000001111000000 001110000001110000000 001100000000110000000', x: '0001100000111 0001110000110 0000110001100 0000111011100 0000011111000 0000011110000 0000001100000 0000011110000 0000110110000 0001110111000 0011100011000 0111000011100 1110000001100', X: '00011100000000111 00001110000001110 00000110000011100 00000111000011000 00000011000111000 00000011101110000 00000011111100000 00000001111000000 00000001111000000 00000001110000000 00000011111000000 00000111011000000 00000110011100000 00001110001100000 00011100001110000 00111000000110000 00110000000111000 11110000000011000', y: '0001100000011 0001100000111 0001100000110 0001100001110 0001110001100 0000110011100 0000110011000 0000110111000 0000110110000 0000111100000 0000111100000 0000011000000 0000011000000 0000110000000 0000110000000 0001100000000 1111100000000 1110000000000', Y: '11100000000111 01100000001110 01100000001100 01110000011100 00110000111000 00110000110000 00111001110000 00011011100000 00011011000000 00011111000000 00001110000000 00001100000000 00001100000000 00001100000000 00011100000000 00011100000000 00011000000000 00011000000000', z: '001111111111 001111111111 000000001110 000000011100 000000111000 000001110000 000011100000 000111000000 000110000000 001100000000 011000000000 111111111100 111111111100', Z: '000111111111111 000111111111111 000000000001110 000000000001100 000000000011100 000000000011000 000000001110000 000000011100000 000000111000000 000000110000000 000001100000000 000011100000000 000111000000000 001110000000000 011100000000000 011000000000000 111111111111100 111111111111000', };
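Counting effective pixels in the character templates
Counting effective pixels means counting how many 1s a template contains (0 is background, an ineffective pixel).
The count is used later when computing similarity.
This step can also be done when the template is created, with the result saved to the database.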
    const tplEffectPoints = Object.keys(CODE_TEMPLATES).reduce((calc, code) => {
      // Count the 1s in each character template
      calc[code] = CODE_TEMPLATES[code].split('').filter(c => c === '1').length;
      return calc;
    }, {});
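What is convolution matching?
I made an animated GIF to illustrate it. Convolution matching, which I used to call scan matching (strictly speaking it is sliding-window template matching rather than a mathematical convolution), amounts to moving a template step by step across the image (left to right, top to bottom) and measuring how much the image's effective pixels (the 1s) overlap with the template's effective pixels. That overlap is the similarity.
It is worth thinking about why only the overlap of effective pixels is measured, and not the ineffective pixels.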
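One way to approach that question: inside a template's bounding box, the background positions of the real image may be covered by an interference line or by part of a neighboring character, so penalizing them would reject correct matches. The sketch below compares two scoring rules on a made-up 3x3 template. `overlapSimilarity` mirrors the rule used in this article; `strictSimilarity` is a hypothetical alternative shown only for contrast.

```javascript
const barTemplate = ['010', '010', '010'].map(r => r.split('')); // a vertical bar
const noisyRegion = ['011', '110', '010'].map(r => r.split('')); // same bar plus stray pixels

// Rule used in this article: share of the template's 1s that are also 1 in the image
function overlapSimilarity(tpl, region) {
  let ones = 0;
  let hits = 0;
  tpl.forEach((row, i) =>
    row.forEach((v, j) => {
      if (v === '1') {
        ones++;
        if (region[i][j] === '1') hits++;
      }
    })
  );
  return hits / ones;
}

// Hypothetical stricter rule: share of ALL positions that are identical
function strictSimilarity(tpl, region) {
  let same = 0;
  let total = 0;
  tpl.forEach((row, i) =>
    row.forEach((v, j) => {
      total++;
      if (v === region[i][j]) same++;
    })
  );
  return same / total;
}

console.log(overlapSimilarity(barTemplate, noisyRegion)); // 1
console.log(strictSimilarity(barTemplate, noisyRegion));  // 7 of 9 positions agree
```

The price of the lenient rule is that a character whose pixels are a subset of another character's pixels also scores a full match, which is what the post-processing step later has to clean up.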
Implementing convolution matching
    // Returns every match (character, position, similarity), sorted by x
    function convolution(binaryData, threshold = 1) {
      const codes = Object.keys(CODE_TEMPLATES);
      const h = binaryData.length;
      const w = binaryData[0].length;
      const matches = [];
      let code, tplData, tplH, tplW;

      function doConvolution() {
        let x, y, colLastIdx, rowLastIdx;
        // Returns the position and the overlap ratio (similarity) at that position
        const compare = (x, y, code) => {
          let effectivePointNum = 0;
          for (let i = 0; i < tplH; i++) {
            for (let j = 0; j < tplW; j++) {
              // Only the template's effective pixels are compared
              if (tplData[i][j] === '1' && binaryData[i + y][j + x] === '1') {
                effectivePointNum++;
              }
            }
          }
          // similarity = overlapping points / effective points in the template
          const similarity = effectivePointNum / tplEffectPoints[code];
          return { x, y, similarity };
        };
        // Scan direction: left to right, top to bottom
        for (y = 0, rowLastIdx = h - tplH; y <= rowLastIdx; y++) {
          for (x = 0, colLastIdx = w - tplW; x <= colLastIdx; x++) {
            const result = compare(x, y, code);
            if (result.similarity >= threshold) {
              matches.push({ ...result, code });
            }
          }
        }
      }

      for (let i = 0; i < codes.length; i++) {
        code = codes[i];
        // Convert the template string into a 2D array
        tplData = CODE_TEMPLATES[code].split(' ').map(row => row.split(''));
        tplH = tplData.length;
        tplW = tplData[0].length;
        doConvolution();
      }
      // Sort by position (x axis)
      matches.sort((a, b) => a.x - b.x);
      return matches;
    }
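To see the scan on data small enough to check by hand, here is a self-contained miniature that uses one template and a made-up 4x6 image. `findTemplate`, `toGrid`, and the sample data are illustrative only; the scoring rule is the same overlap ratio as above.

```javascript
const toGrid = rows => rows.map(r => r.split(''));

// Slide one template over a binary image and collect the positions whose
// overlap similarity reaches the threshold
function findTemplate(image, tpl, threshold = 1) {
  const tplH = tpl.length;
  const tplW = tpl[0].length;
  const ones = tpl.flat().filter(v => v === '1').length;
  const hits = [];
  for (let y = 0; y + tplH <= image.length; y++) {
    for (let x = 0; x + tplW <= image[0].length; x++) {
      let overlap = 0;
      for (let i = 0; i < tplH; i++) {
        for (let j = 0; j < tplW; j++) {
          if (tpl[i][j] === '1' && image[y + i][x + j] === '1') overlap++;
        }
      }
      const similarity = overlap / ones;
      if (similarity >= threshold) hits.push({ x, y, similarity });
    }
  }
  return hits;
}

// Made-up image containing an "L"-like shape whose top-left corner is at x=2, y=1
const demoImage = toGrid([
  '000000',
  '001000',
  '001000',
  '001100',
]);
const demoTemplate = toGrid(['10', '10', '11']);

console.log(findTemplate(demoImage, demoTemplate)); // [ { x: 2, y: 1, similarity: 1 } ]
```

With a lower threshold such as 0.5, the position x=2, y=0 also qualifies (2 of 4 template pixels overlap). A lower threshold tolerates damaged strokes but produces more false matches.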
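Other processing
Before convolution matching, the CAPTCHA must be binarized.
The binarized image may need further processing, such as removing noise and interference lines.
Here only the noise gets some simple handling.
Removing noise
Noise consists of fairly dark points placed randomly on the CAPTCHA image. If we filter only by a brightness threshold, noise is easily mistaken for effective pixels.
Characteristics of noise
Generally, noise points are random and not contiguous.
The test used here is simple: if an effective point (a 1) has no other effective point directly above, below, left, or right of it, it is considered noise.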
    function denoising(binData) {
      const h = binData.length;
      const w = binData[0].length;
      const isEffectivePoint = (x, y) => binData[y][x] === '1';
      const checkAround = (x, y) => {
        // Bounds checks
        const checkTop = y > 0;
        const checkBottom = y < h - 1;
        const checkLeft = x > 0;
        const checkRight = x < w - 1;
        return (
          (checkTop && isEffectivePoint(x, y - 1)) ||
          (checkBottom && isEffectivePoint(x, y + 1)) ||
          (checkLeft && isEffectivePoint(x - 1, y)) ||
          (checkRight && isEffectivePoint(x + 1, y))
        );
      };
      for (let y = 0; y < h; y++) {
        for (let x = 0; x < w; x++) {
          if (isEffectivePoint(x, y) && !checkAround(x, y)) {
            // Mark the noise point as ineffective
            binData[y][x] = '0';
          }
        }
      }
    }
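To see the rule on a concrete grid, the sketch below applies it to a made-up 4x5 image. `removeIsolatedPoints` is a hypothetical non-mutating variant of the same test, written so that the example runs on its own.

```javascript
// Non-mutating variant of the same rule: drop every '1' that has no '1'
// directly above, below, left, or right of it
function removeIsolatedPoints(grid) {
  const at = (x, y) => (grid[y] && grid[y][x]) === '1';
  return grid.map((row, y) =>
    row.map((v, x) =>
      v === '1' && !(at(x, y - 1) || at(x, y + 1) || at(x - 1, y) || at(x + 1, y))
        ? '0'
        : v
    )
  );
}

const noisyGrid = ['10000', '00110', '00100', '00001'].map(r => r.split(''));
const cleanGrid = removeIsolatedPoints(noisyGrid);
console.log(cleanGrid.map(r => r.join('')).join('\n'));
// 00000
// 00110
// 00100
// 00000
```

The rule has clear limits: a point with only diagonal neighbors counts as isolated, and a clump of two or more adjacent noise points survives.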
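Post-processing
The results of the convolution matching above do not always give us what we want.
Recognizing the CAPTCHA image above produced these matches:
The result not only contains more than 4 matches but also includes an extra r. This is because, in this font, the character P contains all the effective pixels of the character r.
So if r is recognized at the position of a P in the match results, the r should be discarded.
Here are all the characters in this font that contain another character: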
    const containMap = {
      Q: { C: -1 }, // C's x is 1 less than Q's
      E: { F: 0 },
      V: { v: 1 },
      y: { v: 2 },
      m: { r: 0 },
      p: { r: 0 },
    };
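The table above was written by hand. As a rough sketch, such relationships could also be searched for mechanically by testing whether every 1 of one template lands on a 1 of another at some placement. `findContainOffset` is a hypothetical helper, not part of the original script, and I have not checked that it reproduces the table above when run on the real templates; the tiny templates in the example are made up.

```javascript
const tplToGrid = tpl => tpl.split(' ').map(r => r.split(''));

// Hypothetical helper: if every '1' of `innerTpl` lies on a '1' of `outerTpl`
// at some placement, return the x offset (inner x minus outer x) of the first
// such placement found; otherwise return null
function findContainOffset(outerTpl, innerTpl) {
  const outer = tplToGrid(outerTpl);
  const inner = tplToGrid(innerTpl);
  const isOne = (grid, y, x) => (grid[y] && grid[y][x]) === '1';
  for (let dy = -inner.length + 1; dy < outer.length; dy++) {
    for (let dx = -inner[0].length + 1; dx < outer[0].length; dx++) {
      const contained = inner.every((row, i) =>
        row.every((v, j) => v === '0' || isOne(outer, dy + i, dx + j))
      );
      if (contained) return dx;
    }
  }
  return null;
}

console.log(findContainOffset('1110 1001 1110 1000', '111 100')); // 0
console.log(findContainOffset('011 010', '0011 0010'));           // -1
console.log(findContainOffset('10 10', '11 11'));                 // null
```

Negative offsets are allowed because the contained template can be wider than the containing one, as with the `Q: { C: -1 }` entry.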
Post-process according to the containment relationships:
    function afterEffect(matches) {
      if (matches.length <= 4) return;
      // Build a lookup for the steps below: {e: [match], r: [match, match], ...}
      const codeMap = matches.reduce((map, item) => {
        const { code } = item;
        (map[code] = map[code] || []).push(item);
        return map;
      }, {});
      Object.keys(containMap).forEach(code => {
        if (!codeMap[code]) return;
        Object.keys(containMap[code]).forEach(containCode => {
          if (!codeMap[containCode]) return;
          // x offset between the containing character and the contained one
          const offset = containMap[code][containCode];
          codeMap[code].forEach(Q => {
            let idx = codeMap[containCode].findIndex(C => C.x === Q.x + offset);
            if (idx > -1) {
              // Remove the match that was found (not the first one) from codeMap
              const [C] = codeMap[containCode].splice(idx, 1);
              // Remove it from matches
              idx = matches.findIndex(item => item === C);
              matches.splice(idx, 1);
            }
          });
        });
      });
    }
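Post-processing can have many steps (only one is done here). What is needed depends on the specific case, and the simpler the better.
Finally, extract the CAPTCHA text from the match results.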
    const verifyCodes = matches.map(item => item.code).join('');
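Replaying and verifying
Before reading out the CAPTCHA text, check the number of entries in matches once more. If it is clearly wrong, something in our processing is still off. Saving the result of every processing step makes it possible to replay them later and improve the step that failed.
Likewise, if the submitted CAPTCHA fails verification, save every step's result together with the CAPTCHA so it can be investigated and improved later.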
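A minimal sketch of the count check and of recording a step for later replay. It assumes this CAPTCHA always has 4 characters, as the match-count discussion above implies; `extractCode`, `EXPECTED_LENGTH`, and the shape of the `trace` array are hypothetical.

```javascript
const EXPECTED_LENGTH = 4; // assumption: this CAPTCHA always has 4 characters

// Hypothetical helper: return the recognized text, or null when the number of
// matches is wrong; either way, record the outcome in `trace` for later replay
function extractCode(matches, trace) {
  if (matches.length !== EXPECTED_LENGTH) {
    trace.push({ step: 'extract', ok: false, matches });
    return null;
  }
  const code = matches.map(item => item.code).join('');
  trace.push({ step: 'extract', ok: true, code });
  return code;
}

const demoTrace = [];
const demoMatches = [
  { x: 3, code: 'a' },
  { x: 20, code: 'P' },
  { x: 38, code: '7' },
  { x: 55, code: 'k' },
];
console.log(extractCode(demoMatches, demoTrace)); // "aP7k"
```

In a real script the earlier steps (binarized data, denoised data, raw matches) could push entries into the same trace, and the trace would be persisted whenever the result is null or the server rejects the code.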
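Handling verification failures
Verification will sometimes fail: either our processing still has problems, or the way the CAPTCHA is generated keeps changing.
When recognition fails, a limited number of retries can be allowed.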
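The retry could look something like the sketch below. `loginWithRetry` and its two callbacks are hypothetical: `recognize` is assumed to fetch a fresh CAPTCHA and resolve to the recognized text (or null), and `submit` is assumed to resolve to true when the server accepts the code.

```javascript
// Hypothetical retry wrapper around the recognize-and-submit cycle
async function loginWithRetry(recognize, submit, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const code = await recognize(); // a fresh CAPTCHA on every attempt
    if (code !== null && (await submit(code))) {
      return { ok: true, attempt };
    }
  }
  return { ok: false, attempt: maxRetries };
}

// Usage with stand-in callbacks: the second submission is accepted
let demoSubmissions = 0;
loginWithRetry(
  async () => 'aP7k',
  async () => ++demoSubmissions === 2
).then(result => console.log(result)); // { ok: true, attempt: 2 }
```

Keeping the retry count small matters in practice, since repeated failed logins may lock the account or trigger other protections.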