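Pitfalls I hit generating 3 million QR codes
Requirement: generate 3 million QR codes, each carrying a distinct code, plus a database table that records what each code stands for.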
private static SocialGameAwardMilk milk;
private static int count = 3000000; // "300w" = 300 x 10,000 = 3 million
private static int coreThread = 8;

static {
    // Template record; every generated code gets a clone of it.
    milk = new SocialGameAwardMilk();
    milk.setExchanged("N");
    milk.setAwardCode("no");
    milk.setAwardName("什么都没有"); // "nothing at all"
}

public void testCreateStatement() {
    final MixedService mixedService = (MixedService) beanFactory.getBean("mixedService");
    Runnable task = new Runnable() {
        public void run() {
            try {
                System.out.println(Thread.currentThread().getName() + " start...");
                long start = System.nanoTime();
                // Each thread generates an equal share of the total.
                int maxCnt = count / coreThread;
                for (int i = 0; i < maxCnt; ++i) {
                    generateLoop(mixedService);
                    // Progress marker every 1000 codes.
                    if (i % 1000 == 0) {
                        logger.error("qrcode index :{" + i + " }");
                        System.out.println(Thread.currentThread().getName() + " qrcode index :{" + i + " }");
                    }
                }
                long end = System.nanoTime();
                System.out.println((end - start) / 1000000 + " ms");
                System.out.println(Thread.currentThread().getName() + " over...");
            } catch (Exception e) {
                logger.error("generateLoop error", e);
            }
        }
    };
    ExecutorService service = Executors.newFixedThreadPool(coreThread);
    for (int i = 0; i < coreThread; ++i) {
        service.execute(task);
    }
    // Accept no further tasks and wait for the workers, so the method does not
    // return while generation is still running.
    service.shutdown();
    try {
        service.awaitTermination(1, TimeUnit.DAYS);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}

private void generateLoop(MixedService mixedService) {
    // Create the code.
    String code = ShortUrlUtil.getEncodeKey();
    // Write the QR code file first...
    generateFile(code, milk.getAwardCode());
    // ...and only then store the matching row.
    SocialGameAwardMilk newMilk = (SocialGameAwardMilk) milk.clone();
    newMilk.setCode(code);
    mixedService.insertAwardMilk(newMilk);
}
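Not much to say about this part. Log thoroughly so that progress can be traced; otherwise problems are hard to locate when something goes wrong. Then generate the file and store the row in the database. In particular, the file must be generated correctly before the row is stored, or else, well, you can guess. So even with multiple threads, generating the file and writing to the database must happen in the same loop iteration.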
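The ordering rule above (file first, database row second, both in one iteration) can be sketched as follows. This is only an illustration under assumptions: `FileThenDb`, `CodeStore` and `generateOne` are hypothetical names, `CodeStore` stands in for `mixedService.insertAwardMilk`, and the byte array stands in for whatever `generateFile` really writes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileThenDb {

    /** Hypothetical stand-in for the database insert (mixedService.insertAwardMilk in the post). */
    interface CodeStore {
        void insert(String code);
    }

    /**
     * Writes the image first and inserts the row only after the file is
     * confirmed to be on disk with the expected size.
     * Returns false, without touching the store, if the file step failed.
     */
    static boolean generateOne(Path dir, String code, byte[] png, CodeStore store) {
        Path target = dir.resolve(code + ".png");
        try {
            Files.write(target, png);
            if (png.length == 0 || Files.size(target) != png.length) {
                return false; // empty or incomplete file: do not record it
            }
        } catch (IOException e) {
            return false; // file step failed: no row is written
        }
        store.insert(code); // reached only when the file is in place
        return true;
    }
}
```

With this shape a failed write leaves at worst an orphan file and never a database row that points at nothing, which is the cheaper of the two inconsistencies to clean up.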
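Next is the verification program. It originally listed the files, checked each QR code for validity, and compared it against the database. But listing the files and opening local file handles turned out to be too slow, so I later switched to a reverse lookup: start from the database rows and map each one back to its file. Luckily the files had been generated following a fixed naming convention. As the saying goes, convention over configuration; it does pay off.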
private static ConcurrentStack<SocialGameAwardMilk> pipeline = new ConcurrentStack<SocialGameAwardMilk>();
// Both counters are read by several scheduler threads, hence volatile.
// createSize is written only by the single producer task.
private static volatile int createSize = 0;
private static volatile int checkedSize = 0;

public void testCheckQrcodeFiles() throws IOException {
    final MixedService mixedService = (MixedService) beanFactory.getBean("mixedService");
    // Start both counters from the number of rows that are no longer in state "N".
    final Map<String, Object> alreadyMap = new HashMap<String, Object>();
    alreadyMap.put("notexchanged", "N");
    Long alreadyChecked = mixedService.countMilks(alreadyMap);
    checkedSize = alreadyChecked.intValue();
    createSize = checkedSize;
    // Query for the next 1000 unchecked rows. The offset stays 0 because
    // checked rows leave state "N" and drop out of the result.
    final Map<String, Object> map = new HashMap<String, Object>();
    map.put("offset", 0);
    map.put("length", 1000);
    map.put("exchanged", "N");
    Runnable createTask = new Runnable() {
        @Override
        public void run() {
            // Fetch a new batch only after the previous one is fully consumed.
            if (createSize != checkedSize) {
                return;
            }
            if (createSize % 10000 == 0) {
                logger.error("create at 10000 times,this is index = " + checkedSize / 10000);
            }
            List<SocialGameAwardMilk> milks = mixedService.searchMilks(map);
            if (milks != null) {
                for (SocialGameAwardMilk m : milks) {
                    pipeline.push(m);
                    createSize++;
                }
            }
        }
    };
    Runnable consumeTask = new Runnable() {
        @Override
        public void run() {
            SocialGameAwardMilk m = pipeline.pop();
            if (m == null) {
                return;
            }
            // Read the counter once so the directory and the file path agree.
            int bucket = checkedSize / 10000;
            if (checkedSize % 10000 == 0) {
                logger.error("consume at 10000 times,this is index = " + bucket);
            }
            String sourceFile = prefixSrcFile + m.getAwardCode() + "/" + m.getCode() + ".png";
            String descDir = prefixDescFile + m.getAwardCode() + "/" + bucket;
            String descFile = descDir + "/" + m.getCode() + ".png";
            File fdSrcFile = new File(sourceFile);
            if (!fdSrcFile.exists()) {
                // NOTE: this path does not advance checkedSize, so createSize and
                // checkedSize stop matching and the producer fetches no more batches.
                logger.error("source file not exist, place is " + sourceFile);
                return;
            }
            if (!fdSrcFile.canRead()) {
                fdSrcFile.setReadable(true);
            }
            File fdDescDir = new File(descDir);
            if (!fdDescDir.exists()) {
                if (!fdDescDir.mkdir()) {
                    logger.error("create dir error, place is " + descDir + " and this happened on " + sourceFile);
                    pipeline.push(m);
                    return;
                }
            }
            File fdDescFile = new File(descFile);
            String urlAndCode = prefixUrl + m.getCode();
            if (checkDbAndQrcode(fdSrcFile, urlAndCode)) {
                if (!fdSrcFile.renameTo(fdDescFile)) {
                    logger.error("rename failed, from " + sourceFile + " to " + descFile);
                }
                m.setExchanged("C");
                mixedService.updateMilk(m);
                increaseCheckedSize();
            } else {
                m.setExchanged("E");
                mixedService.updateMilk(m);
                increaseCheckedSize();
                // Either an IO error or a decode failure: by my measurement the zxing
                // QR code library recognizes only about 86% of the codes and throws
                // NotFoundException when it cannot decode one. Sigh. Not logged for
                // now; the database is the source of truth.
                // logger.error("source file checked different, place is " + sourceFile);
                return;
            }
        }
    };
    // The producer runs every 5 ms and four consumer schedules run every 1 ms,
    // all on a 4-thread pool.
    ScheduledExecutorService service = Executors.newScheduledThreadPool(4);
    service.scheduleAtFixedRate(createTask, 0, 5, TimeUnit.MILLISECONDS);
    for (int i = 0; i < 4; i++) {
        service.scheduleAtFixedRate(consumeTask, 0, 1, TimeUnit.MILLISECONDS);
    }
}

private synchronized void increaseCheckedSize() {
    checkedSize++;
}

private static boolean checkDbAndQrcode(File fileItem, String urlAndCode) {
    try {
        if (QREmbedder.check(ImageIO.read(fileItem), urlAndCode)) {
            return true;
        }
    } catch (IOException e) {
        logger.error("check qrcode error", e);
    }
    return false;
}
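I originally planned to add a dispatcher layer here: the producer adds data to a pool, and the dispatcher delivers it to a bucket for each thread. The producer and the dispatcher would each be single-threaded, so nothing would compete for the stack or the size counters, which cuts lock contention and overhead. Then I decided this was a bit too much trouble to set up. It was past 11 pm and getting late, so I kept it simple. The source of the ConcurrentStack used in the program above is as follows: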
public class ConcurrentStack<T> {

    // Top of the stack; every update goes through compareAndSet.
    private final AtomicReference<Node<T>> stacks = new AtomicReference<Node<T>>();

    public T push(T e) {
        Node<T> oldNode, newNode;
        for (;;) {
            oldNode = stacks.get();
            newNode = new Node<T>(e, oldNode);
            // Retry if another thread changed the top in the meantime.
            if (stacks.compareAndSet(oldNode, newNode)) {
                return e;
            }
        }
    }

    public T pop() {
        Node<T> oldNode, newNode;
        for (;;) {
            oldNode = stacks.get();
            // The list is empty: return null and let the caller do the null check.
            if (oldNode == null) {
                return null;
            }
            newNode = oldNode.next;
            // Only the thread whose CAS succeeds gets this node.
            if (stacks.compareAndSet(oldNode, newNode)) {
                return oldNode.object;
            }
        }
    }

    private static final class Node<T> {
        private final T object;
        private final Node<T> next;

        private Node(T object, Node<T> next) {
            this.object = object;
            this.next = next;
        }
    }
}
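This is borrowed from an expert whose page I can no longer find, so no link. It solves the problem of consumers competing on pop: each element is handed to exactly one thread, so nothing is consumed twice.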
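For comparison, the dispatcher layer that was planned but never built could look roughly like the sketch below. It is an assumption about that design, not code from the project: `RoundRobinDispatcher`, `dispatch` and `bucket` are hypothetical names, and round-robin is just one simple way to pick a bucket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** One dispatcher thread hands items to a private bucket per worker. */
final class RoundRobinDispatcher<T> {

    private final List<BlockingQueue<T>> buckets = new ArrayList<>();
    // Touched only by the single dispatcher thread, so it needs no locking.
    private int next = 0;

    RoundRobinDispatcher(int workers) {
        for (int i = 0; i < workers; i++) {
            buckets.add(new LinkedBlockingQueue<T>());
        }
    }

    /** Called by the dispatcher thread only: puts the item into the next bucket in turn. */
    void dispatch(T item) {
        buckets.get(next).add(item);
        next = (next + 1) % buckets.size();
    }

    /** Each worker polls only its own bucket, so workers never compete with each other. */
    BlockingQueue<T> bucket(int worker) {
        return buckets.get(worker);
    }
}
```

A worker would then loop on `bucket(id).poll()` in place of `pipeline.pop()`. Each bucket has one writer and one reader, so the shared stack and the comparison of the two size counters would no longer be needed for handing out work.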
One thing in the verification program that is easy to overlook is:
if (!fdDescDir.exists()) {
    if (!fdDescDir.mkdir()) {
        logger.error("create dir error, place is " + descDir + " and this happened on " + sourceFile);
        pipeline.push(m);
        return;
    }
}
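Suppose Thread1 and Thread2 both reach the first line and see that the directory does not exist, so both try to create it on the second line. One of them gets there first, so the other's mkdir naturally fails. It is hard to tell whether mkdir failed because of this race or for some other reason. So in this case, to keep the program running stably, the thread that loses pushes the object back onto the stack and returns, leaving it for a later run to consume again.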
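A different way to handle the same race, shown here only as a sketch, is to ask again whether the directory exists after a failed create: if it does, another thread won and there is nothing left to do. `Dirs` and `ensureDir` are hypothetical names; `File.mkdirs()` and `File.isDirectory()` are standard JDK calls.

```java
import java.io.File;

final class Dirs {

    /**
     * Returns true if the directory exists when the call returns,
     * no matter which thread actually created it.
     */
    static boolean ensureDir(File dir) {
        if (dir.isDirectory()) {
            return true;
        }
        if (dir.mkdirs()) {
            return true;
        }
        // mkdirs() also returns false when another thread created the directory
        // between the check and the call, so look again before reporting failure.
        return dir.isDirectory();
    }
}
```

With this check the loser of the race carries on with the same object and does not need to push it back. A false result then means a real failure, such as missing permissions or a regular file already sitting at that path.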
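Some empirical numbers from this exercise:
In testing, for the first step (generating the QR codes and storing the codes in the database), 5 threads and 8 threads did not differ much, so I ran with 5. The first 2 million took about 5 hours and the remaining 1 million another 4.5 hours. Everything was generated into essentially one folder, so it got slower the further it went. On top of that I forgot to turn off indexing, and mdworker kept indexing the files, which slowed things down a lot.
The 3 million QR codes are about 3 KB each, roughly 10 GB in total. The database, which stores the code plus timestamps and a few other fields, takes 290 MB.
Backing up the QR codes with cp took more than 3 hours on an SSD; Java's FileUtil.copyFile takes 20 ms per file.
Verifying the QR codes with 4 threads and spreading them out with mv (actually a rename into 300 folders) took 2 hours in total.
On the 3 million rows in MySQL (a local instance), select count(1) takes about 1.5 s. With a where condition on a non-indexed column it takes 1.3-1.6 s, about the same; with an index it is naturally on the order of 0.3 ms.
To count the files under the current directory, including those in subdirectories: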
ls -lR| grep "^-" | wc -l
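And as mentioned above, zxing recognized only 86% of the QR codes I had generated myself.
Later I had to regenerate 400,000 discarded QR codes; with 2 threads that took a full 2 hours. Sigh, enough said.
All of the numbers above were measured on my own machine, a Mac ME293.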
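The file count taken with the ls pipeline earlier can also be done from Java, which is handy when the check has to live next to the verification code. A minimal sketch, with `FileCounter` and `countRegularFiles` as hypothetical names; unlike `ls -lR` without `-a`, it also counts dot-files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.stream.Stream;

final class FileCounter {

    /** Counts regular files under root, descending into subdirectories. */
    static long countRegularFiles(Path root) throws IOException {
        // Files.walk keeps directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    // Skip directories and symbolic links, like grep "^-" does.
                    .filter(p -> Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS))
                    .count();
        }
    }
}
```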