
Sorting a Large File with an Oracle External Table


Problem: sort an unordered, single-column text file of 200 million lines and write the result to a new, sorted text file.

1. Generate the unsorted file with the following BigFileTest.java:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;
public class BigFileTest {
    static final Random random = new Random();
    public static void main(String[] args) throws IOException {
        createFile();
    }
    // Writes 200,000,000 random long values, one per line
    public static void createFile() throws IOException {
        // try-with-resources closes the writer; close() flushes whatever is still buffered,
        // so no manual flush is needed and the last lines are not lost
        try (BufferedWriter fw = new BufferedWriter(new FileWriter("D:\\BigFileTest\\bigfile.txt"))) {
            for (int i = 1; i <= 200000000; i++) {
                fw.write(Long.toString(random.nextLong()));
                fw.newLine();
            }
        }
    }
}

Compile and run it:
javac BigFileTest.java
java BigFileTest

This produces bigfile.txt, a text file with 200 million lines.
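
Before generating the file, it helps to know how much disk space it will need. The sketch below is a hypothetical helper; FileSizeEstimate and averageLineBytes are not part of the original article. It samples Random.nextLong() the same way BigFileTest does and extrapolates the result to 200 million lines. With Windows line endings (\r\n, 2 bytes), the estimate should land near 21.4 bytes per line, or roughly 4.3 GB in total. Treat this as an estimate only.

```java
import java.util.Random;

public class FileSizeEstimate {
    // Average bytes per line for lines written like BigFileTest does:
    // Long.toString(nextLong()) followed by a line separator. All characters are ASCII,
    // so one char is one byte in GBK, UTF-8 and other ASCII-compatible charsets.
    static double averageLineBytes(int samples, int separatorBytes, long seed) {
        Random random = new Random(seed);
        long total = 0;
        for (int i = 0; i < samples; i++) {
            total += Long.toString(random.nextLong()).length() + separatorBytes;
        }
        return (double) total / samples;
    }

    public static void main(String[] args) {
        long lines = 200_000_000L;
        // BufferedWriter.newLine() writes System.lineSeparator(): "\r\n" on Windows, "\n" elsewhere
        int separatorBytes = System.lineSeparator().length();
        double avg = averageLineBytes(1_000_000, separatorBytes, 42L);
        System.out.printf("average line: %.2f bytes, estimated size: %.1f GB%n",
                avg, avg * lines / 1e9);
    }
}
```

The estimate uses sampling rather than writing a trial file, so it runs in well under a second.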

2. Create the external table

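-- Directory object for the folder that holds bigfile.txt (needs the CREATE ANY DIRECTORY privilege)
create directory data_dir as 'D:\BigFileTest\';
-- External table: ORACLE_LOADER reads bigfile.txt in place, one row per line
create table bt_ext_test(a varchar2(30))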
organization external
(type oracle_loader
default directory data_dir
access parameters
(records delimited by newline characterset zhs16gbk
badfile data_dir:'bigfile.bad'
discardfile data_dir:'bigfile.dsc'
logfile 'bigfile.log'
fields terminated by 0x'09' ldrtrim
missing field values are null
reject rows with all null fields
)
location ('bigfile.txt')
)
parallel
reject limit unlimited;
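
Column a only has to hold the text of a single random long. The quick Java check below is a hypothetical helper (ColumnWidthCheck is not from the article). It confirms that the longest such value, Long.MIN_VALUE, is 20 characters long. varchar2(30) therefore leaves headroom, and no row should be sent to the bad file for length. The same bound explains why the linesize of 30 in step 3 is wide enough.

```java
public class ColumnWidthCheck {
    // Longest decimal text Random.nextLong() can produce.
    // Long.MIN_VALUE ("-9223372036854775808") is the widest: 19 digits plus the sign.
    static int maxLongTextLength() {
        return Math.max(Long.toString(Long.MIN_VALUE).length(),
                Long.toString(Long.MAX_VALUE).length());
    }

    public static void main(String[] args) {
        int columnWidth = 30; // varchar2(30) in bt_ext_test
        int longest = maxLongTextLength();
        // prints "longest value: 20 characters, fits: true"
        System.out.println("longest value: " + longest + " characters, fits: " + (longest <= columnWidth));
    }
}
```
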

3. Use SQL*Plus SPOOL to write the sorted rows to a new file
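set echo off
set feedback off
-- termout off only suppresses screen output when this script is run with @ or START
set termout off
-- fetch 5000 rows per round trip (the largest value SQL*Plus accepts)
set arraysize 5000
set heading off
set trimout on
set pagesize 0
set trimspool on
-- each value is at most 20 characters, so 30 columns is enough
set linesize 30
spool result.txt
select /*+ parallel(bt_ext_test,8) */ * from bt_ext_test order by a;
spool off
exit;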
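
Because column a is varchar2, `order by a` sorts the values as text, not as numbers. Negative values come first, because the '-' sign sorts before the digits, and "10" sorts before "9". If numeric order is what you want, `order by to_number(a)` gives it instead. The Java sketch below shows both orderings on a few sample values, and it can also stream result.txt once to confirm the file is in text order. SortOrderCheck and its methods are hypothetical, not part of the article. The sketch assumes NLS_SORT is BINARY, in which case Oracle compares the bytes of each value; for the ASCII characters in this file, that matches Java's String.compareTo. A linguistic NLS_SORT could place the '-' sign differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortOrderCheck {
    // Text order: String.compareTo compares UTF-16 code units, which for
    // '-' and '0'-'9' is the same as byte order (assumed NLS_SORT=BINARY).
    static List<String> sortAsText(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Numeric order, the order ORDER BY TO_NUMBER(a) would produce.
    static List<String> sortAsNumbers(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparingLong((String s) -> Long.parseLong(s)));
        return copy;
    }

    // Reads the file line by line and checks each non-blank line is >= the previous one as text.
    static boolean isSortedAsText(Path file) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(file)) {
            String prev = null;
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // tolerate blank lines in the spool file
                }
                if (prev != null && prev.compareTo(line) > 0) {
                    return false;
                }
                prev = line;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        List<String> sample = Arrays.asList("9", "10", "-1", "-20");
        System.out.println("as text:    " + sortAsText(sample));    // [-1, -20, 10, 9]
        System.out.println("as numbers: " + sortAsNumbers(sample)); // [-20, -1, 9, 10]
        if (args.length > 0) {
            System.out.println(args[0] + " sorted as text: " + isSortedAsText(Paths.get(args[0])));
        }
    }
}
```

Run it as `java SortOrderCheck result.txt`. The file is read one line at a time, so memory use stays flat even for 200 million lines.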

On a machine with four dual-core CPUs running 64-bit Oracle 11.2, a parallel degree of 8 produced the sorted file in 32 minutes.
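
For scale, the reported time works out to an end-to-end rate of roughly 104,000 rows per second. This is plain arithmetic on the figures above. The rate covers the external-table scan, the parallel sort and the spool write together, so it does not isolate the cost of the sort itself:

```latex
\frac{2 \times 10^{8}\ \text{rows}}{32 \times 60\ \text{s}}
  = \frac{2 \times 10^{8}}{1920}\ \text{rows/s}
  \approx 1.04 \times 10^{5}\ \text{rows/s}
```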


