Hadoop MapReduce multi-table join


Suppose we have the following two files. The first table maps each company to an address ID; the second maps each address ID to an address name.

Table 1 (company name and address ID, tab-separated):

[plain]
A:Beijing Red Star	1
A:Shenzhen Thunder	3
A:Guangzhou Honda	2
A:Beijing Rising	1
A:Guangzhou Development Bank	2
A:Tencent	3
A:Back of Beijing	1

Table 2 (address ID and address name, tab-separated):

[plain]
B:1	Beijing
B:2	Guangzhou
B:3	Shenzhen
B:4	Xian
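Each line carries a two-character table tag followed by two tab-separated fields. As a minimal sketch of that layout in plain Java (the class name `TaggedRecord` and its `parse` method are hypothetical and not part of the original job), a line could be split like this:

```java
import java.util.Arrays;

public class TaggedRecord {
    /**
     * Splits a line such as "A:Tencent\t3" into {tag, field1, field2}.
     * Returns null when the line is too short or lacks two tab-separated fields.
     */
    public static String[] parse(String line) {
        if (line == null || line.length() < 2) {
            return null;
        }
        String tag = line.substring(0, 2);               // "A:" or "B:"
        String[] fields = line.substring(2).split("\t"); // the rest is tab-separated
        if (fields.length < 2) {
            return null;
        }
        return new String[] { tag, fields[0], fields[1] };
    }

    public static void main(String[] args) {
        // prints [A:, Beijing Red Star, 1]
        System.out.println(Arrays.toString(parse("A:Beijing Red Star\t1")));
        // prints [B:, 4, Xian]
        System.out.println(Arrays.toString(parse("B:4\tXian")));
    }
}
```

For table 1 the join key (the address ID) is the second field, while for table 2 it is the first, which is why the tag has to be inspected before a key can be chosen.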

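The MapReduce job is as follows:

[java]
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MTJoin {

    // Tags marking which table a value came from.
    private static final Text typeA = new Text("A:");
    private static final Text typeB = new Text("B:");
    private static Log log = LogFactory.getLog(MTJoin.class);

    public static class Map extends Mapper<Object, Text, Text, MapWritable> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String valueStr = value.toString();
            if (valueStr.length() < 2) {
                return; // skip blank or malformed lines
            }
            String type = valueStr.substring(0, 2);
            String content = valueStr.substring(2);
            log.info(content);
            String[] contentArray = content.split("\t");
            if (contentArray.length < 2) {
                return; // skip lines without both fields
            }
            MapWritable map = new MapWritable();
            if (type.equals("A:")) {
                // Table 1: company name <TAB> address ID; key on the address ID.
                map.put(typeA, new Text(contentArray[0]));
                context.write(new Text(contentArray[1]), map);
            } else if (type.equals("B:")) {
                // Table 2: address ID <TAB> address name; key on the address ID.
                map.put(typeB, new Text(contentArray[1]));
                context.write(new Text(contentArray[0]), map);
            }
        }
    }

    public static class Reduce extends Reducer<Text, MapWritable, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<MapWritable> values, Context context)
                throws IOException, InterruptedException {
            List<Text> companyList = new ArrayList<Text>();
            List<Text> adrList = new ArrayList<Text>();
            for (MapWritable map : values) {
                // Hadoop reuses the value object between iterations, so copy defensively.
                if (map.containsKey(typeA)) {
                    companyList.add(new Text((Text) map.get(typeA)));
                } else if (map.containsKey(typeB)) {
                    adrList.add(new Text((Text) map.get(typeB)));
                }
            }
            // Cartesian product of both sides for this address ID.
            for (Text company : companyList) {
                for (Text adr : adrList) {
                    context.write(company, adr);
                }
            }
        }
    }

    // The original post shows no driver. This is a minimal sketch, assuming the
    // Hadoop 2+ Job API and input/output paths passed as args[0] and args[1].
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multi-table join");
        job.setJarByClass(MTJoin.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        // The map output value type differs from the final output value type.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(MapWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}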

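The principle is simple. The mapper emits the address ID as the key, so records from both tables that share an address ID arrive at the same reduce call. The reducer puts the company names into one list and the address names into another, and the Cartesian product of the two lists is the result. An address ID with no matching company (4, Xian, in this data) has an empty company list and therefore produces no output, so this is an inner join.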

The output is as follows (company name and address name, tab-separated):

[plain]
Beijing Red Star	Beijing
Beijing Rising	Beijing
Back of Beijing	Beijing
Guangzhou Honda	Guangzhou
Guangzhou Development Bank	Guangzhou
Shenzhen Thunder	Shenzhen
Tencent	Shenzhen
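The group-by-key and Cartesian-product logic can be checked without a cluster. The following is a rough sketch in plain Java, not the Hadoop job itself: the class `JoinSketch` and its `join` method are hypothetical names, and two sorted in-memory maps stand in for the shuffle phase. It assumes the same tagged, tab-separated input shown in the two tables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class JoinSketch {
    /** Joins tagged lines on the address ID; returns "company\taddress" rows. */
    public static List<String> join(List<String> lines) {
        // Stand-in for the shuffle: group each side by address ID, sorted by key.
        Map<String, List<String>> companies = new TreeMap<>();
        Map<String, List<String>> addresses = new TreeMap<>();
        for (String line : lines) {
            if (line.length() < 2) continue;          // skip blank or malformed lines
            String[] f = line.substring(2).split("\t");
            if (f.length < 2) continue;               // skip lines without both fields
            if (line.startsWith("A:")) {
                // Table 1: company name, address ID
                companies.computeIfAbsent(f[1], k -> new ArrayList<>()).add(f[0]);
            } else if (line.startsWith("B:")) {
                // Table 2: address ID, address name
                addresses.computeIfAbsent(f[0], k -> new ArrayList<>()).add(f[1]);
            }
        }
        // Stand-in for the reducer: Cartesian product of both sides per key.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : companies.entrySet()) {
            List<String> names = addresses.get(e.getKey());
            if (names == null) continue;              // no match on the address side
            for (String company : e.getValue()) {
                for (String name : names) {
                    out.add(company + "\t" + name);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "A:Beijing Red Star\t1", "A:Shenzhen Thunder\t3", "A:Guangzhou Honda\t2",
            "A:Beijing Rising\t1", "A:Guangzhou Development Bank\t2", "A:Tencent\t3",
            "A:Back of Beijing\t1",
            "B:1\tBeijing", "B:2\tGuangzhou", "B:3\tShenzhen", "B:4\tXian");
        for (String row : join(lines)) {
            System.out.println(row);
        }
    }
}
```

With the sample data this yields seven rows and none for Xian. In a real job the order of values within one key is not guaranteed, so the rows for a given address may appear in a different order than in this sketch.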

2014-04-16 13:03