
HOW TO USE HADOOP

And survive the experience

OUTLINE

• Hadoop overview:

• Processes involved

• How it works

• Installing Hadoop

• Local installation

• Overview of a cluster installation

OUTLINE

• HDFS:

• Accessing the Hadoop file system

• Uploading and downloading data

• Running jobs

• Launching, running, and verifying jobs (local)

• Launching, running, and verifying jobs (cluster)

INFORMATION REPOSITORY

• https://www.tsc.uc3m.es/~hmolina/mlgp

• Restricted access with the Department password

HADOOP: Installation, Configuration, and Use

HADOOP DATA FLOW

Java MapReduce

Having run through how the MapReduce program works, the next step is to express it in code. We need three things: a map function, a reduce function, and some code to run the job. The map function is represented by the Mapper class, which declares an abstract map() method. Example 2-3 shows the implementation of our map method.

Example 2-3. Mapper for maximum temperature example

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int MISSING = 9999;

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    String year = line.substring(15, 19);
    int airTemperature;
    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}

The Mapper class is a generic type, with four formal type parameters that specify the input key, input value, output key, and output value types of the map function. For the present example, the input key is a long integer offset and the input value is a line of text.

Figure 2-1. MapReduce logical data flow


TYPICAL ARCHITECTURE

Figure 2-3. MapReduce data flow with a single reduce task

The number of reduce tasks is not governed by the size of the input, but is specified independently. (In "The Default MapReduce Job" you will see how to choose the number of reduce tasks for a given job.)

When there are multiple reducers, the map tasks partition their output, each creating one partition for each reduce task. There can be many keys (and their associated values) in each partition, but the records for any given key are all in a single partition. The partitioning can be controlled by a user-defined partitioning function, but normally the default partitioner, which buckets keys using a hash function, works very well.

The data flow for the general case of multiple reduce tasks is illustrated in Figure 2-4. This diagram makes it clear why the data flow between map and reduce tasks is colloquially known as "the shuffle," as each reduce task is fed by many map tasks. The shuffle is more complicated than this diagram suggests, and tuning it can have a big impact on job execution time, as you will see in "Shuffle and Sort."
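The hash-based default partitioner described above can be pictured in a few lines. This is a hedged Python sketch of the idea only, not Hadoop's actual HashPartitioner (which, in Java, buckets on the key's hashCode()); the function name partition and the sample keys are illustrative assumptions:

# Sketch of how a hash partitioner assigns keys to reducers.
# All records with the same key land in the same partition.
def partition(key: str, num_reducers: int) -> int:
    """Return the reducer index (0..num_reducers-1) for a key."""
    # Mask to a non-negative value, then bucket by modulo.
    return (hash(key) & 0x7FFFFFFF) % num_reducers

# Every occurrence of "1950" goes to the same reducer:
keys = ["1950", "1949", "1950", "1951"]
print([partition(k, 3) for k in keys])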


ARCHITECTURE WITH MULTIPLE REDUCERS

Figure 2-4. MapReduce data flow with multiple reduce tasks

Finally, it's also possible to have zero reduce tasks. This can be appropriate when you don't need the shuffle since the processing can be carried out entirely in parallel (a few examples are discussed in "NLineInputFormat"). In this case, the only off-node data transfer is when the map tasks write to HDFS (see Figure 2-5).

Combiner Functions

Many MapReduce jobs are limited by the bandwidth available on the cluster, so it pays to minimize the data transferred between map and reduce tasks. Hadoop allows the user to specify a combiner function to be run on the map output; the combiner function's output forms the input to the reduce function. Since the combiner function is an optimization, Hadoop does not provide a guarantee of how many times it will call it for a particular map output record, if at all. In other words, calling the combiner function zero, one, or many times should produce the same output from the reducer.


ARCHITECTURE WITH NO REDUCERS

Figure 2-5. MapReduce data flow with no reduce tasks

The contract for the combiner function constrains the type of function that may be used. This is best illustrated with an example. Suppose that for the maximum temperature example, readings for the year 1950 were processed by two maps (because they were in different splits). Imagine the first map produced the output:

(1950, 0)
(1950, 20)
(1950, 10)

And the second produced:

(1950, 25)
(1950, 15)

The reduce function would be called with a list of all the values:

(1950, [0, 20, 10, 25, 15])

with output:

(1950, 25)

since 25 is the maximum value in the list. We could use a combiner function that, just like the reduce function, finds the maximum temperature for each map output. The reduce would then be called with:

(1950, [20, 25])

and the reduce would produce the same output as before. More succinctly, we may express the function calls on the temperature values in this case as follows:

max(0, 20, 10, 25, 15) = max(max(0, 20, 10), max(25, 15)) = max(20, 25) = 25
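The equality holds because max is associative and commutative, which is exactly what the combiner contract requires. A short Python check of this, including the classic counterexample (the mean, which must not be used as a combiner); the variable names are illustrative:

# max is associative and commutative, so running it per-map (as a
# combiner) and again in the reducer gives the same result.
first_map = [0, 20, 10]
second_map = [25, 15]

direct = max(first_map + second_map)             # no combiner
combined = max(max(first_map), max(second_map))  # combiner per map
assert direct == combined == 25

# The mean does NOT satisfy the combiner contract:
def mean(values):
    return sum(values) / len(values)

print(mean(first_map + second_map))               # 14.0
print(mean([mean(first_map), mean(second_map)]))  # 15.0 -- wrong answer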


HADOOP

• Several execution modes:

• Standalone mode: nothing needs to be configured.

• Server mode on the local node: a client-server setup, but everything still runs on the local machine (often called pseudo-distributed mode).

• Distributed mode: a full infrastructure with several storage nodes, compute nodes, etc.

STANDALONE MODE

• Unpack the Hadoop distribution

• Set the JAVA_HOME variable

• Et voilà!

TEST

• Unpack the Reuters database

• Run the command:

hadoop jar hadoop-examples-1.1.2.jar wordcount dir_reuters dir_output

• The dir_output directory must not already exist

• Note how long it takes (a local sketch of what wordcount computes follows)
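To make the expected output concrete, here is a hedged local Python equivalent of what the wordcount example computes (one count per whitespace-separated token); it assumes the Reuters files were unpacked into dir_reuters and is an illustration, not the Hadoop example's actual code:

# Local sketch of what the Hadoop wordcount example computes.
import os
from collections import Counter

counts = Counter()
for name in os.listdir("dir_reuters"):        # the unpacked Reuters files
    path = os.path.join("dir_reuters", name)
    if not os.path.isfile(path):
        continue
    with open(path, errors="ignore") as f:
        for line in f:
            counts.update(line.split())

for word, n in sorted(counts.items()):
    print(f"{word}\t{n}")                     # same <word><TAB><count> layout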

STRUCTURE OF HADOOP

CONFIGURING LOCAL SERVER MODE

• Create a directory named conf_single

• Copy the contents of conf into conf_single

CONFIGURING THE MASTER SERVER

CORE-SITE.XML

• Defines the server that will hold the file system

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop/tmp</value>
  </property>
</configuration>

HDFS-SITE.XML

• Defines the behavior of the distributed file system

In standalone installations, the data is configured not to be replicated. In cluster configurations, the data MUST be replicated.

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/tmp/hadoop/name</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/tmp/hadoop/data</value>
  </property>
</configuration>

CONFIGURING THE JOBTRACKER: MAPRED-SITE.XML

• Configures the task coordinator

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/hadoop/mapred/system</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/tmp/hadoop/tmp_mapred</value>
  </property>
</configuration>

OTHER FILES

• The hadoop-env.sh file must also be edited:

# The java implementation to use. Required.
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.7.0_21.jdk/Contents/Home"

export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -server"

INITIALIZING THE DFS

• Run:

hadoop --config conf_single namenode -format
hadoop-daemon.sh --config conf_single start namenode
hadoop --config conf_single dfs -mkdir /user
hadoop --config conf_single dfs -chmod 755 /user
hadoop --config conf_single dfs -mkdir /tmp
hadoop --config conf_single dfs -chmod 777 /tmp
hadoop --config conf_single dfs -mkdir /mapred
hadoop --config conf_single dfs -chmod 755 /mapred

STARTING THE SYSTEM

• bin/start-all.sh --config conf_single

• Access the status of the services:

• NameNode (DFS): http://localhost:50070

• JobTracker: http://localhost:50030

ACCESSING THE FILE SYSTEM

• The bin/hadoop command invokes the basic Hadoop API.

• The dfs application of the basic API gives access to the file system.

• bin/hadoop --config conf_single dfs

DFS

• Modeled on the UNIX commands

• hadoop --config conf_single dfs -ls

• hadoop --config conf_single dfs -mkdir

• hadoop --config conf_single dfs -chown

• hadoop --config conf_single dfs -chmod

DFS

• To upload files from the local machine to the DFS:

• hadoop --config conf_single dfs -put src dst

• hadoop --config conf_single dfs -copyFromLocal src dst

• To download files:

• hadoop --config conf_single dfs -get src dst

• hadoop --config conf_single dfs -copyToLocal

INVOKING AN APPLICATION

• If it is packaged in a jar file:

• hadoop jar JAR_FILE.jar MainClass [parameters]

FIRST TEST: WORD COUNT

• We will use the example programs shipped with Hadoop:

• hadoop --config conf_cluster jar hadoop-examples-1.1.2.jar

• In particular, wordcount:

• hadoop --config conf_cluster jar hadoop-examples-1.1.2.jar wordcount

INTEGRATING HADOOP AND PYTHON

MAPPER -> REDUCER

• The data arrives as tuples:

• <key><value>

• and must be emitted as tuples:

• <key><value> (the sketch below shows this convention)
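In Streaming (used in the sections that follow), these tuples are plain text lines, one <key>TAB<value> pair per line on STDIN/STDOUT. As a hedged sketch, following the word-count tutorial cited in the bibliography and using the hypothetical file name map.py that appears in the Streaming command later:

#!/usr/bin/env python
# map.py -- hypothetical streaming mapper (word-count style).
# Hadoop pipes each input line to STDIN; we emit one
# <key>TAB<value> pair per word on STDOUT.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")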


COMPUTING THE ANNUAL MAXIMUM TEMPERATURE

• Database of sensor readings in the U.S.

• Unordered plain-text data

• Simple structure

• Fields of interest:

• Year: columns 15 to 18

• Temperature: columns 104 to 106. A value of 999 means the reading is not valid

• Column 137 must be 0 or 1

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int MISSING = 999;

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    String year = line.substring(14, 18);       // columns 15-18 (substring is 0-based)
    int airTemperature;
    String quality = line.substring(136, 137);  // column 137 must be 0 or 1
    if (quality.matches("[01]")) {
      airTemperature = Integer.parseInt(line.substring(103, 106).trim()); // columns 104-106
      if (airTemperature != MISSING) {
        context.write(new Text(year), new IntWritable(airTemperature));
      }
    }
  }
}

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int maxValue = Integer.MIN_VALUE;
    // Keep the largest temperature seen for this key (year).
    for (IntWritable value : values) {
      maxValue = Math.max(maxValue, value.get());
    }
    context.write(key, new IntWritable(maxValue));
  }
}

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperature <input path> <output path>");
      System.exit(-1);
    }
    Job job = new Job();
    job.setJarByClass(MaxTemperature.class);
    job.setJobName("Max Temperature");
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(MaxTemperatureMapper.class);
    job.setReducerClass(MaxTemperatureReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

MAX TEMPERATURE EXAMPLE

• Basic run of MaxTemperature:

hadoop jar MaxTemp.jar MaxTemperature \
  <dfs_src> \
  <dfs_dst>

TECHNIQUES

• Streaming:

• The MAP and REDUCE applications are written in any language that can read from STDIN and write to STDOUT

• A Hadoop application is invoked that distributes a process to the compute nodes, which in turn runs the MAP/REDUCE application it has been given (a reducer sketch follows the command below)

• hadoop jar ../../hadoop-1.1.2/contrib/streaming/hadoop-streaming-1.1.2.jar

hadoop jar \
  contrib/streaming/hadoop-streaming-1.1.2.jar \
  -input {DFSDIR}/input -output {DFSDIR}/output_py \
  -mapper {GLOBAL_DIR}/map.py \
  -reducer {GLOBAL_DIR}/reduce.py
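A matching reducer, again a hedged sketch using the hypothetical reduce.py name from the command above and assuming the word-count mapper sketched earlier. Streaming delivers the map output to each reducer sorted by key, so it is enough to sum values until the key changes:

#!/usr/bin/env python
# reduce.py -- hypothetical streaming reducer (word-count style).
# Input arrives on STDIN sorted by key, as <key>TAB<value> lines;
# we accumulate a count per key and emit it when the key changes.
import sys

current_key, current_count = None, 0

for line in sys.stdin:
    key, _, value = line.rstrip("\n").partition("\t")
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{current_count}")
        current_key, current_count = key, 0
    current_count += int(value)

if current_key is not None:
    print(f"{current_key}\t{current_count}")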

STREAMING (CONT.)

• The MAP/REDUCE application MUST BE ACCESSIBLE on the ordinary file system of each node (not on the Hadoop DFS)

• Solution: copy the files to clusterdata

COMBINER ARCHITECTURE


import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperature <input path> <output path>");
      System.exit(-1);
    }
    Job job = new Job();
    job.setJarByClass(MaxTemperature.class);
    job.setJobName("Max Temperature");
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(MaxTemperatureMapper.class);
    job.setCombinerClass(MaxTemperatureReducer.class);
    job.setReducerClass(MaxTemperatureReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

WITH A COMBINER

• hadoop jar ../../hadoop-1.1.2/contrib/streaming/hadoop-streaming-1.1.2.jar

hadoop jar hadoop-streaming-1.1.2.jar \
  -input {DFSDIR}/input -output {DFSDIR}/output_py \
  -mapper {GLOBAL_DIR}/map.py \
  -combiner {GLOBAL_DIR}/reduce.py \
  -reducer {GLOBAL_DIR}/reduce.py

• Reusing reduce.py as the combiner is only valid because the operation satisfies the combiner contract described earlier: applying it zero, one, or many times before the reduce must give the same final output.

• JobTracker: http://nostromo.tsc.uc3m.es:50030/jobtracker.jsp

• NameNode (DFS): http://subserver1.tsc.uc3m.es:50070/dfshealth.jsp

BIBLIOGRAPHY

• Hadoop: The Definitive Guide, 3rd Edition. Tom White. O'Reilly.

• Writing an Hadoop MapReduce Program in Python

• Hadoop 1.1.2 Documentation