.. _emr_tut:

======================================================
An Introduction to boto's Elastic Mapreduce interface
======================================================

This tutorial focuses on the boto interface to Elastic Mapreduce from
Amazon Web Services. This tutorial assumes that you have already
downloaded and installed boto.

Creating a Connection
---------------------

The first step in accessing Elastic Mapreduce is to create a connection
to the service. There are two ways to do this in boto. The first is:

>>> from boto.emr.connection import EmrConnection
>>> conn = EmrConnection('<aws access key>', '<aws secret key>')

At this point the variable conn will point to an EmrConnection object.
In this example, the AWS access key and AWS secret key are passed in to
the method explicitly. Alternatively, you can set the environment
variables:

* ``AWS_ACCESS_KEY_ID`` - Your AWS Access Key ID
* ``AWS_SECRET_ACCESS_KEY`` - Your AWS Secret Access Key

and then call the constructor without any arguments, like this:

>>> conn = EmrConnection()
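For completeness, here is a minimal sketch of supplying those variables
from within Python before constructing the connection; the values shown
are placeholders, not real credentials:

>>> import os
>>> os.environ['AWS_ACCESS_KEY_ID'] = '<aws access key>'
>>> os.environ['AWS_SECRET_ACCESS_KEY'] = '<aws secret key>'
>>> conn = EmrConnection()  # picks the credentials up from the environment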
There is also a shortcut function in boto that makes it easy to create
EMR connections:

>>> import boto.emr
>>> conn = boto.emr.connect_to_region('us-west-2')

In either case, conn points to an EmrConnection object which we will
use throughout the remainder of this tutorial.

Creating Streaming JobFlow Steps
--------------------------------

Upon creating a connection to Elastic Mapreduce you will next want to
create one or more jobflow steps. There are two types of steps,
streaming and custom jar, both of which have a class in the boto
Elastic Mapreduce implementation.

Creating a streaming step that runs the AWS wordcount example, itself
written in Python, can be accomplished by:

>>> from boto.emr.step import StreamingStep
>>> step = StreamingStep(name='My wordcount example',
...     mapper='s3n://elasticmapreduce/samples/wordcount/wordSplitter.py',
...     reducer='aggregate',
...     input='s3n://elasticmapreduce/samples/wordcount/input',
...     output='s3n://<my output bucket>/output/wordcount_output')

where ``<my output bucket>`` is a bucket you have created in S3.

Note that this statement does not run the step; that is accomplished
later when we create a jobflow.

Additional arguments of note to the streaming jobflow step are
cache_files, cache_archives and step_args. The options cache_files and
cache_archives enable you to use Hadoop's distributed cache to share
files amongst the instances that run the step. The argument step_args
allows one to pass additional arguments to Hadoop streaming, for
example modifications to the Hadoop job configuration.
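As an illustration, here is a hedged sketch of a streaming step that
ships a helper file through the distributed cache and passes extra
arguments through to Hadoop streaming. The bucket names, helper file
and configuration value are hypothetical, and the ``uri#name`` fragment
form is the standard Hadoop distributed-cache convention:

>>> step = StreamingStep(name='Wordcount with extras',
...     mapper='s3n://<my code bucket>/wordSplitter.py',
...     reducer='aggregate',
...     input='s3n://elasticmapreduce/samples/wordcount/input',
...     output='s3n://<my output bucket>/output/wordcount_extras',
...     # uri#name exposes the cached file to tasks under the given name
...     cache_files=['s3n://<my code bucket>/helpers.py#helpers.py'],
...     # extra arguments handed to Hadoop streaming unchanged
...     step_args=['-D', 'mapred.reduce.tasks=4'])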
Creating Custom Jar Job Flow Steps
----------------------------------

The second type of jobflow step executes tasks written with a custom
jar. Creating a custom jar step for the AWS CloudBurst example can be
accomplished by:

>>> from boto.emr.step import JarStep
>>> step = JarStep(name='CloudBurst example',
...     jar='s3n://elasticmapreduce/samples/cloudburst/cloudburst.jar',
...     step_args=['s3n://elasticmapreduce/samples/cloudburst/input/s_suis.br',
...                's3n://elasticmapreduce/samples/cloudburst/input/100k.br',
...                's3n://<my output bucket>/output/cloudfront_output',
...                36, 3, 0, 1, 240, 48, 24, 24, 128, 16])

Note that this statement does not actually run the step; that is
accomplished later when we create a jobflow. Also note that this
JarStep does not include a main_class argument since the jar
MANIFEST.MF has a Main-Class entry.
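If your jar does not declare a Main-Class in its manifest, you can name
the entry point explicitly with the main_class argument. A minimal
sketch, assuming a hypothetical jar and class name:

>>> step = JarStep(name='My custom jar example',
...     jar='s3n://<my code bucket>/my-analysis.jar',
...     main_class='com.example.Main',  # needed when the manifest lacks Main-Class
...     step_args=['s3n://<my input bucket>/input',
...                's3n://<my output bucket>/output'])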
Creating JobFlows
-----------------

Once you have created one or more jobflow steps, you will next want to
create and run a jobflow. Creating a jobflow that executes either of
the steps we created above can be accomplished by:

>>> import boto.emr
>>> conn = boto.emr.connect_to_region('us-west-2')
>>> jobid = conn.run_jobflow(name='My jobflow',
...     log_uri='s3://<my log uri bucket>/jobflow_logs',
...     steps=[step])

The method will not block for the completion of the jobflow, but will
immediately return. The status of the jobflow can be determined by:

>>> status = conn.describe_jobflow(jobid)
>>> status.state
u'STARTING'

One can then use this state to block for a jobflow to complete. Valid
jobflow states currently defined in the AWS API are COMPLETED, FAILED,
TERMINATED, RUNNING, SHUTTING_DOWN, STARTING and WAITING.
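For example, a minimal polling sketch that blocks until the jobflow
reaches a terminal state. The 30-second interval is an arbitrary
choice, and on success the final state would look like the output
shown; note that a jobflow started with keep_alive=True idles in
WAITING rather than terminating:

>>> import time
>>> terminal_states = ('COMPLETED', 'FAILED', 'TERMINATED')
>>> status = conn.describe_jobflow(jobid)
>>> while status.state not in terminal_states:
...     time.sleep(30)  # avoid hammering the EMR API
...     status = conn.describe_jobflow(jobid)
>>> status.state
u'COMPLETED'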
In some cases you may not have built all of the steps prior to running
the jobflow. In these cases additional steps can be added to a jobflow
by running:

>>> conn.add_jobflow_steps(jobid, [second_step])

If you wish to add additional steps to a running jobflow you may want
to set the keep_alive parameter to True in run_jobflow so that the
jobflow does not automatically terminate when the first step completes.

The run_jobflow method has a number of important parameters that are
worth investigating. They include parameters to change the number and
type of EC2 instances on which the jobflow is executed, set an SSH key
for manual debugging and enable AWS console debugging. A sketch
illustrating several of these parameters appears at the end of this
tutorial.

Terminating JobFlows
--------------------

By default when all the steps of a jobflow have finished or failed the
jobflow terminates. However, if you set the keep_alive parameter to
True or just want to halt the execution of a jobflow early you can
terminate a jobflow by:

>>> import boto.emr
>>> conn = boto.emr.connect_to_region('us-west-2')
>>> conn.terminate_jobflow('<jobflow id>')
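Returning to the run_jobflow parameters mentioned above, here is a
hedged sketch of a jobflow that requests a specific cluster size and
instance types, attaches an EC2 key pair for SSH access to the master
node, and enables AWS console debugging. The instance types and counts
are illustrative only:

>>> jobid = conn.run_jobflow(name='My configured jobflow',
...     log_uri='s3://<my log uri bucket>/jobflow_logs',
...     ec2_keyname='<my ec2 key pair>',   # allows SSH into the master node
...     master_instance_type='m1.small',   # instance type for the master
...     slave_instance_type='m1.small',    # instance type for the workers
...     num_instances=3,                   # total instances in the cluster
...     keep_alive=True,                   # stay alive for additional steps
...     enable_debugging=True,             # console debugging; needs log_uri
...     steps=[step])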