<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blog4Java &#187; Java</title>
	<atom:link href="http://malsolo.com/blog4java/?feed=rss2&#038;tag=java" rel="self" type="application/rss+xml" />
	<link>http://malsolo.com/blog4java</link>
	<description>A personal and Java blog, likely only for me</description>
	<lastBuildDate>Tue, 31 Mar 2015 15:52:42 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.1.1</generator>
	<item>
		<title>Webinar Confirmation: Java, Ubuntu and browsers hell</title>
		<link>http://malsolo.com/blog4java/?p=812</link>
		<comments>http://malsolo.com/blog4java/?p=812#comments</comments>
		<pubDate>Tue, 24 Mar 2015 09:19:29 +0000</pubDate>
		<dc:creator><![CDATA[Javier (@jbbarquero)]]></dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Chrome]]></category>
		<category><![CDATA[Java Plug-in]]></category>
		<category><![CDATA[Ubuntu]]></category>

		<guid isPermaLink="false">http://malsolo.com/blog4java/?p=812</guid>
		<description><![CDATA[Hello Javier, Your registration is confirmed for the webinar&#8230; We are looking forward to having you join us. To help maximize your webinar experience we recommend that you join a test meeting before the session to check your system and &#8230; <a href="http://malsolo.com/blog4java/?p=812">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
		<content:encoded><![CDATA[<p><em>Hello Javier,</em></p>
<p><em>Your registration is confirmed for the webinar&#8230; We are looking forward to having you join us.</em></p>
<p><img src="http://malsolo.com/blog4java/wp-includes/images/smilies/icon_biggrin.gif" alt=":D" class="wp-smiley" /></p>
<p><em>To help maximize your webinar experience we recommend that you join a test meeting before the session to check your system and browser compatibility at <u>http://www.webex.com/test-meeting.html</u>.</em></p>
<p>Let&#8217;s try&#8230;</p>
<div id="attachment_816" style="width: 969px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-094340.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-094340.png" alt="Java is not working." width="959" height="511" class="size-full wp-image-816" /></a><p class="wp-caption-text">Java is not working.</p></div>
<p>Dammit!</p>
<p>Now I have to waste half an hour discovering the problem and solving it.</p>
<p>Let&#8217;s see, <a href="https://java.com/en/download/faq/chrome.xml" title="How do I use Java with the Google Chrome browser?" target="_blank">How do I use Java with the Google Chrome browser?</a></p>
<p><em><strong>Chrome and Linux</strong><br />
Starting with Chrome version 35, NPAPI (Netscape Plug-in API) support was removed for the Linux platform. For more information, see Chrome and NPAPI (<a href="http://blog.chromium.org/2013/09/saying-goodbye-to-our-old-friend-npapi.html" title="Saying Goodbye to Our Old Friend NPAPI" target="_blank">blog.chromium.org</a>).</em></p>
<p><em><u>Firefox is the recommended browser for Java on Linux.</u></em></p>
<p>No problem, let&#8217;s use Firefox. But&#8230;</p>
<div id="attachment_814" style="width: 761px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-094112.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-094112.png" alt="Expired or not-yet-valid certificate" width="751" height="364" class="size-full wp-image-814" /></a><p class="wp-caption-text">Java Security</p></div>
<p>Sometimes I miss Windows and its closed environment.</p>
<p>Don&#8217;t give up. On the error page, the details show the Java Console:</p>
<div id="attachment_818" style="width: 529px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-094812.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-094812.png" alt="Java Console screenshot" width="519" height="434" class="size-full wp-image-818" /></a><p class="wp-caption-text">Java Plug-in 11.40.2.25</p></div>
<p>And the <a href="http://java.com/en/download/help/jcp_security.xml" title="How do I control when an untrusted applet or application runs in my web browser? " target="_blank">more information</a> link leads to the instructions to handle the Java Security via the Control Panel. Including a link to the configuration of the <a href="http://java.com/en/download/faq/exception_sitelist.xml" title="How can I configure the Exception Site List? " target="_blank">Exception Site List</a>.</p>
<p>But&#8230;</p>
<p><em><strong>Find the Java Control Panel</strong><br />
» <a href="http://java.com/en/download/help/win_controlpanel.xml" title="Where is the Java Control Panel on Windows? " target="_blank">Windows</a><br />
» <a href="http://java.com/en/download/help/mac_controlpanel.xml" title="Where is the Java Control Panel on my Mac? " target="_blank">Mac OS X</a></em> </p>
<p><strong>Where is the Java Control Panel on Linux?!!!</strong> <img src="http://malsolo.com/blog4java/wp-includes/images/smilies/icon_neutral.gif" alt=":|" class="wp-smiley" /></p>
<p><u>Keep calm and open the Terminal</u>:</p>
<p></p><pre class="crayon-plain-tag">$ whereis java
java: /usr/bin/java
$ ls -la /usr/bin/java
lrwxrwxrwx 1 root root 22 may  8  2014 /usr/bin/java -> /etc/alternatives/java
$ cd /usr/lib/jvm/java-8-oracle/jre/bin
$ ls
ControlPanel  javaws.real  keytool  policytool   servertool
java          jcontrol     orbd     rmid         tnameserv
javaws        jjs          pack200  rmiregistry  unpack200
$ ./ControlPanel</pre><p></p>
<p>Now, you can follow the instructions:</p>
<div id="attachment_821" style="width: 637px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-095904.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-095904.png" alt="Java Control Panel" width="627" height="641" class="size-full wp-image-821" /></a><p class="wp-caption-text">Java Control Panel</p></div>
<p>Go to the Security tab:</p>
<div id="attachment_822" style="width: 637px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-095934.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-095934.png" alt="Java Control Panel: Security tab" width="627" height="641" class="size-full wp-image-822" /></a><p class="wp-caption-text">Java Control Panel: Security tab</p></div>
<p>Press Edit <u>S</u>ite List&#8230;</p>
<div id="attachment_823" style="width: 560px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-100040.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-100040.png" alt="Exception Site List" width="550" height="379" class="size-full wp-image-823" /></a><p class="wp-caption-text">Exception Site List</p></div>
<p>Add the location:</p>
<div id="attachment_824" style="width: 560px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-100150.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-100150.png" alt="Exception Site List: add URL" width="550" height="379" class="size-full wp-image-824" /></a><p class="wp-caption-text">Exception Site List: add URL</p></div>
<p>Ok. Ok.</p>
<p>Now re-try:</p>
<div id="attachment_825" style="width: 604px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-100356.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-100356.png" alt="I&#039;m really tired" width="594" height="463" class="size-full wp-image-825" /></a><p class="wp-caption-text">I&#8217;m really tired</p></div>
<p>I&#8217;ve spent too much time with you, so <em>I accept the risk</em> <img src="http://malsolo.com/blog4java/wp-includes/images/smilies/icon_sad.gif" alt=":(" class="wp-smiley" /></p>
<div id="attachment_826" style="width: 630px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-100624.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2015/03/Screenshot-from-2015-03-24-100624-1024x576.png" alt="Hooray!" width="620" height="349" class="size-large wp-image-826" /></a><p class="wp-caption-text">Hooray!</p></div>
<p>Piece of cake.</p>
<p>It&#8217;s been fun and annoying at the same time.</p>
]]></content:encoded>
			<wfw:commentRss>http://malsolo.com/blog4java/?feed=rss2&#038;p=812</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting started with Spark</title>
		<link>http://malsolo.com/blog4java/?p=679</link>
		<comments>http://malsolo.com/blog4java/?p=679#comments</comments>
		<pubDate>Mon, 02 Mar 2015 15:27:16 +0000</pubDate>
		<dc:creator><![CDATA[Javier (@jbbarquero)]]></dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Spark]]></category>

		<guid isPermaLink="false">http://malsolo.com/blog4java/?p=679</guid>
		<description><![CDATA[Spark Introduction Apache Spark is a cluster computing platform designed to be fast, expressive, high-level, general-purpose, fault-tolerant and compatible with Hadoop (Spark can work directly with HDFS, S3 and so on). Spark can also be defined as a framework &#8230; <a href="http://malsolo.com/blog4java/?p=679">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<h1>Spark Introduction</h1>
<p>Apache Spark is a cluster <strong>computing platform</strong> designed to be fast, expressive, high-level, general-purpose, fault-tolerant and compatible with Hadoop (Spark can work directly with HDFS, S3 and so on). Spark can also be defined as a framework for distributed processing and analysis of large amounts of data. Databricks (the company behind Spark) calls it a distributed execution engine for large-scale analytics.</p>
<p>Spark improves efficiency over Hadoop because it uses in-memory computing primitives. According to the <a title="Apache Spark™" href="https://spark.apache.org/" target="_blank">Apache Spark site</a>, it can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.</p>
<p>It also claims to improve usability through rich Scala, Python and Java APIs as well as an interactive shell, in Scala and Python. Spark is written in Scala.</p>
<h2>Spark Architecture</h2>
<p>Spark has three main components:</p>
<div id="attachment_688" style="width: 630px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2015/02/apache_spark_stack.png"><img class="size-large wp-image-688" src="http://malsolo.com/blog4java/wp-content/uploads/2015/02/apache_spark_stack-1024x534.png" alt="The Apache Spark stack" width="620" height="323" /></a><p class="wp-caption-text">Apache Spark stack</p></div>
<h3>Spark Core (API)</h3>
<p>A high-level programming framework that allows programmers to focus on the logic and not on the plumbing of distributed programming; that is, on the steps to be done without worrying about coordinating tasks, networking and so on.</p>
<p>These steps are defined by RDDs (Resilient Distributed Datasets), the main programming abstraction: an RDD represents a collection of items distributed across many compute nodes that can be manipulated in parallel.</p>
<h3>Spark clustering</h3>
<p>Spark itself doesn&#8217;t manage the cluster, but it supports three cluster managers:</p>
<ul>
<li>Standalone: a simple cluster manager included in Spark itself called the Standalone Scheduler.</li>
<li><a title="Apache Hadoop YARN" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html" target="_blank">Hadoop YARN</a>: see my <a title=" Getting started with Hadoop" href="http://malsolo.com/blog4java/?p=516" target="_blank">introduction to Apache Hadoop</a>.</li>
<li><a title="Apache Mesos" href="http://mesos.apache.org/" target="_blank">Apache Mesos</a>.</li>
</ul>
<h3>Spark stack</h3>
<p>Finally, Spark provides high level specialized components that are closely integrated in order to provide one great platform.</p>
<p>The current components are:</p>
<ul>
<li>Spark SQL: for querying data via SQL.</li>
<li>Spark Streaming: for real-time processing of live streams of data.</li>
<li>GraphX: a library for manipulating graphs and performing graph-parallel computations.</li>
<li>MLlib: a machine learning library providing common algorithms (classification, regression, &#8230;)</li>
</ul>
<h2>Spark Usage</h2>
<p>There are two ways to work with Spark:</p>
<ul>
<li>The Spark interactive shells</li>
<li>Spark standalone applications</li>
</ul>
<h3>Spark Shell</h3>
<p>It&#8217;s an interactive command-line shell with two implementations, one in Python and one in Scala: a <a title="REPL: Read–eval–print loop" href="http://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop" target="_blank">REPL</a> that is very useful for learning the API or for data exploration.</p>
<p>Spark’s shells allow you to interact with data not only on your single machine, but on disk or in memory across many machines, thanks to the distributed nature of Spark.</p>
<h3>Spark Applications</h3>
<p>The other way to work with Spark is by creating standalone applications either in Python, Scala or Java. Use them for large scale data processing.</p>
<h2>Spark main concepts</h2>
<h3>Driver program</h3>
<p>It&#8217;s the program that launches the distributed operations on a cluster.</p>
<p>The Spark shell is a driver program.</p>
<p>The application that you write, with its <em>main</em> function that defines the datasets and applies operations on them, is a driver program.</p>
<h3>Spark Context (sc)</h3>
<p>It&#8217;s the main entry point to the Spark API.</p>
<p>When using the shell, a preconfigured SparkContext is automatically created and it&#8217;s available in the variable called <strong><em>sc</em></strong>.</p>
<p>When writing applications, the first thing that you need to create is your own instance of the SparkContext.</p>
<h3>Resilient Distributed Dataset (RDD)</h3>
<p>The goal of Spark is to allow you to operate on datasets on a single machine and have these operations work in the same way on a distributed cluster.</p>
<p>To achieve this, Spark offers the <strong>Resilient Distributed Dataset</strong> (RDD): <span style="text-decoration: underline;">immutable collections</span> (<em>dataset</em>) of objects that Spark distributes (<em>distributed</em>) across the cluster. They are loaded from a data source and, since they are immutable, new RDDs are also created as the result of transformations on existing RDDs (map, filter, etc.). Finally, Spark automatically rebuilds them on a node if there is a failure on another node (<em>resilient</em>).</p>
<p>There are two types of operations on RDDs:</p>
<ul>
<li><strong>Transformations</strong>: lazy operations to build RDDs based on the current RDD.</li>
<li><strong>Actions</strong>: return a result or write the RDD to storage. This implies a computation that actually applies the pending transformations that were lazily defined.</li>
</ul>
<p>In the Spark jargon, this is called a Directed Acyclic Graph (DAG) of operations. RDDs track the series of transformations used to build them by maintaining a pointer to their parents.</p>
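<p>This lazy-transformation / eager-action split can be mimicked on a single machine with Java 8 streams, where intermediate operations are lazy and terminal operations trigger the whole pipeline. A plain-JDK sketch (an analogy only, not the Spark API):</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class LazyVsEager {
    public static void main(String[] args) {
        List<Integer> visited = new ArrayList<>();
        // Intermediate operations (like Spark transformations) only describe work.
        Stream<Integer> squares = Stream.of(1, 2, 3)
                .peek(visited::add)   // records which elements actually flow through
                .map(n -> n * n);
        // Nothing has been computed yet.
        System.out.println("before action: " + visited); // before action: []
        // A terminal operation (like a Spark action) runs the whole pipeline.
        int sum = squares.mapToInt(Integer::intValue).sum();
        System.out.println("after action: " + visited + ", sum = " + sum); // after action: [1, 2, 3], sum = 14
    }
}
```

<p>Just as with RDDs, nothing flows through the pipeline until the terminal operation runs.</p>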
<h1>Spark Installation</h1>
<p>Go to <a title="Download Spark" href="https://spark.apache.org/downloads.html" target="_blank">https://spark.apache.org/downloads.html</a> and then:</p>
<ol>
<li>Choose a Spark release (1.2.1 is the latest at the time of this writing)</li>
<li>Choose a package type: select the package type of <em>“Pre-built for Hadoop 2.4 and later”</em></li>
<li>Choose a download type: <em>Direct Download</em> is fine, and the default <em>Apache Mirror</em> also works well.</li>
<li>Click on the link after <em>Download Spark</em>, for instance <strong><em>spark-1.2.1.tgz</em></strong>, to download Spark.</li>
</ol>
<p>Unpack the downloaded file and move into that directory in order to use the interactive shell:</p><pre class="crayon-plain-tag">$ tar -xf spark-1.2.1-bin-hadoop2.4.tgz
$ cd spark-1.2.1-bin-hadoop2.4</pre><p></p>
<h2>Using the Shell</h2>
<p>The Python version of the Spark shell is available via the command <strong>bin/pyspark</strong> and the Scala version of the shell by using <strong>bin/spark-shell</strong>.</p>
<p>Note: the shells accept code completion with the Tab key.</p>
<p>Let&#8217;s try the <strong>Scala</strong> shell:</p><pre class="crayon-plain-tag">$ bin/spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/02/26 17:23:45 INFO SecurityManager: Changing view acls to: Javier
15/02/26 17:23:45 INFO SecurityManager: Changing modify acls to: Javier
15/02/26 17:23:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Javier); users with modify permissions: Set(Javier)
15/02/26 17:23:45 INFO HttpServer: Starting HTTP Server
15/02/26 17:23:45 INFO Utils: Successfully started service 'HTTP class server' on port 46130.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.1
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_31)
Type in expressions to have them evaluated.
Type :help for more information.
15/02/26 17:23:50 WARN Utils: Your hostname, xxx resolves to a loopback address: 127.0.1.1; using 192.168.2.49 instead (on interface eth0)
15/02/26 17:23:50 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/02/26 17:23:50 INFO SecurityManager: Changing view acls to: Javier
15/02/26 17:23:50 INFO SecurityManager: Changing modify acls to: Javier
15/02/26 17:23:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Javier); users with modify permissions: Set(Javier)
15/02/26 17:23:51 INFO Slf4jLogger: Slf4jLogger started
15/02/26 17:23:51 INFO Remoting: Starting remoting
15/02/26 17:23:51 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@xxx.malsolo.lan:55248]
15/02/26 17:23:51 INFO Utils: Successfully started service 'sparkDriver' on port 55248.
15/02/26 17:23:51 INFO SparkEnv: Registering MapOutputTracker
15/02/26 17:23:51 INFO SparkEnv: Registering BlockManagerMaster
15/02/26 17:23:52 INFO DiskBlockManager: Created local directory at /tmp/spark-1420fe71-6907-408a-b44c-9547ba1a2c49/spark-909fad01-a3df-484b-bd30-1ea6006396e9
15/02/26 17:23:52 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
15/02/26 17:23:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/26 17:23:52 INFO HttpFileServer: HTTP File server directory is /tmp/spark-82586699-a230-47e4-8148-2cc4dcc741ec/spark-72f09be4-797a-4612-a845-e4fd1e578e76
15/02/26 17:23:52 INFO HttpServer: Starting HTTP Server
15/02/26 17:23:52 INFO Utils: Successfully started service 'HTTP file server' on port 41493.
15/02/26 17:23:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/02/26 17:23:53 INFO SparkUI: Started SparkUI at http://xxx.malsolo.lan:4040
15/02/26 17:23:53 INFO Executor: Starting executor ID  on host localhost
15/02/26 17:23:53 INFO Executor: Using REPL class URI: http://192.168.2.49:46130
15/02/26 17:23:53 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@xxx.malsolo.lan:55248/user/HeartbeatReceiver
15/02/26 17:23:53 INFO NettyBlockTransferService: Server created on 40938
15/02/26 17:23:53 INFO BlockManagerMaster: Trying to register BlockManager
15/02/26 17:23:53 INFO BlockManagerMasterActor: Registering block manager localhost:40938 with 265.1 MB RAM, BlockManagerId(, localhost, 40938)
15/02/26 17:23:53 INFO BlockManagerMaster: Registered BlockManager
15/02/26 17:23:53 INFO SparkILoop: Created spark context..
Spark context available as sc.

scala&gt;</pre><p>To exit either shell, press Ctrl-D.</p><pre class="crayon-plain-tag">scala&gt; Stopping spark context.
15/02/26 17:27:40 INFO SparkUI: Stopped Spark web UI at http://xxx.malsolo.lan:4040
15/02/26 17:27:40 INFO DAGScheduler: Stopping DAGScheduler
15/02/26 17:27:41 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/02/26 17:27:41 INFO MemoryStore: MemoryStore cleared
15/02/26 17:27:41 INFO BlockManager: BlockManager stopped
15/02/26 17:27:41 INFO BlockManagerMaster: BlockManagerMaster stopped
15/02/26 17:27:41 INFO SparkContext: Successfully stopped SparkContext
15/02/26 17:27:41 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/02/26 17:27:41 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/02/26 17:27:41 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
$</pre><p>It&#8217;s possible to control the verbosity of the logging by creating a <em>conf/log4j.properties</em> file (copy the existing <em>conf/log4j.properties.template</em>; currently, Spark uses log4j 1.2.17, so you can find more details at the <a title="Apache log4j™ 1.2" href="http://logging.apache.org/log4j/1.2/" target="_blank">Apache log4j™ 1.2</a> website) and then changing the line:</p>
<p><strong>log4j.rootCategory=INFO, console</strong></p>
<p>To:</p>
<p><strong>log4j.rootCategory=WARN, console</strong></p>
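<p>For reference, a minimal <em>conf/log4j.properties</em> along these lines could look like the following (the template shipped with your Spark version may differ slightly):</p>

```
# Print only WARN and above to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```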
<p>Now, with the shell, we can try some commands, such as examining the sc variable, creating RDDs, filtering them and so on.</p><pre class="crayon-plain-tag">scala&gt; sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@76af34b5

scala&gt; val lines = sc.textFile("README.md")
lines: org.apache.spark.rdd.RDD[String] = README.md MappedRDD[1] at textFile at :12

scala&gt; lines.count()
res1: Long = 98                                                                 

scala&gt; lines.first()
res2: String = # Apache Spark</pre><p></p>
<p>There is an INFO message that reports the URL of the Spark UI (INFO SparkUI: Started SparkUI at http://[ipaddress]:4040), so you can use it to see information about the tasks and the cluster.</p>
<div id="attachment_725" style="width: 904px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2015/03/SparkUI-2.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2015/03/SparkUI-2.png" alt="Spark UI at 4040" width="894" height="574" class="size-full wp-image-725" /></a><p class="wp-caption-text">Spark UI</p></div>
<h1>Spark Operations</h1>
<p>Once we have the Spark shell, let&#8217;s use it to take a look at the available operations before we dive into creating applications.</p>
<h2>Creating RDDs</h2>
<p>You can turn an existing collection into an RDD (parallelize it), you can load an external file (in several formats: text, JSON, CSV, SequenceFiles, objects) or even use an existing Hadoop InputFormat (with <em>sc.hadoopFile()</em>).</p>
<p></p><pre class="crayon-plain-tag">scala> val numbers = sc.parallelize(List(1,2,3))
numbers: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:12

scala> val lines = sc.textFile("README.md")
lines: org.apache.spark.rdd.RDD[String] = README.md MappedRDD[2] at textFile at <console>:12

scala></pre><p></p>
<h2>Transformations</h2>
<p>As we said earlier, transformations are lazily evaluated operations on RDDs that return a new RDD.</p>
<p>You can pass each element through a function (with <em>map()</em>) or keep elements that pass a predicate (with <em>filter()</em>) or produce zero or more elements for each element (with <em>flatMap()</em>) and so on.</p>
<p></p><pre class="crayon-plain-tag">scala> val squares = numbers.map(x => x*x)
squares: org.apache.spark.rdd.RDD[Int] = MappedRDD[3] at map at <console>:14

scala> val spark = lines.filter(line => line.contains("Spark"))
spark: org.apache.spark.rdd.RDD[String] = FilteredRDD[4] at filter at <console>:14

scala> val sequences = numbers.flatMap(x => 1 to x)
sequences: org.apache.spark.rdd.RDD[Int] = FlatMappedRDD[5] at flatMap at <console>:14

scala> val words = lines.flatMap(line => line.split(" "))
words: org.apache.spark.rdd.RDD[String] = FlatMappedRDD[6] at flatMap at <console>:14

scala></pre><p></p>
<h2>Actions</h2>
<p>Actions are operations that return a final value to the driver program or write data to an external storage system; they trigger the evaluation of the pending transformations on the RDD.</p>
<p>For instance, retrieve the contents (<em>collect()</em>), return the first n elements (<em>take()</em>), count the number of elements (<em>count()</em>), combine elements with an associative function (<em>reduce()</em>) or write elements to a text file (<em>saveAsTextFile()</em>).</p>
<p></p><pre class="crayon-plain-tag">scala> sequences.collect()
res0: Array[Int] = Array(1, 1, 2, 1, 2, 3)                                      

scala> squares.take(2)
res1: Array[Int] = Array(1, 4)

scala> words.count()
res2: Long = 524

scala> numbers.reduce((x, y) => x + y)
res4: Int = 6

scala> spark.saveAsTextFile("borrar.txt")

scala></pre><p></p>
<h2>Key/Value Pairs</h2>
<p>There is a special type of RDD, <strong>Pair RDDs</strong>, that contain elements that are tuples, that is, key-value pairs, where both key and value can be of any type.</p>
<p>They are very useful for performing aggregations, grouping and counting. They are often obtained from some initial ETL (extract, transform, load) operations.</p>
<p>Pair RDDs can be partitioned across nodes to improve speed, by keeping similar keys accessible on the same node.</p>
<p>Regarding operations, Spark offers special operations for Pair RDDs that allow you to act on each key in parallel, for instance, <em>reduceByKey()</em> to aggregate data by key, <em>join()</em> to merge two RDDs by grouping elements with the same key, or even <em>sortByKey()</em>.</p>
<p></p><pre class="crayon-plain-tag">scala> val pets = sc.parallelize(List(("cat", 1), ("dog", 1), ("cat", 2)))
pets: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[8] at parallelize at <console>:12

scala> pets.collect()
res12: Array[(String, Int)] = Array((cat,1), (dog,1), (cat,2))

scala> pets.reduceByKey((a, b) => a + b).collect()
res9: Array[(String, Int)] = Array((dog,1), (cat,3))

scala> pets.groupByKey().collect()
res10: Array[(String, Iterable[Int])] = Array((dog,CompactBuffer(1)), (cat,CompactBuffer(1, 2)))

scala> pets.sortByKey().collect()
res11: Array[(String, Int)] = Array((cat,1), (cat,2), (dog,1))

scala></pre><p></p>
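<p>As a single-machine analogy (plain JDK streams, not the Spark API), reduceByKey((a, b) => a + b) behaves like collecting key-value entries into a map with a merge function:</p>

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ReduceByKeyDemo {
    public static void main(String[] args) {
        // Key-value pairs, like the ("cat", 1), ("dog", 1), ("cat", 2) example above.
        List<Map.Entry<String, Integer>> pets = Arrays.asList(
                new SimpleEntry<>("cat", 1),
                new SimpleEntry<>("dog", 1),
                new SimpleEntry<>("cat", 2));
        // toMap with a merge function plays the role of reduceByKey((a, b) -> a + b):
        // values that share a key are combined with Integer::sum.
        Map<String, Integer> totals = pets.stream()
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
        System.out.println(totals.get("cat")); // 3
        System.out.println(totals.get("dog")); // 1
    }
}
```

<p>The difference, of course, is that Spark performs this aggregation in parallel across the nodes that hold each key.</p>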
<p>Now let&#8217;s use the Shell to see how easily you can implement the MapReduce WordCount example in a single line:</p>
<p></p><pre class="crayon-plain-tag">scala> val counts = lines.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((x, y) => x + y)
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[18] at reduceByKey at <console>:14

scala> counts.collect()
res16: Array[(String, Int)] = Array((package,1), (this,1), (Because,1), (Python,2), (cluster.,1), (its,1), ([run,1), (general,2), (YARN,,1), (have,1), (pre-built,1), (locally.,1), (changed,1), (locally,2), (sc.parallelize(1,1), (only,1), (several,1), (This,2), (basic,1), (first,1), (documentation,3), (Configuration,1), (learning,,1), (graph,1), (Hive,2), (["Specifying,1), ("yarn-client",1), (page](http://spark.apache.org/documentation.html),1), ([params]`.,1), (application,1), ([project,2), (prefer,1), (SparkPi,2), (<http://spark.apache.org/>,1), (engine,1), (version,1), (file,1), (documentation,,1), (MASTER,1), (example,3), (are,1), (systems.,1), (params,1), (scala>,1), (provides,1), (refer,2), (configure,1), (Interactive,2), (distribution.,1), (can,6), (build,3), (when,1), (Apache,1),...
scala></pre><p></p>
<h1>Spark Applications</h1>
<p>To write a Spark application, you can use Scala, Python or Java. I&#8217;m going to use Java 8 to take advantage of the new features of the language and get a less verbose syntax.</p>
<h2>Word count Java application</h2>
<p>First, use the appropriate dependency. For instance, with maven:</p>
<p></p><pre class="crayon-plain-tag">&lt;dependency&gt;
			&lt;groupId&gt;org.apache.spark&lt;/groupId&gt;
			&lt;artifactId&gt;spark-core_2.10&lt;/artifactId&gt;
			&lt;version&gt;1.2.1&lt;/version&gt;
		&lt;/dependency&gt;</pre><p></p>
<p>Then, you have to instantiate your own <strong>SparkContext</strong>, and it&#8217;s done via a <strong>SparkConf</strong> object. We use the minimal configuration: the cluster URL (&#8220;local&#8221; to use a local cluster) and an application name to identify the application on the cluster:</p>
<p></p><pre class="crayon-plain-tag">SparkConf conf = new SparkConf().setMaster(&quot;local&quot;).setAppName(&quot;Word Count with Spark&quot;);
JavaSparkContext sc = new JavaSparkContext(conf);</pre><p></p>
<p>Now, before writing Java code, it&#8217;s necessary to explain the differences from Scala.</p>
<p>Spark is written in Scala and takes full advantage of its features. But Java lacks some of them, so Spark provides alternatives with interfaces or concrete classes.</p>
<p>Let&#8217;s see the Word Count example in Spark written in Scala:</p>
<p></p><pre class="crayon-plain-tag">val file = spark.textFile(&quot;file&quot;)
val counts = file.flatMap(line =&gt; line.split(&quot; &quot;))
                 .map(word =&gt; (word, 1))
                 .reduceByKey(_ + _)
counts.saveAsTextFile(&quot;out&quot;)</pre><p></p>
<p>Before Java 8, Java didn&#8217;t accept functions as parameters, so Spark provides interfaces in the <em>org.apache.spark.api.java.function</em> package to be implemented, either as anonymous inner classes or as named classes, and passed as arguments to the functions (<em>flatMap()</em>, <em>map()</em>, <em>reduceByKey()</em>, &#8230;).</p>
<p>In our case, these are the functions that are needed:</p>
<ul>
<li><strong>FlatMapFunction&lt;T, R&gt;</strong> with the method <em>Iterable&lt;R&gt; call(T t)</em> to return zero or more output records from each input record (t).</li>
<li><strong>PairFunction&lt;T, K, V&gt;</strong> with the method <em>Tuple2&lt;K, V&gt; call(T t)</em> to return key-value pairs (Tuple2&lt;K, V&gt;), which can be used to construct PairRDDs.</li>
<li><strong>Function2&lt;T1, T2, R&gt;</strong> with the method <em>R call(T1 v1, T2 v2)</em>, a two-argument function that takes arguments of type T1 and T2 and returns an R.</li>
</ul>
<p>Java doesn&#8217;t have a native implementation of Tuple (as <a href="https://twitter.com/lukaseder" title="Lukas Eder twitter account" target="_blank">Lukas Eder</a> noted as a side-note <a href="http://blog.jooq.org/2015/01/23/how-to-translate-sql-group-by-and-aggregations-to-java-8/" title="How to Translate SQL GROUP BY and Aggregations to Java 8" target="_blank">here</a>: &#8220;Why the JDK doesn’t ship with built-in tuples like C#’s or Scala’s escapes me&#8221;; in other words, <strong><em>&#8220;Functional programming without tuples is like coffee without sugar: A bitter punch in your face.&#8221;</em></strong>)</p>
<p>For that reason, Spark provides several implementations for Tuple in the <em>scala</em> package.</p>
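<p>A minimal pair class in plain Java, sketched here just to show what such a tuple provides (the class actually used by the Spark Java API is <em>scala.Tuple2</em>, with fields <em>_1</em> and <em>_2</em>; this sketch is illustrative, not a replacement):</p>

```java
// A minimal immutable pair, analogous in spirit to Scala's Tuple2.
public class Pair<A, B> {
    public final A first;   // scala.Tuple2 calls this _1
    public final B second;  // scala.Tuple2 calls this _2

    public Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public String toString() {
        return "(" + first + "," + second + ")";
    }

    public static void main(String[] args) {
        System.out.println(new Pair<String, Integer>("word", 1));
    }
}
```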
<p>But Java has evolved, and now functions are first-class citizens, so it’s possible to pass them as parameters to other functions. This makes it very easy to write the Java 8 version of the word count in Spark using lambdas, since each of the provided interfaces has a single abstract method. The result is almost as clear as the Scala version:</p>
<p></p><pre class="crayon-plain-tag">JavaRDD&lt;String&gt; lines = sc.textFile(&quot;file&quot;);
		JavaPairRDD&lt;String, Integer&gt; counts = lines.flatMap(line -&gt; Arrays.asList(line.split(&quot; &quot;)))
			.mapToPair(word -&gt; new Tuple2&lt;String, Integer&gt;(word, 1))
			.reduceByKey((x, y) -&gt; x + y);
		counts.saveAsTextFile(&quot;out&quot;);</pre><p></p>
<p>The complete source code is <a href="https://github.com/jbbarquero/spark-examples" title="spark-examples" target="_blank">available at GitHub</a>.</p>
<h2>Build and run</h2>
<p>Now, we only have to build the project (with Maven) and submit it to Spark (with <strong>bin/spark-submit</strong>). From the root directory of the application (note: the out directory must not exist, so remove it first if necessary with <strong><em>rm -r out</em></strong>):</p>
<p></p><pre class="crayon-plain-tag">$ mvn clean install
$ ~/Applications/spark-1.2.1-bin-hadoop2.4/bin/spark-submit --class com.malsolo.spark.examples.WordCount target/spark-examples-0.0.1-SNAPSHOT.jar</pre><p></p>
<p>Finally, we can see the results to compare with the ones obtained <a href="http://malsolo.com/blog4java/?p=516" title=" Getting started with Hadoop" target="_blank">using Hadoop</a>:</p>
<p></p><pre class="crayon-plain-tag">$ cat out/part-00000 | grep President
(President,,26)
(President,72)
(President.,8)
(Vice-President,5)
(Vice-President,,5)
(Vice-President;,1)
(President;,3)
(Vice-President.,1)

$ cat out/part-00000 | grep United
(United,85)

$ cat out/part-00000 | grep State
(State,47)
(States,46)
(States.",1)
(State.,6)
(States,,55)
(State,,20)
(States:,2)
(States.,8)
(Statement,1)
(States;,13)
(State;,4)</pre><p></p>
<h1>Shared Variables</h1>
<p>Spark closures and the variables they use are sent separately to the tasks running on the cluster, so the variables created in the driver program are received in the tasks as new copies, and updates on these copies are not propagated back to the driver.</p>
<p>Spark has two kinds of shared variables, <strong>accumulators</strong> and <strong>broadcast variables</strong>, to solve that problem as well as for solving issues related with the amount of data that is sent across the cluster.</p>
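<p>The copy problem can be simulated in a single JVM. In this hypothetical sketch, <em>runTask()</em> plays the role of a worker that receives a copy of the driver&#8217;s counter, mimicking what serialization produces across a cluster:</p>

```java
import java.util.Arrays;
import java.util.List;

// Single-JVM analogy of how Spark ships closures: each task works on its own
// copy of the driver's variable, so updates made inside the task are lost.
public class CopySemanticsSketch {

    public static class Counter {
        public int value;
    }

    // Returns what the task counted, but only on its private copy.
    public static int runTask(Counter driverCounter, List<String> partition) {
        Counter copy = new Counter();       // the deserialized copy on the worker
        copy.value = driverCounter.value;
        for (String line : partition) {
            if (line.isEmpty()) {
                copy.value++;               // updates the copy, not the driver's variable
            }
        }
        return copy.value;
    }

    public static void main(String[] args) {
        Counter blankLines = new Counter();
        int seenByTask = runTask(blankLines, Arrays.asList("a", "", "b"));
        // The task saw one blank line, but the driver's counter is unchanged.
        System.out.println(seenByTask + " vs " + blankLines.value);
    }
}
```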
<h2>Accumulators</h2>
<p>Accumulators are variables that can be used to aggregate values from worker nodes back to the driver program. In a nutshell:</p>
<ul>
<li>They are created with <em>SparkContext.accumulator(initialValue)</em> that returns an <em>org.apache.spark.Accumulator[T]</em> (with T, the type of initialValue)</li>
<li>Worker code adds values with <em>+=</em> in Scala or the function <em>add()</em> in Java.</li>
<li>The driver program can access with <em>value</em> in Scala or <em>value()</em>/<em>setValue()</em> in Java (accessing from worker code throws an exception)</li>
<li>The right value will be obtained after calling an <em><strong>action</strong></em> (remember that <u><em><strong>transformations</strong></em> are lazy operations</u>)</li>
</ul>
<p>Spark has built-in support for accumulators of type Integer, but you can create custom Accumulators by extending <a href="http://spark.apache.org/docs/1.2.1/api/java/index.html?org/apache/spark/AccumulatorParam.html" title="AccumulatorParam API" target="_blank">AccumulatorParam</a>.</p>
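<p>As a sketch of that contract, here is a plain-Java analog of AccumulatorParam (the interface below is a stand-in written for this example, not Spark&#8217;s actual class) that accumulates a maximum instead of a sum:</p>

```java
// Plain-Java analog of the AccumulatorParam contract: zero() provides the
// neutral element, addInPlace() combines two partial values.
public class MaxAccumulatorSketch {

    interface AccumulatorParam<T> {
        T zero(T initialValue);      // neutral element of the aggregation
        T addInPlace(T v1, T v2);    // how two partial values combine
    }

    // A custom parameter that accumulates the maximum instead of the sum
    static final AccumulatorParam<Integer> MAX = new AccumulatorParam<Integer>() {
        @Override
        public Integer zero(Integer initialValue) { return Integer.MIN_VALUE; }
        @Override
        public Integer addInPlace(Integer v1, Integer v2) { return Math.max(v1, v2); }
    };

    // Folds worker values the way the driver merges partial accumulators
    public static int accumulate(int[] workerValues) {
        int acc = MAX.zero(0);
        for (int v : workerValues) {
            acc = MAX.addInPlace(acc, v);
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(accumulate(new int[]{3, 9, 4}));
    }
}
```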
<p>Let&#8217;s see an example that counts the empty lines in the file that we use to count words:</p>
<p></p><pre class="crayon-plain-tag">public static void main(String[] args) {
		SparkConf conf = new SparkConf().setMaster(&quot;local&quot;).setAppName(&quot;Word Count with Spark&quot;);
		JavaSparkContext sc = new JavaSparkContext(conf);
		
		JavaRDD&lt;String&gt; lines = sc.textFile(INPUT_FILE_TEXT);
		
		final Accumulator&lt;Integer&gt; blankLines = sc.accumulator(0);
		
		@SuppressWarnings(&quot;resource&quot;)
		JavaPairRDD&lt;String, Integer&gt; counts = lines.flatMap(line -&gt; 
			{
				if (&quot;&quot;.equals(line)) {
					blankLines.add(1);
				}
				return Arrays.asList(line.split(&quot; &quot;));
			})
			.mapToPair(word -&gt; new Tuple2&lt;String, Integer&gt;(word, 1))
			.reduceByKey((x, y) -&gt; x + y);

		counts.saveAsTextFile(OUTPUT_FILE_TEXT);
		
		System.out.println(&quot;Blank lines: &quot; + blankLines.value());
		
		sc.close();
	}</pre><p></p>
<ul>
<li>We create an Accumulator<Integer> initialized to 0.</li>
<li>We modify the FlatMapFunction to add 1 to the accumulator when the input line is empty.</li>
<li>We print the value of the accumulator, after the <em>saveAsTextFile()</em> action.</li>
</ul>
<p>Let&#8217;s try:</p>
<p></p><pre class="crayon-plain-tag">$ rm -r out
$ mvn clean install
$ ~/Applications/spark-1.2.1-bin-hadoop2.4/bin/spark-submit --class com.malsolo.spark.examples.WordCount target/spark-examples-0.0.1-SNAPSHOT.jar

Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/03/02 15:26:22 WARN Utils: Your hostname, xxx resolves to a loopback address: 127.0.1.1; using yyy.yyy.y.yy instead (on interface eth0)
15/03/02 15:26:22 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/03/02 15:26:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Blank lines: 169
$</pre><p></p>
<h2>Broadcast variables</h2>
<p>Broadcast variables are shared variables used to efficiently distribute large read-only values to all the worker nodes.</p>
<p>If you need to use the same variable in multiple parallel operations, it&#8217;s likely you’d rather share it instead of letting Spark send it separately for each operation.</p>
<p>In a nutshell:</p>
<ul>
<li>They are created with SparkContext.broadcast(initValue) on an object of type T that has to be Serializable.</li>
<li>Access its value with <em>value</em> in Scala or <em>value()</em> in Java.</li>
<li>The value shouldn&#8217;t be modified after creation, because the change will only happen in one node.</li>
</ul>
<p>Let’s see an example with a list of words that should not be included in the count (a short list, but enough to illustrate the concept):</p>
<p></p><pre class="crayon-plain-tag">public static void main(String[] args) {
		SparkConf conf = new SparkConf().setMaster(&quot;local&quot;).setAppName(&quot;Word Count with Spark&quot;);
		JavaSparkContext sc = new JavaSparkContext(conf);
		
		JavaRDD&lt;String&gt; lines = sc.textFile(INPUT_FILE_TEXT);
		
		final Accumulator&lt;Integer&gt; blankLines = sc.accumulator(0);
		
		final Broadcast&lt;List&lt;String&gt;&gt; wordsToIgnore = sc.broadcast(getWordsToIgnore());
		
		@SuppressWarnings(&quot;resource&quot;)
		JavaPairRDD&lt;String, Integer&gt; counts = lines.flatMap(line -&gt; 
			{
				if (&quot;&quot;.equals(line)) {
					blankLines.add(1);
				}
				return Arrays.asList(line.split(&quot; &quot;));
			})
			.filter(word -&gt; !wordsToIgnore.value().contains(word))
			.mapToPair(word -&gt; new Tuple2&lt;String, Integer&gt;(word, 1))
			.reduceByKey((x, y) -&gt; x + y);
		
		counts.saveAsTextFile(OUTPUT_FILE_TEXT);
		
		System.out.println(&quot;Blank lines: &quot; + blankLines.value());
		
		sc.close();
	}

	private static List&lt;String&gt; getWordsToIgnore() {
		return Arrays.asList(&quot;the&quot;, &quot;of&quot;, &quot;and&quot;, &quot;for&quot;);
	}</pre><p></p>
<ul>
<li>We create the broadcast variable: a list of words to ignore. The <em>getWordsToIgnore()</em> method returns only 4 words, but it&#8217;s easy to see that the list could be big enough to matter.</li>
<li>We access the broadcast variable with the <em>value()</em> method and use it in a <em>filter()</em> operation.</li>
</ul>
<h1>Resources</h1>
<ul>
<li>Source code at <a href="https://github.com/jbbarquero/spark-examples" title="spark-examples" target="_blank">GitHub</a></li>
<li><a title="Learning Spark" href="http://shop.oreilly.com/product/0636920028512.do" target="_blank">Learning Spark</a>. By Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia (O&#8217;Reilly Media)</li>
<li><a title="Cloudera Developer Training for Apache Spark" href="http://cloudera.com/content/cloudera/en/training/courses/spark-training.html" target="_blank">Cloudera Developer Training for Apache Spark</a>. By Diana Carroll (Cloudera training)</li>
<li><a title="Parallel Programming with Spark" href="https://www.youtube.com/watch?v=7k4yDKBYOcw" target="_blank">Parallel Programming with Spark (Part 1 &amp; 2)</a>. By Matei Zaharia (UC Berkeley AMPLab YouTube channel)</li>
<li><a title="Advanced Spark Features" href="https://www.youtube.com/watch?v=w0Tisli7zn4" target="_blank">Advanced Spark Features</a>. By Matei Zaharia (UC Berkeley AMPLab YouTube channel)</li>
<li><a title="A Deeper Understanding of Spark Internals" href="https://www.youtube.com/watch?v=dmL0N3qfSc8" target="_blank">A Deeper Understanding of Spark Internals</a>. By Aaron Davidson (Apache Spark YouTube channel)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://malsolo.com/blog4java/?feed=rss2&#038;p=679</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting started with Hadoop</title>
		<link>http://malsolo.com/blog4java/?p=516</link>
		<comments>http://malsolo.com/blog4java/?p=516#comments</comments>
		<pubDate>Wed, 25 Feb 2015 15:36:14 +0000</pubDate>
		<dc:creator><![CDATA[Javier (@jbbarquero)]]></dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://malsolo.com/blog4java/?p=516</guid>
		<description><![CDATA[Hadoop Introduction Hadoop is an open source framework for distributed fault-tolerant data storage and batch processing. It allows you to write applications for processing really huge data sets across clusters of computers using simple programming model with linear scalability on &#8230; <a href="http://malsolo.com/blog4java/?p=516">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<h1>Hadoop Introduction</h1>
<p>Hadoop is an open source framework for distributed fault-tolerant data storage and batch processing. It allows you to write applications for processing really huge data sets across clusters of computers using a simple programming model, with linear scalability on commodity hardware. Commodity hardware means cheaper hardware than the dedicated servers that are sold by many vendors. Linear scalability means that you only have to add more machines (nodes) to the Hadoop cluster.</p>
<p>The key concept for Hadoop is <strong><em>move-code-to-data</em></strong>: data is distributed across the nodes of the Hadoop cluster and the applications (the jar files) are later sent to those nodes instead of vice versa (as in Java EE, where applications are centralized in an application server and the data is collected to it over the network), in order to process the data locally. </p>
<p>At its core, Hadoop has two parts:</p>
<ul>
<li><strong>Hadoop Distributed File System</strong> (<strong>HDFS™</strong>): a distributed file system that provides high-throughput access to application data.</li>
<li><strong>YARN</strong> (<strong>Yet Another Resource Negotiator</strong>): a framework for job scheduling and cluster resource management.</li>
</ul>
<p>As you can see in the very definition of the Apache Hadoop website (<a href="http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F" title="What is Apache Hadoop?" target="_blank">what is Apache Hadoop?</a>), Hadoop offers as a third component <strong>Hadoop MapReduce</strong>, a batch-based, distributed computing framework modeled after Google’s paper on MapReduce. It allows you to parallelize work over a large amount of raw data by splitting the input dataset into independent chunks which are processed by the map tasks (initial ingestion and transformation) in parallel, whose outputs are sorted and then passed to the reduce tasks (aggregation or summarization).</p>
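<p>The map/shuffle/reduce flow can be sketched in memory with plain Java (a simplified, hypothetical sketch; a real job would implement Hadoop&#8217;s Mapper and Reducer classes and run distributed over the cluster):</p>

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the three MapReduce phases for word count.
public class MapReducePhasesSketch {

    public static Map<String, Integer> wordCount(List<String> lines) {
        // Map phase: emit a (word, 1) pair for every word of every input line
        List<SimpleEntry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                if (!word.isEmpty()) {
                    mapped.add(new SimpleEntry<>(word, 1));
                }
            }
        }
        // Shuffle/sort phase: group the pairs by key (TreeMap keeps keys sorted)
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (SimpleEntry<String, Integer> pair : mapped) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        // Reduce phase: aggregate each group, here by summing the ones
        Map<String, Integer> reduced = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int one : group.getValue()) sum += one;
            reduced.put(group.getKey(), sum);
        }
        return reduced;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("to be or", "not to be")));
    }
}
```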
<p>In the previous version of Hadoop (Hadoop 1), the implementation of MapReduce was based on a master <em>JobTracker</em>, for resource management and job scheduling/monitoring, and per-node slaves called <em>TaskTracker</em> to launch/tear down tasks. But it had scalability problems, especially when you wanted very large clusters (more than 4,000 nodes).</p>
<p>So, MapReduce has undergone a complete overhaul and is now called MapReduce 2.0 (MRv2), but it is no longer a separate part by itself: currently, <u><strong>MapReduce</strong> is a YARN-based system</u>. That&#8217;s the reason why we can say that Hadoop has two main parts: HDFS and YARN.</p>
<h1>Hadoop ecosystem</h1>
<p>Besides the two core technologies, the distributed file system (HDFS) and Map Reduce (MR), there are a lot of projects that expand Hadoop with additional useful technologies, in such a way that we can consider all of them an ecosystem around Hadoop.</p>
<p>Next, a list of some of these projects, organized by some kind of categories:</p>
<ul>
<li><strong>Data Ingestion:</strong> to move data from and into HDFS
<ul>
<li><u>Flume</u>: a system for moving data into HDFS from remote systems using configurable memory-resident daemons that watch for data on those systems and then forward the data to Hadoop. For example, weblogs from multiple servers to HDFS.</li>
<li><u>Sqoop</u>: a tool for efficient bulk transfer of data between structured data stores (such as relational databases) and HDFS.</li>
</ul>
</li>
<li><strong>Data Processing:</strong>
<ul>
<li><u>Pig</u>: a procedural language for querying and transforming data with scripts in a data flow language called Pig Latin.</li>
<li><u>Hive</u>: a declarative SQL-like language.</li>
<li><u>Spark</u>: an in-memory distributed data processing engine that breaks problems up over all of the Hadoop nodes, but keeps the data in memory for better performance; the data can be rebuilt from an external store (usually HDFS) with the details stored in the Resilient Distributed Dataset (RDD).</li>
<li><u>Storm</u>: a distributed real-time computation system for processing fast, large streams of data.</li>
</ul>
</li>
<li><strong>Data Formats:</strong>
<ul>
<li><u>Avro</u>: a language-neutral data serialization system, with schemas expressed in JSON.</li>
<li><u>Parquet</u>: a compressed columnar storage format that can efficiently store nested data.</li>
</ul>
</li>
<li><strong>Storage:</strong>
<ul>
<li><u>HBase</u>: a scalable, distributed database that supports structured data storage for large tables.</li>
<li><u>Accumulo</u>: a scalable, distributed database similar to HBase, but with cell-level access control.</li>
</ul>
</li>
<li><strong>Coordination:</strong>
<ul>
<li><u>Zookeeper</u>: a high-performance coordination service for distributed applications.</li>
</ul>
</li>
<li><strong>Machine Learning:</strong>
<ul>
<li><u>Mahout</u>: a scalable machine learning and data mining library: classification, clustering, pattern mining, collaborative filtering and so on.</li>
</ul>
</li>
<li><strong>Workflow Management:</strong>
<ul>
<li><u>Oozie</u>: a service for running and scheduling workflows of Hadoop jobs (including Map-Reduce, Pig, Hive, and Sqoop jobs).</li>
</ul>
</li>
</ul>
<h1>Hadoop installation</h1>
<p>To install Hadoop on a single machine to try it out, just download the compressed file for the desired version and unpack it on the filesystem.</p>
<h2>Prerequisites</h2>
<p>There is some required software for running Apache Hadoop:</p>
<ul>
<li>Java. It&#8217;s also necessary to inform Hadoop where Java is via the environment variable JAVA_HOME</li>
<p></p><pre class="crayon-plain-tag">$ java -version
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
$ echo $JAVA_HOME
/usr/lib/jvm/java-8-oracle</pre><p></p>
<li>ssh: I have Ubuntu 14.04 that comes with ssh, but I had to manually install a server.</li>
<p></p><pre class="crayon-plain-tag">$ which ssh
/usr/bin/ssh
$ which sshd
/usr/sbin/sshd</pre><p></p>
<li>On Mac OSX, make sure <strong>Remote Login</strong> (under <strong>System Preferences</strong> -> <strong>Sharing</strong>) is enabled for the current user or for all users.</li>
<li>On Windows, the best option is to follow the instructions at the Wiki: <a href="http://wiki.apache.org/hadoop/Hadoop2OnWindows" title="Build and Install Hadoop 2.x or newer on Windows" target="_blank">Build and Install Hadoop 2.x or newer on Windows</a>.</li>
</ul>
<h2>Download and install</h2>
<p>To get a Hadoop distribution, download a recent stable release from one of the <a href="http://www.apache.org/dyn/closer.cgi/hadoop/common/" title="Apache Download Mirrors" target="_blank">Apache Download Mirrors</a>.</p>
<p>There are several directories, for the current, last stable, last v1 stable version and so on. Basically, you&#8217;ll download a tar gzipped file named <strong>hadoop-x.y.z.tar.gz</strong>, for instance: hadoop-2.6.0.tar.gz.</p>
<p>You can unpack it wherever you want and then point the PATH to that directory. For example:</p>
<p></p><pre class="crayon-plain-tag">$ tar xzf hadoop-2.6.0.tar.gz
$
$ export HADOOP_HOME=~/Applications/hadoop-2.6.0
$ export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin</pre><p></p>
<p>Now you can verify the installation by typing <strong>hadoop version</strong>:</p>
<p></p><pre class="crayon-plain-tag">$ hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using ~/Applications/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar
$</pre><p></p>
<h2>Configuration</h2>
<p>Hadoop has three supported modes:</p>
<ul>
<li>Local (Standalone) Mode: a single Java process with no daemons running. For development, testing and debugging.</li>
<li>Pseudo-Distributed Mode: each Hadoop daemon runs in a separate Java process. For simulating a cluster on a small scale.</li>
<li>Fully-Distributed Mode: the Hadoop daemons run on a cluster of machines. If you want to take a look, see the official documentation: <a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html" title="Hadoop Cluster Setup" target="_blank">Hadoop MapReduce Next Generation &#8211; Cluster Setup</a>.</li>
</ul>
<p>In standalone mode, there is no further action to take, the default properties are enough and there are no daemons to run.</p>
<p>In pseudodistributed mode, you have to set up your computer as described at <a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html" title="Hadoop MapReduce Next Generation - Setting up a Single Node Cluster." target="_blank">Hadoop MapReduce Next Generation &#8211; Setting up a Single Node Cluster</a>. But let&#8217;s review the steps needed.</p>
<p>You need at least a minimum configuration with four files in <strong>HADOOP_HOME/etc/hadoop/</strong>:</p>
<ul>
<li><strong>core-site.xml</strong>. Common configuration, default values at <a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml" title="core-default.xml" target="_blank">Configuration: core-default.xml</a></li>
<p></p><pre class="crayon-plain-tag">&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;?xml-stylesheet type=&quot;text/xsl&quot; href=&quot;configuration.xsl&quot;?&gt;
&lt;configuration&gt;
	&lt;property&gt;
		&lt;name&gt;fs.defaultFS&lt;/name&gt;
		&lt;value&gt;hdfs://localhost:8020&lt;/value&gt;
	&lt;/property&gt;
&lt;/configuration&gt;</pre><p></p>
<p><em>fs.defaultFS</em> replaces the deprecated <em>fs.default.name</em>, whose default value is <em>file:///</em></p>
<li><strong>hdfs-site.xml</strong>. HDFS configuration, default values at <a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml" title="hdfs-default.xml" target="_blank">Configuration: hdfs-default.xml</a></li>
<p></p><pre class="crayon-plain-tag">&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;?xml-stylesheet type=&quot;text/xsl&quot; href=&quot;configuration.xsl&quot;?&gt;
&lt;configuration&gt;
	&lt;property&gt;
		&lt;name&gt;dfs.replication&lt;/name&gt;
		&lt;value&gt;1&lt;/value&gt;
	&lt;/property&gt;
&lt;/configuration&gt;</pre><p></p>
<p><em>dfs.replication</em> is the default block replication, unless another value is specified at creation time. The default value is 3, but we use 1 because we have only one node.</p>
<p>Other useful values are:</p>
<p><em>dfs.namenode.name.dir</em>, local path for storing the fsimage by the NN (defaults to file://${hadoop.tmp.dir}/dfs/name with hadoop.tmp.dir configurable at core-site.xml with default value /tmp/hadoop-${user.name})</p>
<p><em>dfs.datanode.data.dir</em>, local path for storing blocks by the DN (defaults to file://${hadoop.tmp.dir}/dfs/data)</p>
<li><strong>mapred-site.xml</strong>. MapReduce configuration, default values at <a href="http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml" title="mapred-default.xml" target="_blank">Configuration: mapred-default.xml</a></li>
<p></p><pre class="crayon-plain-tag">&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;?xml-stylesheet type=&quot;text/xsl&quot; href=&quot;configuration.xsl&quot;?&gt;
&lt;configuration&gt;
	&lt;property&gt;
		&lt;name&gt;mapreduce.framework.name&lt;/name&gt;
		&lt;value&gt;yarn&lt;/value&gt;
	&lt;/property&gt;
&lt;/configuration&gt;</pre><p></p>
<p><em>mapreduce.framework.name</em>, the runtime framework for executing MapReduce jobs: local, classic or yarn.</p>
<p>Other useful values are:</p>
<p><em>mapreduce.jobtracker.system.dir</em>, the directory where MapReduce stores control files (defaults to ${hadoop.tmp.dir}/mapred/system).</p>
<p><em>mapreduce.cluster.local.dir</em>, the local directory where MapReduce stores intermediate data files (defaults to ${hadoop.tmp.dir}/mapred/local)</p>
<li><strong>yarn-site.xml</strong>. YARN configuration, default values at <a href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml" title="yarn-default.xml" target="_blank">Configuration: yarn-default.xml</a></li>
</ul>
<p></p><pre class="crayon-plain-tag">&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;configuration&gt;
	&lt;property&gt;
		&lt;name&gt;yarn.resourcemanager.hostname&lt;/name&gt;
		&lt;value&gt;localhost&lt;/value&gt;
	&lt;/property&gt;
	&lt;property&gt;
		&lt;name&gt;yarn.nodemanager.aux-services&lt;/name&gt;
		&lt;value&gt;mapreduce_shuffle&lt;/value&gt;
	&lt;/property&gt;
&lt;/configuration&gt;</pre><p></p>
<p><em>yarn.resourcemanager.hostname</em>, the host name of the Resource Manager.</p>
<p><em>yarn.nodemanager.aux-services</em>, list of auxiliary services executed by the Node Manager. The value mapreduce_shuffle is for the Shuffle/Sort phase of MapReduce, which is an auxiliary service in Hadoop 2.x.</p>
<h2>Configuring SSH</h2>
<p>Pseudodistributed mode is like fully distributed mode with a single host: localhost. In order to start the daemons on the set of hosts in the cluster, SSH is used. So we&#8217;ll configure SSH to log in without password.</p>
<p>Remember that you need to have SSH installed and a server running. On Ubuntu, try this if you need so:</p>
<p></p><pre class="crayon-plain-tag">$ sudo apt-get install ssh</pre><p></p>
<p>Now create an SSH key with an empty passphrase:</p>
<p></p><pre class="crayon-plain-tag">$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa_hadoop
$ cat ~/.ssh/id_rsa_hadoop.pub >> ~/.ssh/authorized_keys</pre><p></p>
<p>Finally, test that you can connect without a password by trying:</p>
<p></p><pre class="crayon-plain-tag">$ ssh localhost</pre><p></p>
<h2>First steps with HDFS</h2>
<p>Before using HDFS for the first time, some steps must be performed:</p>
<h3>Formatting the HDFS filesystem</h3>
<p>Just run the following command:</p>
<p></p><pre class="crayon-plain-tag">$ hdfs namenode -format</pre><p></p>
<h3>Starting the daemons</h3>
<p>To start the HDFS, YARN, and MapReduce daemons, type:</p>
<p></p><pre class="crayon-plain-tag">$ start-dfs.sh
$ start-yarn.sh
$ mr-jobhistory-daemon.sh start historyserver</pre><p></p>
<p>You can check what processes are running with the Java&#8217;s <strong>jps</strong> command:</p>
<p></p><pre class="crayon-plain-tag">$ jps
25648 NodeManager
25521 ResourceManager
25988 JobHistoryServer
25355 SecondaryNameNode
25180 DataNode
26025 Jps
$</pre><p></p>
<h3>Stopping the daemons</h3>
<p>Once it&#8217;s over, you can stop the daemons with:</p>
<p></p><pre class="crayon-plain-tag">$ mr-jobhistory-daemon.sh stop historyserver
$ stop-yarn.sh
$ stop-dfs.sh</pre><p></p>
<h3>Creating A User Directory</h3>
<p>You can create a home directory for a user with the following command:</p>
<p></p><pre class="crayon-plain-tag">$ hadoop fs -mkdir -p ~/Documents/hadoop-home/</pre><p></p>
<h1>Other Hadoop installations</h1>
<p>There is another way to get Hadoop installed, that is, using products from companies that include Apache Hadoop or some kind of derivative:</p>
<ul>
<li><a href="http://aws.amazon.com/elasticmapreduce/" title="Amazon EMR" target="_blank">Amazon Elastic MapReduce (Amazon EMR)</a></li>
<li><a href="http://www.cloudera.com/content/cloudera/en/downloads.html" title="CDH" target="_blank">Cloudera&#8217;s Distribution including Apache Hadoop (CDH)</a></li>
<li><a href="http://hortonworks.com/hdp/" title="HDP" target="_blank">Hortonworks Data Platform Powered by Apache Hadoop (HDP)</a></li>
<li><a href="https://www.mapr.com/" title="MapR" target="_blank">MapR Technologies</a></li>
<li><a href="http://pivotal.io/big-data/pivotal-hd" title="Pivotal HD" target="_blank">Pivotal HD</a></li>
</ul>
<h1>Hadoop Distributed File System (HDFS)</h1>
<p>The HDFS filesystem is designed for distributed storage of very large files (hundreds of megabytes, gigabytes, or terabytes in size) and distributed processing using commodity hardware. It is a hierarchical UNIX-like file system, but internally it splits large files into blocks (typically from 32 MB to 128 MB; the default is 64 MB in Hadoop 1 and 128 MB in Hadoop 2), in order to distribute and replicate these blocks among the nodes of the Hadoop cluster. The applications that use HDFS usually write data once and read data many times.</p>
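<p>As a back-of-the-envelope sketch, this hypothetical snippet computes how many blocks and replicas a file produces for a given block size and replication factor (the 64 MB block size and replication of 3 below are just example settings):</p>

```java
// Back-of-the-envelope sketch of how HDFS splits a file into blocks and how
// many block replicas end up on the cluster. Sizes are hypothetical examples.
public class HdfsBlockSketch {

    // Number of blocks needed for a file: ceiling of fileSize / blockSize
    public static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long fileSize = 1000L * 1024 * 1024;   // a 1000 MB file
        long blockSize = 64L * 1024 * 1024;    // a 64 MB block size
        int replication = 3;                   // dfs.replication
        long blocks = blockCount(fileSize, blockSize);
        System.out.println(blocks + " blocks, "
            + (blocks * replication) + " block replicas in the cluster");
    }
}
```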
<p>The HDFS has two types of nodes:</p>
<ul>
<li>The master <strong>NameNode</strong> (NN), that stores the filesystem tree and the metadata for locating the files and directories in the tree that are actually located in the DataNodes. It stores this information in memory, however, to ensure against data loss, it&#8217;s also saved to disk using two files: the namespace image and the edit log.
<ul>
<li>fsimage: a point in time snapshot of what HDFS looks like.</li>
<li>edit log: the deltas or changes to HDFS since the last snapshot.</li>
</ul>
<p>Both are periodically merged.</p>
</li>
<li>The <strong>DataNode</strong>s (DN), which are responsible for serving the actual file data (once the client knows which one to use after contacting the NameNode). They also send heartbeats every 3 seconds (by default) and block reports every hour (by default) to the NN, both for maintenance purposes.</li>
</ul>
<p>There is also a node poorly named <strong>Secondary NameNode</strong> that is neither a failover node nor a backup node: it periodically merges the namespace image with the edit log to prevent the edit log from becoming too large. Thus, a better name for it is <strong>Checkpoint Node</strong>.</p>
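<p>Conceptually, the checkpoint is just &#8220;snapshot plus deltas&#8221;. This toy sketch shows the merge (the data structures are illustrative, not HDFS&#8217;s real on-disk formats):</p>

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of checkpointing: apply the edit log's deltas to the fsimage
// snapshot to produce a fresh image, after which the edit log can be truncated.
public class CheckpointSketch {

    // Each edit is {op, path, metadata}; op is "PUT" (create/update) or "DELETE"
    public static Map<String, String> merge(Map<String, String> fsimage, List<String[]> editLog) {
        Map<String, String> checkpoint = new HashMap<>(fsimage);
        for (String[] edit : editLog) {
            if ("PUT".equals(edit[0])) {
                checkpoint.put(edit[1], edit[2]);
            } else if ("DELETE".equals(edit[0])) {
                checkpoint.remove(edit[1]);
            }
        }
        return checkpoint;
    }

    public static void main(String[] args) {
        Map<String, String> fsimage = new HashMap<>();
        fsimage.put("/user/a.txt", "blocks:1,2");
        List<String[]> edits = Arrays.asList(
            new String[]{"PUT", "/user/b.txt", "blocks:3"},
            new String[]{"DELETE", "/user/a.txt", ""});
        System.out.println(merge(fsimage, edits).keySet());
    }
}
```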
<h2>The Command-Line Interface</h2>
<p>Once you have installed Hadoop, you can interact with HDFS, as well as other file systems that Hadoop supports (local filesystem, HFTP FS, S3 FS, and others), using the command line. The FS shell is invoked by:</p>
<p></p><pre class="crayon-plain-tag">$ hadoop fs <args></pre><p></p>
<p>Provided you have hadoop in the PATH, as we saw above.</p>
<p>You can find a list of available commands at <a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html" title="FS shell" target="_blank">File System Shell</a>.</p>
<p>You can perform operations like:</p>
<ul>
<li><a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#copyFromLocal" title="copyFromLocal" target="_blank">copyFromLocal</a> (putting files into HDFS)</li>
<li><a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#copyToLocal" title="copyToLocal" target="_blank">copyToLocal</a> (getting files from HDFS)</li>
<li><a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#mkdir" title="mkdir" target="_blank">mkdir</a> (creating directories in HDFS)</li>
<li><a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#ls" title="ls" target="_blank">ls</a> (list files in HDFS)</li>
</ul>
<h2>Data exchange with HDFS</h2>
<p>Hadoop is mainly written in Java; the core class for HDFS is the abstract class <a href="https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java" title="fs.FileSystem" target="_blank">org.apache.hadoop.fs.FileSystem</a>, which represents a filesystem in Hadoop. The several concrete subclasses provide implementations, from the local filesystem (<a href="https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/LocalFileSystem.java" title="fs.LocalFileSystem" target="_blank">fs.LocalFileSystem</a>) to HDFS (<a href="https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java" title="hdfs.DistributedFileSystem" target="_blank">hdfs.DistributedFileSystem</a>), or Amazon S3 (<a href="https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3native/NativeS3FileSystem.java" title="fs.s3native.NativeS3FileSystem" target="_blank">fs.s3native.NativeS3FileSystem</a>), and many more (read-only HTTP, FTP server, &#8230;)</p>
<h3>Reading data</h3>
<p>Reading data using the Java API involves obtaining the abstract <a href="https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java" title="fs.FileSystem" target="_blank">FileSystem</a> via one of the factory methods (<strong><em>get()</em></strong>), or the convenience method for retrieving the local filesystem (<strong><em>getLocal()</em></strong>):</p>
<p><strong>public static FileSystem get(Configuration conf) throws IOException<br />
public static FileSystem get(URI uri, Configuration conf) throws IOException<br />
public static FileSystem get(URI uri, Configuration conf, String user)<br />
throws IOException<br />
public static LocalFileSystem getLocal(Configuration conf) throws IOException</strong></p>
<p>And then obtain an input stream for a file (which can later be closed):</p>
<p><strong>public FSDataInputStream open(Path f) throws IOException<br />
public abstract FSDataInputStream open(Path f, int bufferSize) throws IOException</strong></p>
<p>With these methods on hand, the flow of the data in HDFS is the following:</p>
<div id="attachment_601" style="width: 650px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2014/11/HDFS_Client_Read_File.jpg"><img src="http://malsolo.com/blog4java/wp-content/uploads/2014/11/HDFS_Client_Read_File.jpg" alt="HDFS read" width="640" height="367" class="size-full wp-image-601" /></a><p class="wp-caption-text">Reading from HDFS</p></div>
<ol>
<li>The client calls the <strong>open()</strong> method to read a file; for HDFS, the FileSystem instance is a <strong>DistributedFileSystem</strong>.</li>
<li>The DistributedFileSystem asks the <strong>NameNode</strong> for the block locations. The NameNode returns an ordered list of the DataNodes that have a copy of the block (sorted by proximity to the client). The DistributedFileSystem returns a <strong>FSDataInputStream</strong> to the client to read data from.</li>
<li>The client calls <strong>read()</strong> on the input stream.</li>
<li>The FSDataInputStream reads data for the client from the DataNode until there is no more data in that node.</li>
<li>The FSDataInputStream transparently manages closing and opening connections to DataNodes while serving data to the client. It also handles validation (checksums) and errors (by trying to read the data from a replica).</li>
<li>When the client has finished reading, it calls <strong>close()</strong> on the FSDataInputStream.</li>
</ol>
<h3>Writing data</h3>
<p>The Java API allows you to create files with the create methods (which, by the way, also create any parent directories of the file that don&#8217;t already exist). The API also includes a <strong>Progressable</strong> interface for being notified of the progress of the data being written to the datanodes. It&#8217;s also possible to append data to an existing file, but this functionality is optional (S3 doesn&#8217;t support it for the time being).</p>
<p><strong>public FSDataOutputStream create(Path f) throws IOException<br />
public FSDataOutputStream append(Path f) throws IOException</strong></p>
<p>The output stream is then used for writing the data. Furthermore, the FSDataOutputStream can report the current position in the file.</p>
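<p>The progress notification can be sketched in plain Java: a callback invoked as each packet-sized chunk is handed off, the way Hadoop calls <em>Progressable.progress()</em> while data travels to the datanodes. The <em>ProgressCallback</em> interface below is a hypothetical stand-in for illustration, not Hadoop&#8217;s actual API:</p>

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicInteger;

public class WriteSketch {

    // Hypothetical stand-in for org.apache.hadoop.util.Progressable.
    public interface ProgressCallback {
        void progress();
    }

    // Writes bytes out in 4 KB chunks, invoking the callback after each
    // chunk, roughly as Hadoop does while packets travel to the datanodes.
    public static void writeWithProgress(byte[] data, OutputStream out,
                                         ProgressCallback callback) throws IOException {
        for (int offset = 0; offset < data.length; offset += 4096) {
            int len = Math.min(4096, data.length - offset);
            out.write(data, offset, len);
            callback.progress();
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        AtomicInteger ticks = new AtomicInteger();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeWithProgress(data, out, ticks::incrementAndGet);
        System.out.println("chunks written: " + ticks.get()); // chunks written: 3
    }
}
```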
<p>The flow of the data written to HDFS with these methods is as follows:</p>
<div id="attachment_602" style="width: 650px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2014/11/HDFS_Client_Write_File.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2014/11/HDFS_Client_Write_File.png" alt="HDFS write" width="640" height="416" class="size-full wp-image-602" /></a><p class="wp-caption-text">Writing to HDFS</p></div>
<ol>
<li>The client creates the file by calling <strong>create()</strong> on <strong>DistributedFileSystem</strong>.</li>
<li>DistributedFileSystem makes an RPC call to the namenode to create a<br />
new file in the filesystem’s namespace, with no blocks associated with it. The NameNode checks file existence and permissions, throwing an IOException if there is any problem; otherwise, it returns an FSDataOutputStream to write data to.</li>
<li>The data written by the client is split into packets that are sent to a <em>data queue</em>.</li>
<li>The data queue is consumed by the DataStreamer, which streams the packets to a pipeline of DataNodes (one DataNode per replica). Each DataNode stores the packet and sends it to the next DataNode in the pipeline.</li>
<li>There is another queue, the <em>ack queue</em>, which holds the packets that are waiting to be acknowledged by all the DataNodes in the pipeline. If a DataNode fails during a write operation, the pipeline is re-arranged transparently for the client.</li>
<li>When the client has finished writing data, it calls <strong>close()</strong> on the stream.</li>
<li>The remaining packets are flushed and, after receiving all the acknowledgments, the NameNode is notified that the write to the file is completed.</li>
</ol>
<h1>Apache YARN (Yet Another Resource Negotiator)</h1>
<p>YARN is Hadoop’s cluster resource management system. It provides APIs for requesting and working with cluster resources, meant to be used not by user code directly but by higher-level frameworks, like MapReduce v2, Spark, Tez&#8230;</p>
<p>YARN separates resource management and job scheduling/monitoring into separate daemons. In Hadoop 1.x these two functions were performed by the JobTracker, which was a bottleneck for scaling the number of Hadoop nodes in the cluster.</p>
<h2>YARN Components</h2>
<p>There are five major component types in a YARN cluster:</p>
<ul>
<li><strong>Resource Manager (RM)</strong>: a global per-cluster daemon that is solely responsible for allocating and managing resources available within the cluster.</li>
<li><strong>Node Manager (NM)</strong>: a per-node daemon that is responsible for creating, monitoring, and killing containers.</li>
<li><strong>Application Master (AM)</strong>: a per-application daemon whose duty is to negotiate resources from the ResourceManager and to work with the NodeManager(s) to execute and monitor the tasks.</li>
<li><strong>Container</strong>: an abstract representation of a resource set that is given to a particular application: memory and CPU. It&#8217;s a computational unit (one node runs several containers, but a container cannot cross a node boundary). The AM is a specialized container that is used to bootstrap and manage the entire application&#8217;s life cycle.</li>
<li><strong>Application Client</strong>: it submits applications to the RM, specifying the type of AM needed to execute the application (for instance, MapReduce).</li>
</ul>
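<p>As a toy illustration of the Container concept, the resource check that the RM performs before granting a container can be sketched as follows (all class names here are invented for the sketch; the real scheduler is far more sophisticated):</p>

```java
public class ContainerSketch {

    // Toy model of a container request: a bundle of memory and CPU.
    public record Resource(int memoryMb, int vcores) {}

    public static class Node {
        private int freeMemoryMb;
        private int freeVcores;

        public Node(int memoryMb, int vcores) {
            this.freeMemoryMb = memoryMb;
            this.freeVcores = vcores;
        }

        // Grant the container only if it fits entirely on this node;
        // a container can never span two nodes.
        public boolean tryAllocate(Resource request) {
            if (request.memoryMb() > freeMemoryMb || request.vcores() > freeVcores) {
                return false;
            }
            freeMemoryMb -= request.memoryMb();
            freeVcores -= request.vcores();
            return true;
        }
    }

    public static void main(String[] args) {
        Node node = new Node(8192, 4);
        System.out.println(node.tryAllocate(new Resource(4096, 2))); // true
        System.out.println(node.tryAllocate(new Resource(8192, 2))); // false: no longer fits
    }
}
```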
<h2>Anatomy of a YARN Request</h2>
<p>These are the steps involved in the submission of a job to the YARN framework.</p>
<div id="attachment_649" style="width: 632px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2015/02/yarn_architecture.gif"><img src="http://malsolo.com/blog4java/wp-content/uploads/2015/02/yarn_architecture.gif" alt="Anatomy of a YARN Request" width="622" height="385" class="size-full wp-image-649" /></a><p class="wp-caption-text">YARN architecture</p></div>
<ol>
<li>The client submits a job to the RM asking to run an AM process (Job Submission in the picture above).</li>
<li>The RM looks for resources to acquire a container on a node to launch an instance of the AM.</li>
<li>The AM registers with the RM to enable the client to query the RM for details about the AM.</li>
<li>Now the AM is running: it could perform the computation itself and return the result to the client, or it could request more containers from the RM to run a distributed computation (Resource Request in the picture above).</li>
<li>The application code executing in the launched containers (tasks) reports its status to the AM through an application-specific protocol (MapReduce status in the picture above, which assumes that the YARN application being executed is MapReduce).</li>
<li>Once the application completes execution, the AM deregisters with the RM, and the containers used are released back to the system.</li>
</ol>
<p>This process applies to each client that submits jobs. In the picture above there are two clients (the red one and the blue one).</p>
<h1>Hadoop first program: WordCount MapReduce</h1>
<p>MapReduce is a paradigm for data processing that uses two key phases:</p>
<ol>
<li><strong>Map</strong>: it performs a transformation on input key-value pairs to generate intermediate key-value pairs.</li>
<li><strong>Reduce</strong>: it performs a summary function on groups of intermediate key-value pairs to generate the final output of key-value pairs.</li>
</ol>
<p>The groups that are the input of the Reduce phase are created by sorting the output of the Map phase in an operation called <strong>Sort/Shuffle</strong> (in YARN, it is an auxiliary service).</p>
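<p>The two phases and the intermediate Sort/Shuffle can be illustrated with an in-memory word count in plain Java. This is only a conceptual sketch (no Hadoop involved; the class and method names are invented for the illustration), with a TreeMap playing the role of the sorted, grouped intermediate output:</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceSketch {

    public static Map<String, Integer> wordCount(List<String> lines) {
        // Map phase: emit an intermediate (word, 1) pair for every token.
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                mapped.add(Map.entry(word, 1));
            }
        }

        // Sort/Shuffle: group the intermediate pairs by key, sorted by key,
        // so each key reaches the reduce step exactly once with all its values.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapped) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }

        // Reduce phase: summarize each group (here, sum the ones).
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("the quick brown fox", "the lazy dog")));
        // {brown=1, dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```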
<h2>Writing the program</h2>
<p>To write a MapReduce program in Java and run it in Hadoop, you need to provide a Mapper class, a Reducer class, and a driver program to run the job.</p>
<p>Let&#8217;s begin with <u>the Mapper class</u>; it will emit each word with a count of 1:</p>
<p></p><pre class="crayon-plain-tag">public class WordCountMapper extends Mapper&lt;LongWritable, Text, Text, IntWritable&gt; {
	
	private final static IntWritable ONE = new IntWritable(1);
	private Text word = new Text();
	
	@Override
	protected void map(LongWritable key, Text value,
			Mapper&lt;LongWritable, Text, Text, IntWritable&gt;.Context context)
			throws IOException, InterruptedException {
		String line = value.toString();
		StringTokenizer tokenizer = new StringTokenizer(line);
		while (tokenizer.hasMoreTokens()) {
			word.set(tokenizer.nextToken());
			context.write(word, ONE);
		}
	}

}</pre><p></p>
<p>Highlights here are the parameters of the Mapper class, in this case:</p>
<ol>
<li>The input key, a long that will be ignored</li>
<li>The input value, a line of text</li>
<li>The output key, the word to be counted</li>
<li>The output value, the count for the word, always one, as we said before.</li>
</ol>
<p>As you can see, instead of using plain Java types, it&#8217;s better to use the Hadoop basic types that are optimized for network serialization (available in the <em>org.apache.hadoop.io</em> package).</p>
<p>The basic approach is to override the <em>map()</em> method and make use of the key and value input parameters, as well as the instance of a Context to write the output to: each word with its count (one, for the time being).</p>
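<p>The reason those optimized types exist can be seen with plain JDK streams: an int serialized through <em>DataOutput</em> (the contract Hadoop&#8217;s Writable types build on) takes exactly four bytes, with no class metadata attached. This sketch uses only the JDK, not Hadoop&#8217;s actual Writable interface:</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableSketch {

    // Serialize an int the way IntWritable.write(DataOutput) does: 4 raw bytes.
    public static byte[] serializeInt(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeInt(value);
        return bytes.toByteArray();
    }

    public static int deserializeInt(byte[] bytes) throws IOException {
        return new DataInputStream(new ByteArrayInputStream(bytes)).readInt();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = serializeInt(163);
        System.out.println(raw.length);          // 4 (no object header, unlike java.io serialization)
        System.out.println(deserializeInt(raw)); // 163
    }
}
```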
<p>Let&#8217;s continue with <u>the Reducer class</u>.</p>
<p></p><pre class="crayon-plain-tag">public class WordCountReducer extends Reducer&lt;Text, IntWritable, Text, IntWritable&gt; {
	@Override
	protected void reduce(Text key, Iterable&lt;IntWritable&gt; values,
			Reducer&lt;Text, IntWritable, Text, IntWritable&gt;.Context context)
			throws IOException, InterruptedException {
		int sum = 0;
		for (IntWritable val : values) {
			sum += val.get();
		}
		context.write(key, new IntWritable(sum));
	}
	
}</pre><p></p>
<p>The intermediate result from the Mapper will be partitioned by MapReduce in such a way that the same reducer will receive all output records containing the same key. MapReduce will also sort all the map output keys and will call each reducer only once for each output key along with a list of all the output values for this key.</p>
<p>Thus, to write a Reducer class, you override the <em>reduce()</em> method, which takes as parameters the key, the list of values as an Iterable, and an instance of the Context to write the final result to.</p>
<p>In our case, the reducer will sum the counts that each word carries (always one) and write the result to the context.</p>
<p>Finally, <u>the Driver class</u>, the class that runs the MapReduce job.</p>
<p></p><pre class="crayon-plain-tag">public class WordCountDriver {
	
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		String[] myArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
		if (myArgs.length != 2) {
			System.err.println(&quot;Usage: WordCountDriver &lt;input path&gt; &lt;output path&gt;&quot;);
			System.exit(-1);
		}
		Job job = Job.getInstance(conf, &quot;Classic WordCount&quot;);
		job.setJarByClass(WordCountDriver.class);
		
		FileInputFormat.addInputPath(job, new Path(myArgs[0]));
		FileOutputFormat.setOutputPath(job, new Path(myArgs[1]));
		
		job.setMapperClass(WordCountMapper.class);
		job.setReducerClass(WordCountReducer.class);
		
		//job.setMapOutputKeyClass(Text.class);
		//job.setMapOutputValueClass(IntWritable.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}</pre><p></p>
<p>First, create a Hadoop Configuration (the default values are enough for this example) and use the <em>GenericOptionsParser</em> class to parse only the generic Hadoop arguments. </p>
<p>To configure, submit and control the execution of the job, as well as to monitor its progress, use a <em>Job</em> object. Take care of configuring it (via its <em>set()</em> methods) before submitting the job or an <em>IllegalStateException</em> will be thrown.</p>
<p>In a Hadoop cluster, the JAR package will be distributed around the cluster; to allow Hadoop to locate this JAR, we pass a class in the Job&#8217;s <em>setJarByClass()</em> method.</p>
<p>Next, we specify the input and output paths by calling the static <em>addInputPath()</em> (or <em>setInputPaths()</em>) method on <em>FileInputFormat</em> (with a file, directory, or file pattern) and the static <em>setOutputPath()</em> method on <em>FileOutputFormat</em> (with a non-existing directory, in order to avoid overwriting data from another job), respectively.</p>
<p>Then, the job is configured with the Mapper class and the Reducer class.</p>
<p>There is no need to specify the map output types because they are the same as the ones produced by the Reducer class, but we do need to indicate the output types for the reduce function.</p>
<p>Finally, the <em>waitForCompletion()</em> method on Job submits the job and waits for it to finish. The argument is a flag for verbosity in the generated output. The return value indicates success (true) or failure (false), which we use for the program&#8217;s exit code (0 or 1).</p>
<h2>Running the program</h2>
<p>The source code is available at <a href="https://github.com/jbbarquero/mapreduce" title="Mapreduce first program" target="_blank">github</a>. You can download it, go to the directory, and run the following commands:</p>
<p></p><pre class="crayon-plain-tag">$ mvn clean install
$ export HADOOP_CLASSPATH=target/mapreduce-0.0.1-SNAPSHOT.jar
$ hadoop com.malsolo.hadoop.mapreduce.WordCountDriver data/the_constitution_of_the_united_states.txt out</pre><p></p>
<p>You will see something like this:</p>
<p></p><pre class="crayon-plain-tag">15/02/25 15:30:47 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/02/25 15:30:47 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/02/25 15:30:47 INFO input.FileInputFormat: Total input paths to process : 1
15/02/25 15:30:47 INFO mapreduce.JobSubmitter: number of splits:1
15/02/25 15:30:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local284822998_0001
15/02/25 15:30:47 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/02/25 15:30:47 INFO mapreduce.Job: Running job: job_local284822998_0001
15/02/25 15:30:47 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/02/25 15:30:47 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
15/02/25 15:30:48 INFO mapred.LocalJobRunner: Waiting for map tasks
15/02/25 15:30:48 INFO mapred.LocalJobRunner: Starting task: attempt_local284822998_0001_m_000000_0
15/02/25 15:30:48 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
15/02/25 15:30:48 INFO mapred.MapTask: Processing split: file:.../mapreduce/data/the_constitution_of_the_united_states.txt:0+45119
15/02/25 15:30:48 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/02/25 15:30:48 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/02/25 15:30:48 INFO mapred.MapTask: soft limit at 83886080
15/02/25 15:30:48 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/02/25 15:30:48 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/02/25 15:30:48 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/02/25 15:30:48 INFO mapred.LocalJobRunner: 
15/02/25 15:30:48 INFO mapred.MapTask: Starting flush of map output
15/02/25 15:30:48 INFO mapred.MapTask: Spilling map output
15/02/25 15:30:48 INFO mapred.MapTask: bufstart = 0; bufend = 75556; bufvoid = 104857600
15/02/25 15:30:48 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26183792(104735168); length = 30605/6553600
15/02/25 15:30:48 INFO mapred.MapTask: Finished spill 0
15/02/25 15:30:48 INFO mapred.Task: Task:attempt_local284822998_0001_m_000000_0 is done. And is in the process of committing
15/02/25 15:30:48 INFO mapred.LocalJobRunner: map
15/02/25 15:30:48 INFO mapred.Task: Task 'attempt_local284822998_0001_m_000000_0' done.
15/02/25 15:30:48 INFO mapred.LocalJobRunner: Finishing task: attempt_local284822998_0001_m_000000_0
15/02/25 15:30:48 INFO mapred.LocalJobRunner: map task executor complete.
15/02/25 15:30:48 INFO mapred.LocalJobRunner: Waiting for reduce tasks
15/02/25 15:30:48 INFO mapred.LocalJobRunner: Starting task: attempt_local284822998_0001_r_000000_0
15/02/25 15:30:48 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
15/02/25 15:30:48 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@1bd34bf7
15/02/25 15:30:48 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334338464, maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
15/02/25 15:30:48 INFO reduce.EventFetcher: attempt_local284822998_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
15/02/25 15:30:48 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local284822998_0001_m_000000_0 decomp: 90862 len: 90866 to MEMORY
15/02/25 15:30:48 INFO reduce.InMemoryMapOutput: Read 90862 bytes from map-output for attempt_local284822998_0001_m_000000_0
15/02/25 15:30:48 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 90862, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->90862
15/02/25 15:30:48 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
15/02/25 15:30:48 INFO mapred.LocalJobRunner: 1 / 1 copied.
15/02/25 15:30:48 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
15/02/25 15:30:48 INFO mapred.Merger: Merging 1 sorted segments
15/02/25 15:30:48 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 90857 bytes
15/02/25 15:30:48 INFO reduce.MergeManagerImpl: Merged 1 segments, 90862 bytes to disk to satisfy reduce memory limit
15/02/25 15:30:48 INFO reduce.MergeManagerImpl: Merging 1 files, 90866 bytes from disk
15/02/25 15:30:48 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
15/02/25 15:30:48 INFO mapred.Merger: Merging 1 sorted segments
15/02/25 15:30:48 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 90857 bytes
15/02/25 15:30:48 INFO mapred.LocalJobRunner: 1 / 1 copied.
15/02/25 15:30:48 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
15/02/25 15:30:48 INFO mapred.Task: Task:attempt_local284822998_0001_r_000000_0 is done. And is in the process of committing
15/02/25 15:30:48 INFO mapred.LocalJobRunner: 1 / 1 copied.
15/02/25 15:30:48 INFO mapred.Task: Task attempt_local284822998_0001_r_000000_0 is allowed to commit now
15/02/25 15:30:48 INFO output.FileOutputCommitter: Saved output of task 'attempt_local284822998_0001_r_000000_0' to file:.../out/_temporary/0/task_local284822998_0001_r_000000
15/02/25 15:30:48 INFO mapred.LocalJobRunner: reduce > reduce
15/02/25 15:30:48 INFO mapred.Task: Task 'attempt_local284822998_0001_r_000000_0' done.
15/02/25 15:30:48 INFO mapred.LocalJobRunner: Finishing task: attempt_local284822998_0001_r_000000_0
15/02/25 15:30:48 INFO mapred.LocalJobRunner: reduce task executor complete.
15/02/25 15:30:48 INFO mapreduce.Job: Job job_local284822998_0001 running in uber mode : false
15/02/25 15:30:48 INFO mapreduce.Job:  map 100% reduce 100%
15/02/25 15:30:48 INFO mapreduce.Job: Job job_local284822998_0001 completed successfully
15/02/25 15:30:48 INFO mapreduce.Job: Counters: 33
	File System Counters
		FILE: Number of bytes read=283490
		FILE: Number of bytes written=809011
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=872
		Map output records=7652
		Map output bytes=75556
		Map output materialized bytes=90866
		Input split bytes=175
		Combine input records=0
		Combine output records=0
		Reduce input groups=1697
		Reduce shuffle bytes=90866
		Reduce input records=7652
		Reduce output records=1697
		Spilled Records=15304
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=8
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=525336576
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=45119
	File Output Format Counters 
		Bytes Written=17405
$</pre><p></p>
<p>And you can take a look at the result using:</p>
<p></p><pre class="crayon-plain-tag">$ sort -k2 -h -r out/part-r-00000 | head -20
the	663
of	494
shall	293
and	256
to	183
be	178
or	157
in	139
by	101
a	94
United	85
for	81
any	79
President	72
The	64
have	64
as	64
States,	55
such	52
State	47
$</pre><p></p>
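<p>If you prefer to stay in Java, the same top-N listing can be produced by parsing the tab-separated output lines and sorting by count descending (a plain-JDK sketch; in practice you would read the lines from <em>out/part-r-00000</em> with <em>Files.readAllLines()</em>):</p>

```java
import java.util.Comparator;
import java.util.List;

public class TopWords {

    // Parses "word<TAB>count" lines and returns the top-n entries by count, descending.
    public static List<String> top(List<String> lines, int n) {
        return lines.stream()
                .map(line -> line.split("\t"))
                .sorted(Comparator.comparingInt((String[] parts) -> Integer.parseInt(parts[1])).reversed())
                .limit(n)
                .map(parts -> parts[0] + "\t" + parts[1])
                .toList();
    }

    public static void main(String[] args) {
        // In practice: List<String> lines = Files.readAllLines(Paths.get("out/part-r-00000"));
        List<String> sample = List.of("the\t663", "of\t494", "a\t94", "shall\t293");
        top(sample, 2).forEach(System.out::println); // prints "the 663" then "of 494" (tab-separated)
    }
}
```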
<p>Regarding this example, I have to mention a couple of things:</p>
<ol>
<li>
	The code includes a data directory containing a text file (yes, the Constitution of the United States)
<ul>
<li>Yes, the program needs some improvements, such as ignoring commas and other punctuation.</li>
<li>It&#8217;s funny to see the most repeated words (the, of, shall, and, to, be, or, in, by, a) and the most important words (United, States, President)</li>
</ul>
</li>
<li>Due to a problem with <a href="http://wiki.apache.org/hadoop/HadoopIPv6" title="Hadoop and IPv6" target="_blank">Hadoop and IPv6</a>, it doesn&#8217;t work in pseudo-distributed mode because of a connection exception (<font color="red"><em>java.net.ConnectException: Call [&#8230;] to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused</em></font>). For this example it is enough to use local mode (just restore the original core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml, and you can also stop the HDFS, YARN, and MapReduce daemons. See above)</li>
</ol>
<h1>Resources</h1>
<ul>
<li><a href="http://shop.oreilly.com/product/0636920033448.do" title="Hadoop: The Definitive Guide" target="_blank">Hadoop: The Definitive Guide, 4th Edition</a>. By Tom White (O&#8217;Reilly Media)</li>
<li><a href="http://www.apress.com/9781430248637?gtmf=s" title="Pro Apache Hadoop" target="_blank">Pro Apache Hadoop, 2nd Edition</a>. By Sameer Wadkar, Madhu Siddalingaiah, Jason Venner (Apress)</li>
<li><a href="https://www.packtpub.com/big-data-and-business-intelligence/mastering-hadoop" title="Mastering Hadoop" target="_blank">Mastering Hadoop</a>. By Sandeep Karanth (Packt publishing)</li>
<li><a href="http://www.manning.com/holmes2/" title="Hadoop in Practice, Second Edition" target="_blank">Hadoop in Practice, Second Edition</a>. By Alex Holmes (Manning publications)</li>
<li><a href="http://www.manning.com/lam2/" title="Hadoop in Action, Second Edition" target="_blank">Hadoop in Action, Second Edition</a>. By Chuck P. Lam and Mark W. Davis (Manning publications)</li>
<li><a href="https://www.youtube.com/watch?v=xYnS9PQRXTg" title="Hadoop - Just the Basics for Big Data Rookies" target="_blank">Hadoop &#8211; Just the Basics for Big Data Rookies</a>. By  Adam Shook (SpringDeveloper YouTube channel)</li>
<li><a href="https://www.youtube.com/watch?v=tIPA6vMZomQ" title="Getting started with Spring Data and Apache Hadoop" target="_blank">Getting started with Spring Data and Apache Hadoop</a>. By Thomas Risberg, Janne Valkealahti (SpringDeveloper YouTube channel)</li>
<li><a href="https://www.youtube.com/watch?v=IcuTdJgUFmo" title="Hadoop 201 -- Deeper into the Elephant" target="_blank">Hadoop 201 &#8212; Deeper into the Elephant</a>. By Roman Shaposhnik (SpringDeveloper YouTube channel)</li>
<li><a href="http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/SingleCluster.html" title="Setting up a Single Node Cluster" target="_blank">Hadoop MapReduce Next Generation &#8211; Setting up a Single Node Cluster</a>.</li>
<li><a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html" title="FileSystemShell" target="_blank">The File System (FS) shell</a></li>
<li><a href="http://www.adictosaltrabajo.com/tutoriales/tutoriales.php?pagina=mapreduce_basic" title="Primeros pasos con Hadoop: instalación y configuración en Linux" target="_blank">Primeros pasos con Hadoop: instalación y configuración en Linux</a>. By Juan Alonso Ramos (Adictos al trabajo)</li>
<li><a href="http://blog.cloudera.com/blog/2014/06/how-to-install-a-virtual-apache-hadoop-cluster-with-vagrant-and-cloudera-manager/" title="Hadoop virtual cluster: Vagrant and CDH" target="_blank">How-to: Install a Virtual Apache Hadoop Cluster with Vagrant and Cloudera Manager</a>. By Justin Kestelyn (Cloudera blog)</li>
<li><a href="http://hortonworks.com/blog/building-hadoop-vm-quickly-ambari-vagrant/" title="Hadoop VM: Ambari and Vagrant, HDP" target="_blank">How to build a Hadoop VM with Ambari and Vagrant</a>. By Saptak Sen (Hortonworks blog)</li>
<li><a href="http://hadoopguide.blogspot.com.es/2013/05/hadoop-hdfs-data-flow-io-classes.html" title="Hadoop HDFS Data Flow IO Classes" target="_blank">Hadoop HDFS Data Flow IO Classes</a>. By Shrey Mehrotra (Hadoop Ecosystem : Hadoop 2.x)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://malsolo.com/blog4java/?feed=rss2&#038;p=516</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introduction to Spring Batch. Part II: more on running a Job</title>
		<link>http://malsolo.com/blog4java/?p=375</link>
		<comments>http://malsolo.com/blog4java/?p=375#comments</comments>
		<pubDate>Thu, 04 Sep 2014 15:45:23 +0000</pubDate>
		<dc:creator><![CDATA[Javier (@jbbarquero)]]></dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Springsource]]></category>
		<category><![CDATA[JSR-352]]></category>
		<category><![CDATA[Spring Batch]]></category>
		<category><![CDATA[Spring Framework]]></category>

		<guid isPermaLink="false">http://malsolo.com/blog4java/?p=375</guid>
		<description><![CDATA[In the previous blog post entry, we introduced Spring Batch with a simple exposition of its features, main concepts both for configuring and running Batch Jobs. We also saw a sample application and two ways of running it: by invoking &#8230; <a href="http://malsolo.com/blog4java/?p=375">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>In the previous blog post entry, <a href="http://malsolo.com/blog4java/?p=260" title="Introduction to Spring Batch" target="_blank">we introduced Spring Batch</a> with a simple overview of its features and of the main concepts for configuring and running batch jobs.</p>
<p>We also saw a sample application and two ways of running it: by invoking a <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/launch/JobLauncher.html" title="Interface JobLauncher" target="_blank">JobLauncher</a> bean or by using <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/launch/support/CommandLineJobRunner.html" title="Class CommandLineJobRunner" target="_blank">CommandLineJobRunner</a> from the command line.</p>
<p>In this blog entry, we&#8217;ll see two additional ways to run a Spring Batch job:</p>
<ol>
<li>Using <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/launch/JobOperator.html" title="Interface JobOperator" target="_blank">JobOperator</a>, in order to have control of the batch process, from starting a job to monitoring tasks such as stopping, restarting, or summarizing a Job. We&#8217;ll only pay attention to the start operation, but once a <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/launch/JobOperator.html" title="Interface JobOperator" target="_blank">JobOperator</a> is configured, you can use it for the remaining monitoring tasks.</li>
<li>Using <a href="http://projects.spring.io/spring-boot/" title="Spring Boot" target="_blank">Spring Boot</a>, the new convention-over-configuration centric framework from the Spring team, which allows you to create, with a few lines of code, applications that &#8220;just run&#8221;, because Spring Boot provides a lot of features based on what you have on your classpath.</li>
</ol>
<p>As usual, all the source code is available at <a href="https://github.com/jbbarquero/spring-batch-sample" title="My Spring Batch sample at GitHub" target="_blank">GitHub</a>.</p>
<h3>Running the sample: JobOperator</h3>
<p><a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/launch/JobOperator.html" title="Interface JobOperator" target="_blank">JobOperator</a> is an interface that provides operations <a href="http://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#JobOperator" title="JobOperator" target="_blank">for inspecting and controlling jobs</a>, mainly for a command-line client or a remote launcher like a JMX console.</p>
<p>The implementation that Spring Batch provides, <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/launch/support/SimpleJobOperator.html" title="Class SimpleJobOperator" target="_blank">SimpleJobOperator</a>, uses <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/launch/JobLauncher.html" title="Interface JobLauncher" target="_blank">JobLauncher</a>, <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/repository/JobRepository.html" title="Interface JobRepository" target="_blank">JobRepository</a>, <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/explore/JobExplorer.html" title="Interface JobExplorer" target="_blank">JobExplorer</a>, and <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/JobRegistry.html" title="Interface JobRegistry" target="_blank">JobRegistry</a> for performing its operations. They are created by the <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/annotation/EnableBatchProcessing.html" title="Annotation Type EnableBatchProcessing" target="_blank">@EnableBatchProcessing</a> annotation, so we can create an <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/AdditionalBatchConfiguration.java" title="AdditionalBatchConfiguration.java" target="_blank">additional batch configuration</a> file with these dependencies autowired and later import it in the <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/BatchConfiguration.java" title="BatchConfiguration.java" target="_blank">batch configuration</a> file (not without issues, due to the way Spring loads the application context, we&#8217;ll see this shortly):</p>
<p></p><pre class="crayon-plain-tag">@Configuration
public class AdditionalBatchConfiguration {

    @Autowired
    JobRepository jobRepository;
    @Autowired
    JobRegistry jobRegistry;
    @Autowired
    JobLauncher jobLauncher;
    @Autowired
    JobExplorer jobExplorer;

    @Bean
    public JobOperator jobOperator() {
        SimpleJobOperator jobOperator = new SimpleJobOperator();
        jobOperator.setJobExplorer(jobExplorer);
        jobOperator.setJobLauncher(jobLauncher);
        jobOperator.setJobRegistry(jobRegistry);
        jobOperator.setJobRepository(jobRepository);
        return jobOperator;
    }

}</pre><p></p>
<p>And the <a href="http://docs.spring.io/spring/docs/current/javadoc-api/index.html?org/springframework/context/annotation/Import.html" title="Annotation Type Import" target="_blank">@Import</a>:</p>
<p></p><pre class="crayon-plain-tag">@Configuration
@EnableBatchProcessing
@Import(AdditionalBatchConfiguration.class)
public class BatchConfiguration {

	// Omitted

}</pre><p></p>
<p>Now it seems easy to run the job with a <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/MainJobOperator.java" title="MainJobOperator.java" target="_blank">main class</a>:</p>
<p></p><pre class="crayon-plain-tag">@Component
public class MainJobOperator {

    @Autowired
    JobOperator jobOperator;

    @Autowired
    Job importUserJob;

    public static void main(String... args) throws JobParametersInvalidException, JobInstanceAlreadyExistsException, NoSuchJobException, DuplicateJobException, NoSuchJobExecutionException {

        AnnotationConfigApplicationContext context = new AnnotationConfigApplicationContext(ApplicationConfiguration.class);

        MainJobOperator main = context.getBean(MainJobOperator.class);
        long executionId = main.jobOperator.start(main.importUserJob.getName(), null);

        MainHelper.reportResults(main.jobOperator, executionId);
        MainHelper.reportPeople(context.getBean(JdbcTemplate.class));

        context.close();

        System.out.printf(&quot;\nFIN %s&quot;, main.getClass().getName());

    }
}</pre><p></p>
<p>But there&#8217;s a little problem&#8230; it doesn&#8217;t work:</p>
<p></p><pre class="crayon-plain-tag">Exception in thread "main" org.springframework.batch.core.launch.NoSuchJobException: No job configuration with the name [importUserJob] was registered
	at org.springframework.batch.core.configuration.support.MapJobRegistry.getJob(MapJobRegistry.java:66)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
	at org.springframework.batch.core.configuration.annotation.SimpleBatchConfiguration$PassthruAdvice.invoke(SimpleBatchConfiguration.java:127)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:207)
	at com.sun.proxy.$Proxy14.getJob(Unknown Source)
	at org.springframework.batch.core.launch.support.SimpleJobOperator.start(SimpleJobOperator.java:310)
	at com.malsolo.springframework.batch.sample.MainJobOperator.main(MainJobOperator.java:15)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)

Process finished with exit code 1</pre><p></p>
<p>The problem here, <em><font color=red>No job configuration with the name [<strong>importUserJob</strong>] was registered</font></em>, is due to the way that <a href="http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/core/launch/JobOperator.html#start-java.lang.String-java.lang.String-" title="API" target="_blank">JobOperator.start(String jobName, String parameters)</a> works.</p>
<p>The main difference from <a href="http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/core/launch/JobLauncher.html#run-org.springframework.batch.core.Job-org.springframework.batch.core.JobParameters-" title="API" target="_blank">JobLauncher.run(Job job, JobParameters jobParameters)</a> is that the former takes Strings as parameters, while the latter works directly with objects.</p>
<p>So <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/launch/JobOperator.html" title="Interface JobOperator" target="_blank">JobOperator</a>, actually <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/launch/support/SimpleJobOperator.html" title="Class SimpleJobOperator" target="_blank">SimpleJobOperator</a>, has to obtain a <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/Job.html" title="Interface Job" target="_blank">Job</a> with the provided name. To do so, it uses the <a href="http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/core/configuration/JobLocator.html#getJob-java.lang.String-" title="API" target="_blank">JobRegistry.getJob(String name)</a> method. The available Spring Batch implementation is <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/support/MapJobRegistry.html" title="Class MapJobRegistry" target="_blank">MapJobRegistry</a>, which uses a ConcurrentMap, keyed by job name, to store a <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/JobFactory.html" title="Interface JobFactory" target="_blank">JobFactory</a> that creates the <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/Job.html" title="Interface Job" target="_blank">Job</a> when it is requested.</p>
<p>The problem is that this map has not been populated.</p>
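<p>To make the lookup concrete, here is a simplified, framework-free sketch of what MapJobRegistry does internally. The class and member names here (SimpleJobRegistry, a plain Supplier standing in for JobFactory) are invented for illustration; the real class works with Spring Batch&#8217;s JobFactory and Job types:</p>

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Framework-free sketch of MapJobRegistry: a ConcurrentMap keyed by job name,
// holding a factory (here a plain Supplier) that creates the job when requested.
public class SimpleJobRegistry {

    private final Map<String, Supplier<String>> map = new ConcurrentHashMap<>();

    public void register(String jobName, Supplier<String> jobFactory) {
        if (map.putIfAbsent(jobName, jobFactory) != null) {
            throw new IllegalStateException(
                    "A job with this name [" + jobName + "] was already registered");
        }
    }

    public String getJob(String jobName) {
        Supplier<String> factory = map.get(jobName);
        if (factory == null) {
            // This is exactly the situation behind the NoSuchJobException above:
            // nobody put an entry for this name into the map.
            throw new IllegalArgumentException(
                    "No job configuration with the name [" + jobName + "] was registered");
        }
        return factory.get();
    }

    public static void main(String[] args) {
        SimpleJobRegistry registry = new SimpleJobRegistry();
        registry.register("importUserJob", () -> "importUserJob instance");
        System.out.println(registry.getJob("importUserJob"));
    }
}
```

If nothing registers an entry for <em>importUserJob</em> before <code>getJob</code> is called, the lookup fails, which is the behavior we just saw.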
<p>The first solution is easy: the JobRegistry allows you to register a JobFactory at runtime, which is later used to obtain the Job as explained above. So, we only need to create this JobFactory&#8230;</p>
<p></p><pre class="crayon-plain-tag">@Configuration
public class AdditionalBatchConfiguration {
//    Rest omitted
    @Autowired
    Job importUserJob;

    @Bean
    public JobFactory jobFactory() {
        return new ReferenceJobFactory(importUserJob);
    }
}</pre><p></p>
<p>&#8230;and register it in the main method:</p>
<p></p><pre class="crayon-plain-tag">@Component
public class MainJobOperator {

    @Autowired
    JobFactory jobFactory;
    @Autowired
    JobRegistry jobRegistry;
    @Autowired
    JobOperator jobOperator;

    @Autowired
    Job importUserJob;

    public static void main(String... args) throws JobParametersInvalidException, JobInstanceAlreadyExistsException, NoSuchJobException, DuplicateJobException, NoSuchJobExecutionException {

        AnnotationConfigApplicationContext context = new AnnotationConfigApplicationContext(ApplicationConfiguration.class);

        MainJobOperator main = context.getBean(MainJobOperator.class);
        main.jobRegistry.register(main.jobFactory);
        long executionId = main.jobOperator.start(main.importUserJob.getName(), null);

        MainHelper.reportResults(main.jobOperator, executionId);
        MainHelper.reportPeople(context.getBean(JdbcTemplate.class));

        context.close();

        System.out.printf(&quot;\nFIN %s&quot;, main.getClass().getName());

    }
}</pre><p></p>
<p>And now it works:</p>
<p></p><pre class="crayon-plain-tag">***********************************************************
JobExecution: id=0, version=2, startTime=2014-09-04 13:03:37.964, endTime=2014-09-04 13:03:38.141, lastUpdated=2014-09-04 13:03:38.141, status=COMPLETED, exitStatus=exitCode=COMPLETED;exitDescription=, job=[JobInstance: id=0, version=0, Job=[importUserJob]], jobParameters=[{}]
* Steps executed:
StepExecution: id=0, version=3, name=step1, status=COMPLETED, exitStatus=COMPLETED, readCount=5, filterCount=0, writeCount=5 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=1, rollbackCount=0, exitDescription=
***********************************************************
***********************************************************
* People found:

* Found firstName: JILL, lastName: DOE in the database

* Found firstName: JOE, lastName: DOE in the database

* Found firstName: JUSTIN, lastName: DOE in the database

* Found firstName: JANE, lastName: DOE in the database

* Found firstName: JOHN, lastName: DOE in the database
***********************************************************</pre><p></p>
<p>But I don&#8217;t like this approach; it&#8217;s too manual.</p>
<p>I&#8217;d rather populate the JobRegistry automatically, and Spring Batch provides two mechanisms for doing so: <a href="http://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#d4e1228" title="JobRegistryBeanPostProcessor" target="_blank">a bean post-processor</a>, <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/support/JobRegistryBeanPostProcessor.html" title="Class JobRegistryBeanPostProcessor" target="_blank">JobRegistryBeanPostProcessor</a>, and <a href="http://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#d4e1233" title="AutomaticJobRegistrar" target="_blank">a component</a> that loads and unloads Jobs by creating child contexts and registering jobs from those contexts as they are created, <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/support/AutomaticJobRegistrar.html" title="Class AutomaticJobRegistrar" target="_blank">AutomaticJobRegistrar</a>.</p>
<p>We&#8217;ll look at the post-processor approach, because it&#8217;s very easy: just declare the bean in the batch configuration and run the original main class.</p>
<p></p><pre class="crayon-plain-tag">@Configuration
@EnableBatchProcessing
@Import(AdditionalBatchConfiguration.class)
public class BatchConfiguration {

	// Omitted

    @Bean
    public JobRegistryBeanPostProcessor jobRegistryBeanPostProcessor(JobRegistry jobRegistry) {
        JobRegistryBeanPostProcessor jobRegistryBeanPostProcessor = new JobRegistryBeanPostProcessor();
        jobRegistryBeanPostProcessor.setJobRegistry(jobRegistry);
        return jobRegistryBeanPostProcessor;
    }

}</pre><p></p>
<p>The bean post-processor has to be declared in this configuration file so that it registers the job when it&#8217;s created (this is the issue I mentioned before: if you declare the post-processor in another Java configuration file, for instance in <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/AdditionalBatchConfiguration.java" title="AdditionalBatchConfiguration.java" target="_blank">AdditionalBatchConfiguration</a>, it will never receive the job bean). It uses the same <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/JobRegistry.html" title="Interface JobRegistry" target="_blank">JobRegistry</a> that the <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/launch/JobOperator.html" title="Interface JobOperator" target="_blank">JobOperator</a> uses to launch the <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/Job.html" title="Interface Job" target="_blank">Job</a>. Actually it&#8217;s the only one that exists, but it&#8217;s good to know this.</p>
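<p>Conceptually, the post-processor simply intercepts every bean after the container creates it and registers the ones that are Jobs. Here is a framework-free sketch of that idea; the type names (Registry, RegisteringPostProcessor) are invented for illustration, and the method name mirrors Spring&#8217;s BeanPostProcessor callback:</p>

```java
import java.util.ArrayList;
import java.util.List;

// Framework-free sketch of what JobRegistryBeanPostProcessor does: inspect each
// bean after the container initializes it, and register it if it is a Job.
public class JobRegistrationSketch {

    interface Job { String getName(); }

    static class Registry {
        private final List<String> names = new ArrayList<>();
        void register(Job job) { names.add(job.getName()); }
        List<String> registeredJobNames() { return names; }
    }

    static class RegisteringPostProcessor {
        private final Registry registry;
        RegisteringPostProcessor(Registry registry) { this.registry = registry; }

        // Mirrors BeanPostProcessor.postProcessAfterInitialization: the container
        // calls this for every bean it creates, Job or not.
        Object postProcessAfterInitialization(Object bean, String beanName) {
            if (bean instanceof Job) {
                registry.register((Job) bean);
            }
            return bean; // always hand the bean back unchanged
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        RegisteringPostProcessor processor = new RegisteringPostProcessor(registry);
        processor.postProcessAfterInitialization((Job) () -> "importUserJob", "importUserJob");
        processor.postProcessAfterInitialization("not a job", "someOtherBean");
        System.out.println(registry.registeredJobNames()); // only importUserJob was registered
    }
}
```

This also shows why the declaration location matters: the post-processor can only register beans created in the context (or child contexts) where it itself lives.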
<p>It also works:</p>
<p></p><pre class="crayon-plain-tag">***********************************************************
JobExecution: id=0, version=2, startTime=2014-09-04 13:20:07.343, endTime=2014-09-04 13:20:07.522, lastUpdated=2014-09-04 13:20:07.522, status=COMPLETED, exitStatus=exitCode=COMPLETED;exitDescription=, job=[JobInstance: id=0, version=0, Job=[importUserJob]], jobParameters=[{}]
* Steps executed:
StepExecution: id=0, version=3, name=step1, status=COMPLETED, exitStatus=COMPLETED, readCount=5, filterCount=0, writeCount=5 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=1, rollbackCount=0, exitDescription=
***********************************************************
***********************************************************
* People found:

* Found firstName: JILL, lastName: DOE in the database

* Found firstName: JOE, lastName: DOE in the database

* Found firstName: JUSTIN, lastName: DOE in the database

* Found firstName: JANE, lastName: DOE in the database

* Found firstName: JOHN, lastName: DOE in the database
***********************************************************</pre><p></p>
<h3>Running the sample: Spring Boot</h3>
<p>We&#8217;d like to see how quick and easy Spring Boot is for launching Spring Batch applications once we already have a working configuration.</p>
<p>And it seems to be a piece of cake (the challenge here is knowing what is happening under the hood):</p>
<p></p><pre class="crayon-plain-tag">package com.malsolo.springframework.batch.sample;
@ComponentScan(excludeFilters = {@ComponentScan.Filter(type = FilterType.ASSIGNABLE_TYPE, value = ApplicationConfiguration.class)})
@EnableAutoConfiguration
public class MainBoot {

    public static void main(String... args) {

        ApplicationContext context = SpringApplication.run(MainBoot.class);

        MainHelper.reportPeople(context.getBean(JdbcTemplate.class));

    }
}</pre><p></p>
<p>Actually, Spring Boot is beyond the scope of this topic; it deserves its own entry (even an entire book), but we can summarize the important code here:</p>
<ul>
<li>Line 3: <a href="http://docs.spring.io/spring-boot/docs/current/api/index.html?org/springframework/boot/autoconfigure/EnableAutoConfiguration.html" title="Annotation Type EnableAutoConfiguration" target="_blank">@EnableAutoConfiguration</a>: with this annotation you ask Spring Boot to instantiate the beans you&#8217;re going to need, based on the libraries on your classpath.</li>
<li>Line 8: <a href="http://docs.spring.io/spring-boot/docs/current/api/org/springframework/boot/SpringApplication.html#run(java.lang.Object[], java.lang.String[])" title="API" target="_blank">the run method</a>, which bootstraps the application by passing the class itself (in our case, <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/MainBoot.java" title="MainBoot.java" target="_blank">MainBoot</a>) to serve as the primary Spring component.</li>
</ul>
<p>It&#8217;s enough to run the Batch application:</p>
<p></p><pre class="crayon-plain-tag">.   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v1.1.4.RELEASE)

2014-09-04 16:58:55.109  INFO 15646 --- [           main] c.m.s.batch.sample.MainBoot              : Starting MainBoot on jbeneito-Latitude-3540 with PID 15646 (/home/jbeneito/Documents/git/spring-batch-sample/target/classes started by jbeneito in /home/jbeneito/Documents/git/spring-batch-sample)
2014-09-04 16:58:55.237  INFO 15646 --- [           main] s.c.a.AnnotationConfigApplicationContext : Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@6366ebe0: startup date [Thu Sep 04 16:58:55 CEST 2014]; root of context hierarchy
2014-09-04 16:58:56.039  INFO 15646 --- [           main] o.s.b.f.s.DefaultListableBeanFactory     : Overriding bean definition for bean 'jdbcTemplate': replacing [Root bean: class [null]; scope=; abstract=false; lazyInit=false; autowireMode=3; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=batchConfiguration; factoryMethodName=jdbcTemplate; initMethodName=null; destroyMethodName=(inferred); defined in class path resource [com/malsolo/springframework/batch/sample/BatchConfiguration.class]] with [Root bean: class [null]; scope=; abstract=false; lazyInit=false; autowireMode=3; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=mainForSpringInfo; factoryMethodName=jdbcTemplate; initMethodName=null; destroyMethodName=(inferred); defined in class path resource [com/malsolo/springframework/batch/sample/MainForSpringInfo.class]]
2014-09-04 16:58:56.691  WARN 15646 --- [           main] o.s.c.a.ConfigurationClassEnhancer       : @Bean method ScopeConfiguration.stepScope is non-static and returns an object assignable to Spring's BeanFactoryPostProcessor interface. This will result in a failure to process annotations such as @Autowired, @Resource and @PostConstruct within the method's declaring @Configuration class. Add the 'static' modifier to this method to avoid these container lifecycle issues; see @Bean Javadoc for complete details
2014-09-04 16:58:56.724  WARN 15646 --- [           main] o.s.c.a.ConfigurationClassEnhancer       : @Bean method ScopeConfiguration.jobScope is non-static and returns an object assignable to Spring's BeanFactoryPostProcessor interface. This will result in a failure to process annotations such as @Autowired, @Resource and @PostConstruct within the method's declaring @Configuration class. Add the 'static' modifier to this method to avoid these container lifecycle issues; see @Bean Javadoc for complete details
2014-09-04 16:58:56.729  INFO 15646 --- [           main] f.a.AutowiredAnnotationBeanPostProcessor : JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
2014-09-04 16:58:56.975  INFO 15646 --- [           main] trationDelegate$BeanPostProcessorChecker : Bean 'batchConfiguration' of type [class com.malsolo.springframework.batch.sample.BatchConfiguration$$EnhancerBySpringCGLIB$$c3ec56ab] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2014-09-04 16:58:57.145  INFO 15646 --- [           main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.transaction.annotation.ProxyTransactionManagementConfiguration' of type [class org.springframework.transaction.annotation.ProxyTransactionManagementConfiguration$$EnhancerBySpringCGLIB$$5a15b25b] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2014-09-04 16:58:57.238  INFO 15646 --- [           main] trationDelegate$BeanPostProcessorChecker : Bean 'transactionAttributeSource' of type [class org.springframework.transaction.annotation.AnnotationTransactionAttributeSource] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2014-09-04 16:58:57.294  INFO 15646 --- [           main] trationDelegate$BeanPostProcessorChecker : Bean 'transactionInterceptor' of type [class org.springframework.transaction.interceptor.TransactionInterceptor] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2014-09-04 16:58:57.301  INFO 15646 --- [           main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.transaction.config.internalTransactionAdvisor' of type [class org.springframework.transaction.interceptor.BeanFactoryTransactionAttributeSourceAdvisor] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2014-09-04 16:58:57.374  INFO 15646 --- [           main] trationDelegate$BeanPostProcessorChecker : Bean 'dataSourceConfiguration' of type [class com.malsolo.springframework.batch.sample.DataSourceConfiguration$$EnhancerBySpringCGLIB$$18a97d02] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2014-09-04 16:58:57.455  INFO 15646 --- [           main] o.s.j.d.e.EmbeddedDatabaseFactory        : Creating embedded database 'testdb'
2014-09-04 16:58:58.156  INFO 15646 --- [           main] o.s.jdbc.datasource.init.ScriptUtils     : Executing SQL script from class path resource [schema-all.sql]
2014-09-04 16:58:58.178  INFO 15646 --- [           main] o.s.jdbc.datasource.init.ScriptUtils     : Executed SQL script from class path resource [schema-all.sql] in 20 ms.
2014-09-04 16:58:58.178  INFO 15646 --- [           main] o.s.jdbc.datasource.init.ScriptUtils     : Executing SQL script from class path resource [org/springframework/batch/core/schema-hsqldb.sql]
2014-09-04 16:58:58.189  INFO 15646 --- [           main] o.s.jdbc.datasource.init.ScriptUtils     : Executed SQL script from class path resource [org/springframework/batch/core/schema-hsqldb.sql] in 11 ms.
2014-09-04 16:58:58.198  INFO 15646 --- [           main] trationDelegate$BeanPostProcessorChecker : Bean 'dataSource' of type [class org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseFactory$EmbeddedDataSourceProxy] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2014-09-04 16:58:58.206  INFO 15646 --- [           main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration$DataSourceInitializerConfiguration' of type [class org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration$DataSourceInitializerConfiguration$$EnhancerBySpringCGLIB$$c0608242] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2014-09-04 16:58:58.257  INFO 15646 --- [           main] trationDelegate$BeanPostProcessorChecker : Bean 'spring.datasource.CONFIGURATION_PROPERTIES' of type [class org.springframework.boot.autoconfigure.jdbc.DataSourceProperties] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2014-09-04 16:58:58.260  INFO 15646 --- [           main] o.s.jdbc.datasource.init.ScriptUtils     : Executing SQL script from URL [file:/home/jbeneito/Documents/git/spring-batch-sample/target/classes/schema-all.sql]
2014-09-04 16:58:58.262  INFO 15646 --- [           main] o.s.jdbc.datasource.init.ScriptUtils     : Executed SQL script from URL [file:/home/jbeneito/Documents/git/spring-batch-sample/target/classes/schema-all.sql] in 2 ms.
2014-09-04 16:58:58.263  WARN 15646 --- [           main] o.s.b.a.jdbc.DataSourceInitializer       : Could not send event to complete DataSource initialization (ApplicationEventMulticaster not initialized - call 'refresh' before multicasting events via the context: org.springframework.context.annotation.AnnotationConfigApplicationContext@6366ebe0: startup date [Thu Sep 04 16:58:55 CEST 2014]; root of context hierarchy)
2014-09-04 16:58:58.263  INFO 15646 --- [           main] trationDelegate$BeanPostProcessorChecker : Bean 'dataSourceInitializer' of type [class org.springframework.boot.autoconfigure.jdbc.DataSourceInitializer] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2014-09-04 16:58:58.277  INFO 15646 --- [           main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.batch.core.configuration.annotation.SimpleBatchConfiguration' of type [class org.springframework.batch.core.configuration.annotation.SimpleBatchConfiguration$$EnhancerBySpringCGLIB$$85a27e41] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2014-09-04 16:58:58.320  INFO 15646 --- [           main] trationDelegate$BeanPostProcessorChecker : Bean 'jobRegistry' of type [class com.sun.proxy.$Proxy25] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2014-09-04 16:58:58.795  INFO 15646 --- [           main] o.s.b.c.r.s.JobRepositoryFactoryBean     : No database type set, using meta data indicating: HSQL
2014-09-04 16:58:59.056  INFO 15646 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : No TaskExecutor has been set, defaulting to synchronous executor.
2014-09-04 16:58:59.361  INFO 15646 --- [           main] o.s.jdbc.datasource.init.ScriptUtils     : Executing SQL script from class path resource [org/springframework/batch/core/schema-hsqldb.sql]
2014-09-04 16:58:59.381  INFO 15646 --- [           main] o.s.jdbc.datasource.init.ScriptUtils     : Executed SQL script from class path resource [org/springframework/batch/core/schema-hsqldb.sql] in 20 ms.
2014-09-04 16:58:59.890  INFO 15646 --- [           main] o.s.j.e.a.AnnotationMBeanExporter        : Registering beans for JMX exposure on startup
2014-09-04 16:58:59.926  INFO 15646 --- [           main] o.s.b.a.b.JobLauncherCommandLineRunner   : Running default command line with: []
2014-09-04 16:59:00.023  INFO 15646 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=importUserJob]] launched with the following parameters: [{run.id=1}]
2014-09-04 16:59:00.060  INFO 15646 --- [           main] o.s.batch.core.job.SimpleStepHandler     : Executing step: [step1]
Converting (firstName: Jill, lastName: Doe) into (firstName: JILL, lastName: DOE)
Converting (firstName: Joe, lastName: Doe) into (firstName: JOE, lastName: DOE)
Converting (firstName: Justin, lastName: Doe) into (firstName: JUSTIN, lastName: DOE)
Converting (firstName: Jane, lastName: Doe) into (firstName: JANE, lastName: DOE)
Converting (firstName: John, lastName: Doe) into (firstName: JOHN, lastName: DOE)
2014-09-04 16:59:00.162  INFO 15646 --- [           main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=importUserJob]] completed with the following parameters: [{run.id=1}] and the following status: [COMPLETED]
2014-09-04 16:59:00.164  INFO 15646 --- [           main] c.m.s.batch.sample.MainBoot              : Started MainBoot in 5.963 seconds (JVM running for 8.019)
***********************************************************
* People found:

* Found firstName: JILL, lastName: DOE in the database

* Found firstName: JOE, lastName: DOE in the database

* Found firstName: JUSTIN, lastName: DOE in the database

* Found firstName: JANE, lastName: DOE in the database

* Found firstName: JOHN, lastName: DOE in the database
***********************************************************
2014-09-04 16:59:00.258  INFO 15646 --- [       Thread-1] s.c.a.AnnotationConfigApplicationContext : Closing org.springframework.context.annotation.AnnotationConfigApplicationContext@6366ebe0: startup date [Thu Sep 04 16:58:55 CEST 2014]; root of context hierarchy
2014-09-04 16:59:00.262  INFO 15646 --- [       Thread-1] o.s.j.e.a.AnnotationMBeanExporter        : Unregistering JMX-exposed beans on shutdown

Process finished with exit code 0</pre><p></p>
<p>As a side note, this class already scans for components, so we don&#8217;t need the additional component scan; that&#8217;s why we exclude the <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/ApplicationConfiguration.java" title="ApplicationConfiguration.java" target="_blank">ApplicationConfiguration</a> class with a filter.</p>
<p>Finally, we use the <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/MainHelper.java" title="MainHelper.java" target="_blank">MainHelper</a> class to show a summary of the results.</p>
<p>That&#8217;s all for now, because one more time this entry is growing really fast. Thus, we&#8217;ll see in the next post the last topic of Spring Batch that I want to talk about: <a href="https://jcp.org/en/jsr/detail?id=352" title="JSR 352: Batch Applications for the Java Platform" target="_blank">JSR 352</a>.</p>
<h3>Resources</h3>
<ul>
<li><a href="https://www.youtube.com/watch?v=lHCPppMlylY" title="Youtube" target="_blank">Webinar: Spring Batch 3.0.0</a> by <strong>Michael Minella</strong>. Published on Jun 18, 2014.</li>
<li><a href="https://www.youtube.com/watch?v=yKs4yPs-5yU" title="youtube" target="_blank">JSR-352, Spring Batch and You</a> by <strong>Michael Minella</strong>. Published on Feb 3, 2014.</li>
<li><a href="https://www.youtube.com/watch?v=8tiqeV07XlI" title="youtube" target="_blank">Integrating Spring Batch and Spring Integration</a> by <strong>Gunnar Hillert</strong>, <strong>Michael Minella</strong>. Published on Jul 9, 2014.</li>
<li><a href="http://www.amazon.com/Pro-Spring-Batch-Experts-Voice/dp/1430234520/ref=sr_1_1?ie=UTF8&#038;qid=1409752293&#038;sr=8-1&#038;keywords=pro+spring+batch" title="Amazon" target="_blank">Pro Spring Batch (Expert&#8217;s Voice in Spring)</a> by <strong>Michael Minella</strong>. Published on July 12, 2011 by <a href="http://www.apress.com/9781430234524" title="Apress" target="_blank">Apress</a>.</li>
<li>Spring Batch in Action by <strong>Arnaud Cogoluegnes</strong>, <strong>Thierry Templier</strong>, <strong>Gary Gregory</strong>, <strong>Olivier Bazoud</strong>. Published on October 10, 2011 by <a href="http://www.manning.com/templier/" title="Manning" target="_blank">Manning Publications</a>.</li>
<li>Spring.io GETTING STARTED GUIDE: <a href="http://spring.io/guides/gs/batch-processing/" title="GETTING STARTED: Creating a Batch Service" target="_blank">Creating a Batch Service</a>.</li>
<li><a href="http://docs.spring.io/spring-batch/trunk/reference/html/" title="Spring Batch - Reference Documentation" target="_blank">Spring Batch &#8211; Reference Documentation</a></li>
<li><a href="http://docs.spring.io/spring-batch/apidocs/index.html?overview-summary.html" title="Spring Batch 3.0.1.RELEASE API" target="_blank">Spring Batch &#8211; API specification</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://malsolo.com/blog4java/?feed=rss2&#038;p=375</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introduction to Spring Batch</title>
		<link>http://malsolo.com/blog4java/?p=260</link>
		<comments>http://malsolo.com/blog4java/?p=260#comments</comments>
		<pubDate>Wed, 03 Sep 2014 10:58:02 +0000</pubDate>
		<dc:creator><![CDATA[Javier (@jbbarquero)]]></dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Springsource]]></category>
		<category><![CDATA[JSR-352]]></category>
		<category><![CDATA[Spring Batch]]></category>
		<category><![CDATA[Spring Framework]]></category>

		<guid isPermaLink="false">http://malsolo.com/blog4java/?p=260</guid>
		<description><![CDATA[Spring Batch is the Spring Project aimed to write Java Batch applications by using the foundations of Spring Framework. Michael T. Minella, project lead of Spring Batch and also a member of the JSR 352 (Batch Applications for the Java &#8230; <a href="http://malsolo.com/blog4java/?p=260">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
<content:encoded><![CDATA[<p><a href="http://projects.spring.io/spring-batch/" title="Spring Batch" target="_blank">Spring Batch</a> is the <a href="http://spring.io/projects" title="Spring Projects" target="_blank">Spring Project</a> aimed at writing Java Batch applications by using the foundations of the Spring Framework.</p>
<p><a href="http://spring.io/team/mminella" title="Michael Minella" target="_blank">Michael T. Minella</a>, project lead of Spring Batch and also a member of the <a href="https://jcp.org/en/jsr/detail?id=352" title="JSR 352" target="_blank">JSR 352 (Batch Applications for the Java Platform)</a> expert group, gives the following definition in his book <a href="http://www.amazon.com/Pro-Spring-Batch-Experts-Voice/dp/1430234520" title="Pro Spring Batch" target="_blank">Pro Spring Batch</a>: &#8220;<em>Batch processing [&#8230;] is defined as the processing of data without interaction or interruption. Once started, a batch process runs to some form of completion without any intervention</em>&#8221;.</p>
<p>Typically Batch Jobs are long-running, non-interactive, and process large volumes of data, more than fits in memory or a single transaction. Thus they usually run outside office hours and include logic for handling errors and restarting if necessary.</p>
<p>Spring Batch provides, among others, the following features:</p>
<ul>
<li>Transaction management, to allow you to focus on business processing.</li>
<li>Chunk-based processing, to process large volumes of data by dividing them into small pieces.</li>
<li>Declarative I/O, by providing readers and writers for a lot of scenarios.</li>
<li>Start/Stop/Restart/Skip/Retry capabilities, to handle non-interactive management of the process.</li>
<li>A web-based administration interface (Spring Batch Admin) that provides an API for administering tasks.</li>
<li>Based on the Spring Framework, so it includes all its configuration options, including Dependency Injection.</li>
<li>Compliance with JSR 352: Batch Applications for the Java Platform.</li>
</ul>
<h3>Spring Batch concepts</h3>
<div id="attachment_266" style="width: 310px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2014/08/spring-batch-reference-model.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2014/08/spring-batch-reference-model-300x119.png" alt="The Domain Language of Batch" width="300" height="119" class="size-medium wp-image-266" /></a><p class="wp-caption-text">Batch Stereotypes</p></div>
<ul>
<li><a href="http://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#domainJob" title="Job" target="_blank">Job</a>: an entity that encapsulates an entire batch process. It is composed of one or more ordered <strong>Steps</strong> and it has some properties such as restartability.</li>
<li><a href="http://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#domainStep" title="Step" target="_blank">Step</a>: a domain object that encapsulates an independent, sequential phase of a batch job.</li>
<li><a href="http://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#readersAndWriters" title="ItemReaders and ItemWriters and  ItemProcessors" target="_blank">Item</a>: the individual piece of data that is being processed.</li>
<li><a href="http://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#chunkOrientedProcessing" title="Chunk-Oriented Processing" target="_blank">Chunk</a>: the processing style used by Spring Batch: items are read and processed one at a time and aggregated until they reach a given count, the &#8220;<em>chunk</em>&#8221;, which is then written at once.</li>
<div id="attachment_271" style="width: 310px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2014/08/chunk-oriented-processing.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2014/08/chunk-oriented-processing-300x165.png" alt="Chunk-Oriented Processing" width="300" height="165" class="size-medium wp-image-271" /></a><p class="wp-caption-text">Chunk-Oriented Processing</p></div>
<li><a href="http://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#domainJobLauncher" title="JobLauncher" target="_blank">JobLauncher</a>: the entry point to launch Spring Batch jobs with a given set of JobParameters.</li>
<li><a href="http://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#domainJobRepository" title="JobRepository" target="_blank">JobRepository</a>: maintains all metadata related to job executions and provides CRUD operations for JobLauncher, Job, and Step implementations.</li>
</ul>
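<p>To make the chunk-oriented style more concrete, the read-process-write loop can be sketched as follows (simplified pseudocode in the spirit of the reference documentation; <strong>commitInterval</strong> is the chunk size):</p>
<p></p><pre class="crayon-plain-tag">List items = new ArrayList();
for (int i = 0; i &lt; commitInterval; i++) {
    Object item = itemReader.read();
    Object processedItem = itemProcessor.process(item);
    items.add(processedItem);
}
itemWriter.write(items);</pre><p></p>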
<h3>Running a Job</h3>
<p>The <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/launch/JobLauncher.html" title="JobLauncher API" target="_blank">JobLauncher</a> interface has a basic implementation, <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/launch/support/SimpleJobLauncher.html" title="SimpleJobLauncher API" target="_blank">SimpleJobLauncher</a>, whose only required dependency is a <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/repository/JobRepository.html" title="JobRepository API" target="_blank">JobRepository</a>, needed to obtain an execution, so that you can use it to execute the Job.</p>
<div id="attachment_275" style="width: 310px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2014/08/job-launcher-sequence-sync.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2014/08/job-launcher-sequence-sync-300x229.png" alt="JobLauncher" width="300" height="229" class="size-medium wp-image-275" /></a><p class="wp-caption-text">JobLauncher</p></div>
<p>You can also launch a Job asynchronously by configuring a <a href="http://docs.spring.io/spring/docs/current/javadoc-api/index.html?org/springframework/core/task/TaskExecutor.html" title="TaskExecutor API" target="_blank">TaskExecutor</a>. You can also use this configuration <a href="http://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#runningJobsFromWebContainer" title="Running Jobs from within a Web Container" target="_blank">for running Jobs from within a Web Container</a>.</p>
<div id="attachment_277" style="width: 310px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2014/08/job-launcher-sequence-async.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2014/08/job-launcher-sequence-async-300x229.png" alt="Job launcher sequence async" width="300" height="229" class="size-medium wp-image-277" /></a><p class="wp-caption-text">Job launcher sequence async</p></div>
<p>A <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/launch/JobLauncher.html" title="JobLauncher API" target="_blank">JobLauncher</a> uses the <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/repository/JobRepository.html" title="JobRepository API" target="_blank">JobRepository</a> to create new <strong>JobExecution</strong> objects and run them.</p>
<div id="attachment_281" style="width: 310px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2014/08/job-repository.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2014/08/job-repository-300x265.png" alt="Job Repository" width="300" height="265" class="size-medium wp-image-281" /></a><p class="wp-caption-text">Job Repository</p></div>
<h3>Running Jobs: concepts</h3>
<p>The main concepts related to Job execution are:</p>
<div id="attachment_283" style="width: 310px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2014/08/jobHeirarchyWithSteps.png"><img src="http://malsolo.com/blog4java/wp-content/uploads/2014/08/jobHeirarchyWithSteps-300x220.png" alt="Job hierarchy with steps" width="300" height="220" class="size-medium wp-image-283" /></a><p class="wp-caption-text">Job hierarchy with steps</p></div>
<ul>
<li><a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/JobInstance.html" title="JobInstance API" target="_blank">JobInstance</a>: a logical run of a Job.</li>
<li><a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/JobParameter.html" title="JobParameter API" target="_blank">JobParameters</a>: a set of parameters used to start a batch job; they identify each JobInstance.</li>
<li><a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/JobExecution.html" title="JobExecution API" target="_blank">JobExecution</a>: a single attempt to physically run a JobInstance; it records what happened during the execution.</li>
<li><a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/StepExecution.html" title="StepExecution API" target="_blank">StepExecution</a>: a single attempt to execute a Step; it is created each time a Step is run and provides information regarding the result of the processing.</li>
<li><a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/item/ExecutionContext.html" title="ExecutionContext API" target="_blank">ExecutionContext</a>: a collection of key/value pairs that are persisted and controlled by the framework in order to allow developers a place to store persistent state that is scoped to a StepExecution or JobExecution.</li>
</ul>
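<p>For instance, this is roughly how a set of JobParameters is built and handed to the JobLauncher; note that the parameter names used here (<strong>inputFile</strong>, <strong>runDate</strong>) are merely illustrative, not part of the sample:</p>
<p></p><pre class="crayon-plain-tag">JobParameters jobParameters = new JobParametersBuilder()
        .addString(&quot;inputFile&quot;, &quot;sample-data.csv&quot;)
        .addDate(&quot;runDate&quot;, new Date())
        .toJobParameters();

// Running the same Job twice with identical parameters refers to the same
// JobInstance; changing a parameter value creates a new JobInstance.
JobExecution jobExecution = jobLauncher.run(job, jobParameters);</pre><p></p>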
<h3>Sample application</h3>
<p>Now we are going to see a simple sample application that reads Person POJOs from a file containing people data, processes each of them (simply uppercasing their attributes), and saves them in a database.</p>
<p>All the code is available at <a href="https://github.com/jbbarquero/spring-batch-sample" title="My Spring Batch sample at GitHub" target="_blank">GitHub</a>. </p>
<p>Let&#8217;s begin with the basic domain class: <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/Person.java" title="Person.java" target="_blank">Person</a>, just a POJO.</p>
<p></p><pre class="crayon-plain-tag">package com.malsolo.springframework.batch.sample;

public class Person {
    private String lastName;
    private String firstName;
    //...
}</pre><p></p>
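<p>The elided part is just the usual POJO boilerplate. Judging from how the class is used in the rest of the sample (the real code is at GitHub), the complete class looks roughly like this (package declaration omitted); note that the default constructor and the setters are needed by the BeanWrapperFieldSetMapper we&#8217;ll configure later:</p>
<p></p><pre class="crayon-plain-tag">public class Person {

    private String lastName;
    private String firstName;

    // Default constructor, required by BeanWrapperFieldSetMapper
    public Person() {
    }

    public Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    // The real class may format this differently
    @Override
    public String toString() {
        return firstName + ' ' + lastName;
    }
}</pre><p></p>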
<p>Then, let&#8217;s see the simple processor, <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/PersonItemProcessor.java" title="PersonItemProcessor.java" target="_blank">PersonItemProcessor</a>. It implements an <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/item/ItemProcessor.html" title="ItemProcessor API" target="_blank">ItemProcessor</a>, with a Person both as Input and Output.</p>
<p>It declares a method to be overridden, <strong>process</strong>, where you can write the custom transformation.</p>
<p></p><pre class="crayon-plain-tag">package com.malsolo.springframework.batch.sample;

import org.springframework.batch.item.ItemProcessor;

public class PersonItemProcessor implements ItemProcessor&lt;Person, Person&gt; {

    @Override
    public Person process(final Person person) throws Exception {
        final String firstName = person.getFirstName().toUpperCase();
        final String lastName = person.getLastName().toUpperCase();

        final Person transformedPerson = new Person(firstName, lastName);

        System.out.println(&quot;Converting (&quot; + person + &quot;) into (&quot; + transformedPerson + &quot;)&quot;);

        return transformedPerson;
    }

}</pre><p></p>
<p>Once this is done, we can proceed to configure the Spring Batch application. For doing so, we&#8217;ll use Java annotations in a <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/BatchConfiguration.java" title="BatchConfiguration.java" target="_blank">BatchConfiguration</a> class:</p>
<p></p><pre class="crayon-plain-tag">// Imports and package omitted
@Configuration
@EnableBatchProcessing
@Import(AdditionalBatchConfiguration.class)
public class BatchConfiguration {

	// Input, processor, and output definition
	
	@Bean
    public ItemReader&lt;Person&gt; reader() {
		FlatFileItemReader&lt;Person&gt; reader = new FlatFileItemReader&lt;Person&gt;();
		reader.setResource(new ClassPathResource(&quot;sample-data.csv&quot;));
		reader.setLineMapper(new DefaultLineMapper&lt;Person&gt;() {{
			setLineTokenizer(new DelimitedLineTokenizer() {{
				setNames(new String[] {&quot;firstName&quot;, &quot;lastName&quot;});
			}});
			setFieldSetMapper(new BeanWrapperFieldSetMapper&lt;Person&gt;() {{
				setTargetType(Person.class);
			}});
			
		}});
		return reader;
	}
	
	@Bean
    public ItemProcessor&lt;Person, Person&gt; processor() {
        return new PersonItemProcessor();
    }
	
	@Bean
    public ItemWriter&lt;Person&gt; writer(DataSource dataSource) {
		JdbcBatchItemWriter&lt;Person&gt; writer = new JdbcBatchItemWriter&lt;Person&gt;();
		writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider&lt;Person&gt;());
		writer.setSql(&quot;INSERT INTO people (first_name, last_name) VALUES (:firstName, :lastName)&quot;);
		writer.setDataSource(dataSource);
		return writer;
	}
	
	//  Actual job configuration
	
	@Bean
    public Job importUserJob(JobBuilderFactory jobs, Step s1) {
		return jobs.get(&quot;importUserJob&quot;)
				.incrementer(new RunIdIncrementer())
				.flow(s1)
				.end()
				.build();
	}
	
	@Bean
    public Step step1(StepBuilderFactory stepBuilderFactory, ItemReader&lt;Person&gt; reader,
            ItemWriter&lt;Person&gt; writer, ItemProcessor&lt;Person, Person&gt; processor) {
		return stepBuilderFactory.get(&quot;step1&quot;)
				.&lt;Person, Person&gt; chunk(10)
				.reader(reader)
				.processor(processor)
				.writer(writer)
				.build();
	}
	
	@Bean
    public JdbcTemplate jdbcTemplate(DataSource dataSource) {
        return new JdbcTemplate(dataSource);
    }
	
}</pre><p></p>
<p>Highlights for this class are:</p>
<ul>
<li>Line 2: <a href="http://docs.spring.io/spring/docs/current/javadoc-api/index.html?org/springframework/context/annotation/Configuration.html" title="Configuration API" target="_blank">@Configuration</a>, this class will be processed by the <a href="http://docs.spring.io/spring/docs/current/spring-framework-reference/htmlsingle/#beans-java-basic-concepts" title="Java-based container configuration, basic concepts." target="_blank">Spring container to generate bean definitions</a>.</li>
<li>Line 3: <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/annotation/EnableBatchProcessing.html" title="EnableBatchProcessing API" target="_blank">@EnableBatchProcessing</a>, provides a base configuration for building batch jobs <a href="http://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#javaConfig" title="Spring Batch Java Config" target="_blank">by creating the following beans, available to be autowired</a>:
<ul>
<li>JobRepository &#8211; bean name &#8220;jobRepository&#8221;</li>
<li>JobLauncher &#8211; bean name &#8220;jobLauncher&#8221;</li>
<li>JobRegistry &#8211; bean name &#8220;jobRegistry&#8221;</li>
<li>PlatformTransactionManager &#8211; bean name &#8220;transactionManager&#8221;</li>
<li>JobBuilderFactory &#8211; bean name &#8220;jobBuilders&#8221;</li>
<li>StepBuilderFactory &#8211; bean name &#8220;stepBuilders&#8221;</li>
</ul>
<p>We&#8217;ll see shortly how it works.</p></li>
<li>Line 10: the <strong>reader bean</strong>, an instance of a <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/item/file/FlatFileItemReader.html" title="Class FlatFileItemReader&lt;T&gt;" target="_blank">FlatFileItemReader</a>, which implements the <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/item/ItemReader.html" title="Interface ItemReader&lt;T&gt;" target="_blank">ItemReader</a> interface to read each <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/Person.java" title="Person.java" target="_blank">Person</a> from the file containing people. Spring Batch provides several implementations of this interface, and this one, which reads lines from a Resource, is among them. You know, <u>no need of custom code</u>.</li>
<li>Line 26: the <strong>processor bean</strong>, an instance of the previously defined <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/PersonItemProcessor.java" title="PersonItemProcessor.java" target="_blank">PersonItemProcessor</a>. See above.</li>
<li>Line 31: the <strong>writer bean</strong>, an instance of a <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/item/database/JdbcBatchItemWriter.html" title="Class JdbcBatchItemWriter&lt;T&gt;" target="_blank">JdbcBatchItemWriter</a>, which implements the <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/item/ItemWriter.html" title="Interface ItemWriter&lt;T&gt;" target="_blank">ItemWriter interface</a> to write the already-processed People to the database. It&#8217;s also an implementation provided by Spring Batch, so <u>no need of custom code</u> again. In this case, you only have to provide an SQL statement and a callback for the parameters. Since we are using named parameters, we&#8217;ve chosen a <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/item/database/BeanPropertyItemSqlParameterSourceProvider.html" title="Class BeanPropertyItemSqlParameterSourceProvider&lt;T&gt;" target="_blank">BeanPropertyItemSqlParameterSourceProvider</a>. This bean also needs a DataSource, so we provide it as a method parameter so that Spring injects the instance it has registered.</li>
<li>Line 42: a <strong><a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/Job.html" title="Interface Job" target="_blank">Job</a> bean</strong>, which is built using the <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/annotation/JobBuilderFactory.html" title="Class JobBuilderFactory" target="_blank">JobBuilderFactory</a> that is autowired by passing it as a method parameter of this @Bean method. When you call its get method, Spring Batch will create a <strong>job builder</strong> and initialize its job repository; the <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/job/builder/JobBuilder.html" title="Class JobBuilder" target="_blank">JobBuilder</a> is the convenience class for building jobs of various kinds, as you can see in the code above. We also use a <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/Step.html" title="Interface Step" target="_blank">Step</a>, configured as the next Spring bean.</li>
<li>Line 51: a <strong><a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/Step.html" title="Interface Step" target="_blank">Step</a> bean</strong>, which is built using the <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/annotation/StepBuilderFactory.html" title="Class StepBuilderFactory" target="_blank">StepBuilderFactory</a> that is autowired by passing it as a method parameter of this @Bean method, as well as the other dependencies: the <strong>reader</strong>, the <strong>processor</strong> and the <strong>writer</strong> previously defined. When calling the get method of the StepBuilderFactory, Spring Batch will create a step builder and initialize its job repository and transaction manager; the <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/step/builder/StepBuilder.html" title="Class StepBuilder" target="_blank">StepBuilder</a> is an entry point for building all kinds of steps, as you can see in the code above.</li>
</ul>
<p>This configuration is almost everything needed to configure a Batch process as defined in the concepts above.</p>
<p>Actually, only one configuration class needs to have the @EnableBatchProcessing annotation in order to have the base configuration for building batch jobs. Then you can define the jobs with their steps and the readers/processors/writers that they need.</p>
<p>But an additional DataSource is needed, to be used by the <strong>JobRepository</strong>. For this sample we&#8217;ll use an in-memory one:</p>
<p></p><pre class="crayon-plain-tag">@Configuration
public class DataSourceConfiguration {

    @Bean
    public DataSource dataSource() {
        EmbeddedDatabaseBuilder builder = new EmbeddedDatabaseBuilder();
        return builder
                .setType(HSQL)
                .addScript(&quot;schema-all.sql&quot;)
                .addScript(&quot;org/springframework/batch/core/schema-hsqldb.sql&quot;)
                .build();
    }

}</pre><p></p>
<p>In this case we&#8217;ll use the same in-memory database, HSQL, with the schema for the application (line 9) and the schema for the job repository (line 10). The former is available as a resource of the application, the file called <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/resources/schema-all.sql" title="src/main/resources/schema-all.sql" target="_blank">schema-all.sql</a>, and the latter in the spring-batch-core jar (spring-batch-core-3.0.1.RELEASE.jar at the time of this writing).</p>
<h3>Alternate Configuration</h3>
<p>The official documentation shows a slightly different <a href="http://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#javaConfig" title="Java Config" target="_blank">configuration</a> by using the <a href="http://docs.spring.io/spring/docs/4.0.6.RELEASE/javadoc-api/index.html?org/springframework/beans/factory/annotation/Autowired.html" title="Annotation Type Autowired" target="_blank">@Autowired</a> annotation for the beans that <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/annotation/EnableBatchProcessing.html" title="Annotation Type EnableBatchProcessing" target="_blank">@EnableBatchProcessing</a> will create. Use the one that you like most. In this case it also imports the database configuration.</p>
<p></p><pre class="crayon-plain-tag">@Configuration
@EnableBatchProcessing
@Import(DataSourceConfiguration.class)
public class AppConfig {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    // Input, processor, and output definition omitted

    @Bean
    public Job importUserJob(Step step1) {
        return jobs.get(&quot;importUserJob&quot;).incrementer(new RunIdIncrementer()).flow(step1).end().build();
    }

    @Bean
    protected Step step1(ItemReader&lt;Person&gt; reader, ItemProcessor&lt;Person, Person&gt; processor, ItemWriter&lt;Person&gt; writer) {
        return steps.get(&quot;step1&quot;)
            .&lt;Person, Person&gt; chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
    }

}</pre><p></p>
<p>We chose another approach: we load the configuration when bootstrapping the application in the main method, as you&#8217;ll see shortly. Besides, we imported an additional batch configuration (see line 28 at <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/BatchConfiguration.java" title="BatchConfiguration.java" target="_blank">BatchConfiguration.java</a>) to provide an alternate way to launch the application.</p>
<h3>Enable Batch Processing: how it works</h3>
<p>As we said before, we will go a little deeper in how the annotation <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/annotation/EnableBatchProcessing.html" title="EnableBatchProcessing API" target="_blank">@EnableBatchProcessing</a> works.</p>
<p>To remind its goal, this annotation provides a base configuration for building batch jobs by creating a list of beans available to be autowired. An extract of the source code gives us a lot of information:</p>
<p></p><pre class="crayon-plain-tag">@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@Documented
@Import(BatchConfigurationSelector.class)
public @interface EnableBatchProcessing {

	/**
	 * Indicate whether the configuration is going to be modularized into multiple application contexts. If true then
	 * you should not create any &amp;#64;Bean Job definitions in this context, but rather supply them in separate (child)
	 * contexts through an {@link ApplicationContextFactory}.
	 */
	boolean modular() default false;

}</pre><p></p>
<p>As you can see at line 4, this annotation <a href="http://docs.spring.io/spring/docs/4.0.6.RELEASE/javadoc-api/index.html?org/springframework/context/annotation/Import.html" title="Annotation Type Import" target="_blank">imports</a> an implementation of an <a href="http://docs.spring.io/spring/docs/4.0.6.RELEASE/javadoc-api/index.html?org/springframework/context/annotation/ImportSelector.html" title="Interface ImportSelector" target="_blank">ImportSelector</a>, one of the options to import beans in a configuration class; in particular, to selectively import beans according to certain criteria.</p>
<p>This particular implementation, <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/annotation/BatchConfigurationSelector.html" title="Class BatchConfigurationSelector" target="_blank">BatchConfigurationSelector</a>, instantiates the expected beans for providing a common structure for enabling and using Spring Batch, based on the EnableBatchProcessing&#8217;s attribute <strong>modular</strong>.</p>
<p>There are two implementations, depending on whether or not you want the configuration to be modularized into multiple application contexts, so that they don&#8217;t interfere with each other regarding the naming and uniqueness of beans (for instance, beans named <strong>reader</strong>). They are <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/annotation/ModularBatchConfiguration.html" title="Class ModularBatchConfiguration" target="_blank">ModularBatchConfiguration</a> and <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/annotation/SimpleBatchConfiguration.html" title="Class SimpleBatchConfiguration" target="_blank">SimpleBatchConfiguration</a>, respectively. They both mainly do the same, but the former uses an <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/support/AutomaticJobRegistrar.html" title="Class AutomaticJobRegistrar" target="_blank">AutomaticJobRegistrar</a>, which is responsible for creating separate <a href="http://docs.spring.io/spring/docs/current/javadoc-api/index.html?org/springframework/context/ApplicationContext.html" title="Interface ApplicationContext" target="_blank">ApplicationContext</a>s to register isolated jobs that are later accessible via the <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/JobRegistry.html" title="Interface JobRegistry" target="_blank">JobRegistry</a>, whereas the latter just creates the main components as lazy proxies that only initialize when a method is called (in order to prevent configuration cycles).</p>
<p>The key concept here is that both extend <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/annotation/AbstractBatchConfiguration.html" title="Class AbstractBatchConfiguration" target="_blank">AbstractBatchConfiguration</a>, which uses the core interface for this configuration: <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/annotation/BatchConfigurer.html" title="Interface BatchConfigurer" target="_blank">BatchConfigurer</a>.</p>
<p>The default implementation, <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/configuration/annotation/DefaultBatchConfigurer.html" title="Class DefaultBatchConfigurer" target="_blank">DefaultBatchConfigurer</a>, provides the beans mentioned above (jobRepository, jobLauncher, jobRegistry, transactionManager, jobBuilders and stepBuilders). To do so it <u>doesn&#8217;t require a dataSource</u>: it is autowired with required set to false, and it will use a Map-based JobRepository if its dataSource is null. But take care if you have a dataSource eligible for autowiring that doesn&#8217;t contain the expected database schema for the job repository: the batch process will fail in this case.</p>
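<p>By the way, if you want to force the Map-based JobRepository even when a dataSource is present in the context, one option is to extend DefaultBatchConfigurer and ignore the autowired dataSource. This is just a sketch of the idea, not code from the sample:</p>
<p></p><pre class="crayon-plain-tag">@Configuration
public class MapJobRepositoryBatchConfigurer extends DefaultBatchConfigurer {

    // Ignore any DataSource available in the context, so the parent class
    // falls back to the Map-based JobRepository.
    @Override
    public void setDataSource(DataSource dataSource) {
        // deliberately empty
    }
}</pre><p></p>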
<p>Spring Boot provides another implementation, <a href="http://docs.spring.io/spring-boot/docs/current/api/index.html?org/springframework/boot/autoconfigure/batch/BasicBatchConfigurer.html" title="Class BasicBatchConfigurer" target="_blank">BasicBatchConfigurer</a>, but this is out of the scope of this entry.</p>
<p>With all this information, we already have a Spring Batch application configured, and we more or less know how this configuration is achieved using Java.</p>
<p>Now it&#8217;s time to run the application. </p>
<h3>Running the sample: JobLauncher</h3>
<p>We have all we need to launch a batch job, the Job to be launched and a JobLauncher, so wait no more and execute this main class: <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/MainJobLauncher.java" title="MainJobLauncher.java" target="_blank">MainJobLauncher</a>.</p>
<p></p><pre class="crayon-plain-tag">@Component
public class MainJobLauncher {

    @Autowired
    JobLauncher jobLauncher;

    @Autowired
    Job importUserJob;

    public static void main(String... args) throws JobParametersInvalidException, JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException {

        AnnotationConfigApplicationContext context = new AnnotationConfigApplicationContext(ApplicationConfiguration.class);

        MainJobLauncher main = context.getBean(MainJobLauncher.class);

        JobExecution jobExecution = main.jobLauncher.run(main.importUserJob, new JobParameters());

        MainHelper.reportResults(jobExecution);
        MainHelper.reportPeople(context.getBean(JdbcTemplate.class));

        context.close();

    }

}</pre><p></p>
<p>First things first. This is the way I like to write main classes. Some people from Spring are used to writing main classes annotated with <a href="http://docs.spring.io/spring/docs/current/javadoc-api/index.html?org/springframework/context/annotation/Configuration.html" title="Annotation Type Configuration" target="_blank">@Configuration</a>, but I&#8217;d rather annotate them as <a href="http://docs.spring.io/spring/docs/current/javadoc-api/index.html?org/springframework/stereotype/Component.html" title="Annotation Type Component" target="_blank">@Component</a>s in order to separate the actual application and its configuration from the classes that test the functionality.</p>
<p>As a Spring component (line 1), it only needs to have its dependencies <a href="http://docs.spring.io/spring/docs/current/javadoc-api/index.html?org/springframework/beans/factory/annotation/Autowired.html" title="Annotation Type Autowired" target="_blank">@Autowired</a>.</p>
<p>That&#8217;s the reason for the <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/ApplicationConfiguration.java" title="ApplicationConfiguration.java" target="_blank">ApplicationConfiguration</a> class. It&#8217;s a <a href="http://docs.spring.io/spring/docs/current/javadoc-api/index.html?org/springframework/context/annotation/Configuration.html" title="Annotation Type Configuration" target="_blank">@Configuration</a> class that also performs a @ComponentScan from its own package, which will find <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/MainJobLauncher.java" title="MainJobLauncher.java" target="_blank">MainJobLauncher</a> itself and the remaining <a href="http://docs.spring.io/spring/docs/current/javadoc-api/index.html?org/springframework/context/annotation/Configuration.html" title="Annotation Type Configuration" target="_blank">@Configuration</a> classes, because they are also <a href="http://docs.spring.io/spring/docs/current/javadoc-api/index.html?org/springframework/stereotype/Component.html" title="Annotation Type Component" target="_blank">@Component</a>s: <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/BatchConfiguration.java" title="BatchConfiguration.java" target="_blank">BatchConfiguration</a> and <a href="https://github.com/jbbarquero/spring-batch-sample/blob/master/src/main/java/com/malsolo/springframework/batch/sample/DataSourceConfiguration.java" title="DataSourceConfiguration.java" target="_blank">DataSourceConfiguration</a>.</p>
<p>As a main class, it creates the Spring application context (line 12), gets the component as a Spring bean (line 14), and then uses its methods (or, in this example, its attributes; line 16)</p>
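<p>As a sketch of that pattern, a @Component main class could look like the following. Keep in mind this is only an outline of the idea: the attribute names and the job bean are illustrative, not necessarily the ones in the sample project.</p>

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.stereotype.Component;

@Component
public class MainJobLauncher {

    @Autowired
    private JobLauncher jobLauncher; // provided by the Spring Batch infrastructure

    @Autowired
    private Job importUserJob; // the job defined in the @Configuration classes

    public static void main(String[] args) throws Exception {
        // Create the application context from the @Configuration class
        AnnotationConfigApplicationContext context =
                new AnnotationConfigApplicationContext(ApplicationConfiguration.class);
        // Retrieve this very class as a Spring bean, with its dependencies autowired
        MainJobLauncher main = context.getBean(MainJobLauncher.class);
        // Use the autowired attributes to run the batch job
        JobExecution execution = main.jobLauncher.run(main.importUserJob, new JobParameters());
        System.out.println("Exit status: " + execution.getExitStatus());
        context.close();
    }
}
```

<p>The key point is that the class is retrieved from the context instead of being instantiated with new, so Spring has the chance to inject the dependencies.</p>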
<p>Let&#8217;s get back to the batch application: line 16 is the call to the JobLauncher that runs the Spring Batch process.</p>
<p>The remaining lines show the results, both of the job execution and of the data written to the database.</p>
<p>It will be something like this:</p>
<p></p><pre class="crayon-plain-tag">***********************************************************
importUserJob finished with a status of  (COMPLETED).
* Steps executed:
	step1 : exitCode=COMPLETED;exitDescription=
StepExecution: id=0, version=3, name=step1, status=COMPLETED, exitStatus=COMPLETED, readCount=5, filterCount=0, writeCount=5 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=1, rollbackCount=0
***********************************************************

***********************************************************
* People found:

* Found firstName: JILL, lastName: DOE in the database

* Found firstName: JOE, lastName: DOE in the database

* Found firstName: JUSTIN, lastName: DOE in the database

* Found firstName: JANE, lastName: DOE in the database

* Found firstName: JOHN, lastName: DOE in the database
***********************************************************</pre><p></p>
<h3>Running the sample: CommandLineJobRunner</h3>
<p><a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/launch/support/CommandLineJobRunner.html" title="Class CommandLineJobRunner" target="_blank">CommandLineJobRunner</a> is a main class provided by Spring Batch as the primary entry point to <a href="http://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#commandLineJobRunner" title="The CommandLineJobRunner" target="_blank">launch a Spring Batch Job</a>.</p>
<p>It requires at least two arguments: <strong>JobConfigurationXmlPath/JobConfigurationClassName</strong> and <strong>jobName</strong>. With the first argument, it creates an <a href="http://docs.spring.io/spring/docs/current/javadoc-api/index.html?org/springframework/context/ApplicationContext.html" title="Interface ApplicationContext" target="_blank">ApplicationContext</a>, either by loading a Java configuration class with that name or by loading an XML configuration file from that path.</p>
<p>It has a <a href="http://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#domainJobLauncher" title="JobLauncher" target="_blank">JobLauncher</a> attribute that is autowired from the application context via the exposed <a href="http://docs.spring.io/spring/docs/current/javadoc-api/index.html?org/springframework/beans/factory/config/AutowireCapableBeanFactory.html" title="Interface AutowireCapableBeanFactory" target="_blank">AutowireCapableBeanFactory</a>, which autowires the bean properties by type.</p>
<p>It accepts some options (&#8220;-restart&#8221;, &#8220;-next&#8221;, &#8220;-stop&#8221;, &#8220;-abandon&#8221;) as well as parameters for the <a href="http://docs.spring.io/spring-batch/trunk/reference/htmlsingle/#domainJobLauncher" title="JobLauncher" target="_blank">JobLauncher</a>, which are converted with the <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/converter/DefaultJobParametersConverter.html" title="Class DefaultJobParametersConverter" target="_blank">DefaultJobParametersConverter</a> (the default <a href="http://docs.spring.io/spring-batch/apidocs/index.html?org/springframework/batch/core/converter/JobParametersConverter.html" title="Interface JobParametersConverter" target="_blank">JobParametersConverter</a>) and must follow a &#8216;name=value&#8217; format.</p>
<p>You can declare this main class in the manifest file, either directly or using a Maven plugin such as maven-jar-plugin, maven-shade-plugin, or even exec-maven-plugin.</p>
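<p>For instance, with maven-jar-plugin the manifest entry could be configured along these lines (a hypothetical snippet, not taken from the sample project):</p>

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-jar-plugin</artifactId>
    <configuration>
        <archive>
            <manifest>
                <!-- Main-Class entry of the generated MANIFEST.MF -->
                <mainClass>org.springframework.batch.core.launch.support.CommandLineJobRunner</mainClass>
            </manifest>
        </archive>
    </configuration>
</plugin>
```

<p>With that in place, java -jar would start CommandLineJobRunner directly, provided the dependencies are on the classpath.</p>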
<p>That is, you can invoke from your command line something like this:</p>
<p><strong>$ java CommandLineJobRunner job.xml jobName parameter=value</strong></p>
<p>Well, the sample code is a Maven project that you can install (packaging the application is enough), and Maven manages the dependencies for you (the mvn dependency:copy-dependencies command copies all the dependencies into the target/dependency directory)</p>
<p>To simplify things, I&#8217;ll also copy the generated jar into the same directory as the dependencies, so the java command is easier to invoke:</p>
<p></p><pre class="crayon-plain-tag">~/Documents/git/spring-batch-sample$mvn clean install
[INFO] Scanning for projects...
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
...
~/Documents/git/spring-batch-sample$ mvn dependency:copy-dependencies
[INFO] Scanning for projects...
...
[INFO] Copying spring-batch-core-3.0.1.RELEASE.jar to ~/Documents/git/spring-batch-sample/target/dependency/spring-batch-core-3.0.1.RELEASE.jar
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
...
~/Documents/git/spring-batch-sample$ cp target/spring-batch-sample-0.0.1-SNAPSHOT.jar ./target/dependency/
~/Documents/git/spring-batch-sample$ java -classpath "./target/dependency/*" org.springframework.batch.core.launch.support.CommandLineJobRunner com.malsolo.springframework.batch.sample.ApplicationConfiguration importUserJob
...
12:32:17.039 [main] INFO  o.s.b.c.l.support.SimpleJobLauncher 
- Job: [FlowJob: [name=importUserJob]] 
completed with the following parameters: [{}] 
and the following status: [COMPLETED]
...</pre><p></p>
<p>That&#8217;s all for now.</p>
<p>Since this entry is becoming rather long, I&#8217;ll explain other ways to run Spring Batch Jobs in a future post.</p>
]]></content:encoded>
			<wfw:commentRss>http://malsolo.com/blog4java/?feed=rss2&#038;p=260</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Try to avoid copy-paste. Fatal error compiling: invalid target release: 1.8.0_11</title>
		<link>http://malsolo.com/blog4java/?p=297</link>
		<comments>http://malsolo.com/blog4java/?p=297#comments</comments>
		<pubDate>Tue, 12 Aug 2014 11:35:15 +0000</pubDate>
		<dc:creator><![CDATA[Javier (@jbbarquero)]]></dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Maven]]></category>
		<category><![CDATA[Compile]]></category>
		<category><![CDATA[Error]]></category>
		<category><![CDATA[Plugin]]></category>

		<guid isPermaLink="false">http://malsolo.com/blog4java/?p=297</guid>
		<description><![CDATA[Bazinga! Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project&#8230;: Fatal error compiling: invalid target release: 1.8.0_11 -&#62; [Help 1] To summarize, I want to compile a maven project using Java 8, that is possible thanks to the JAVA_HOME configuration: [crayon-69f5bc866009f237863941/] &#8230; <a href="http://malsolo.com/blog4java/?p=297">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p><strong>Bazinga!</strong></p>
<p><font color=red>Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project&#8230;: <strong>Fatal error compiling: invalid target release: 1.8.0_11</strong> -&gt; [Help 1]</font></p>
<p>To summarize, I want to compile a Maven project using Java 8, which is possible thanks to the JAVA_HOME configuration:</p><pre class="crayon-plain-tag">$ mvn -version
Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 2014-02-14T18:37:52+01:00)
Maven home: /home/jbeneito/Applications/apache-maven-3.2.1
Java version: 1.8.0_11, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-8-oracle/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.13.0-32-generic", arch: "amd64", family: "unix"</pre><p></p>
<p>But I copied the maven compiler plugin configuration from another Maven project without paying special attention:</p><pre class="crayon-plain-tag">&lt;plugin&gt;
                &lt;groupId&gt;org.apache.maven.plugins&lt;/groupId&gt;
                &lt;artifactId&gt;maven-compiler-plugin&lt;/artifactId&gt;
                &lt;version&gt;3.1&lt;/version&gt;
                &lt;configuration&gt;
                    &lt;source&gt;${java.version}&lt;/source&gt;
                    &lt;target&gt;${java.version}&lt;/target&gt;
                    &lt;encoding&gt;${project.build.sourceEncoding}&lt;/encoding&gt;
                &lt;/configuration&gt;
            &lt;/plugin&gt;</pre><p>In programming, the error message usually contains the solution: I&#8217;m passing the current Java version (1.8.0_11) as the option for the javac command, and that is not what it expects.</p>
<p><strong><em>${java.version}</em></strong> is a system property whose value is currently <strong><em>1.8.0_11</em></strong>, but the <a href="http://maven.apache.org/plugins/maven-compiler-plugin/index.html" title="Maven compiler plugin" target="_blank">Maven Compiler Plugin</a> expects the standard options of the <a href="http://docs.oracle.com/javase/8/docs/technotes/tools/windows/javac.html" title="javac" target="_blank">javac command</a> for <a href="http://maven.apache.org/plugins/maven-compiler-plugin/examples/set-compiler-source-and-target.html" title="Setting the -source and -target of the Java Compiler" target="_blank">Setting the -source and -target of the Java Compiler</a>.</p>
<p>That is:</p>
<ul>
<li><strong>1.3</strong> for Java SE 1.3</li>
<li><strong>1.4</strong> for Java SE 1.4</li>
<li><strong>1.5</strong> for Java SE 5</li>
<li><strong>5</strong> synonym for 1.5</li>
<li><strong>1.6</strong> for Java SE 6</li>
<li><strong>6</strong> synonym for 1.6</li>
<li><strong>1.7</strong> the default value. The compiler accepts code with features introduced in Java SE 7. (Yes, as it appears in the <a href="http://docs.oracle.com/javase/8/docs/technotes/tools/windows/javac.html" title="javac 8" target="_blank">Java 8 official documentation</a>)</li>
<li><strong>7</strong> Synonym for 1.7</li>
</ul>
<p>Actually, I only use the values in the 1.X format, which includes 1.8, the value I assume corresponds to Java SE 8.</p>
<p>Why did I use this variable?</p>
<p>Because in my previous project it wasn&#8217;t the system property, but a custom property defined in the pom.xml:</p>
<p></p><pre class="crayon-plain-tag">&lt;properties&gt;
        &lt;project.build.sourceEncoding&gt;UTF-8&lt;/project.build.sourceEncoding&gt;
        &lt;java.version&gt;1.8&lt;/java.version&gt;
...
    &lt;/properties&gt;</pre><p></p>
<p>But this time the pom.xml defines two different custom properties:</p><pre class="crayon-plain-tag">&lt;properties&gt;
        &lt;project.build.sourceEncoding&gt;UTF-8&lt;/project.build.sourceEncoding&gt;
        &lt;maven.compiler.source&gt;1.8&lt;/maven.compiler.source&gt;
        &lt;maven.compiler.target&gt;1.8&lt;/maven.compiler.target&gt;</pre><p></p>
<p>Thus the solution is easy:</p>
<ol>
<li>Try not to copy-paste</li>
<li>Pay attention in any case</li>
<li>Just use the appropriate properties</li>
</ol>
<p></p><pre class="crayon-plain-tag">&lt;plugin&gt;
                &lt;groupId&gt;org.apache.maven.plugins&lt;/groupId&gt;
                &lt;artifactId&gt;maven-compiler-plugin&lt;/artifactId&gt;
                &lt;version&gt;3.1&lt;/version&gt;
                &lt;configuration&gt;
                    &lt;source&gt;${maven.compiler.source}&lt;/source&gt;
                    &lt;target&gt;${maven.compiler.target}&lt;/target&gt;
                    &lt;encoding&gt;${project.build.sourceEncoding}&lt;/encoding&gt;
                &lt;/configuration&gt;
            &lt;/plugin&gt;</pre><p></p>
]]></content:encoded>
			<wfw:commentRss>http://malsolo.com/blog4java/?feed=rss2&#038;p=297</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Just for fun: trying the best 3 IDEs for Java</title>
		<link>http://malsolo.com/blog4java/?p=228</link>
		<comments>http://malsolo.com/blog4java/?p=228#comments</comments>
		<pubDate>Fri, 08 Aug 2014 12:52:45 +0000</pubDate>
		<dc:creator><![CDATA[Javier (@jbbarquero)]]></dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Springsource]]></category>
		<category><![CDATA[Eclipse]]></category>
		<category><![CDATA[IntelliJ IDEA]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[NetBeans]]></category>
		<category><![CDATA[Spring Tool Suite]]></category>
		<category><![CDATA[STS]]></category>

		<guid isPermaLink="false">http://malsolo.com/blog4java/?p=228</guid>
		<description><![CDATA[Not only for fun. It&#8217;s also to open my mind. Introduction Since a few years ago, I try to use several OS in order to improve my computing skills: Windows 7/Vista, OS X (Lion currently) and my favorite Linux distribution, &#8230; <a href="http://malsolo.com/blog4java/?p=228">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Not only for fun. It&#8217;s also to open my mind.</p>
<h3>Introduction</h3>
<p>For a few years now, I&#8217;ve tried to use several operating systems in order to improve my computing skills: Windows 7/Vista, OS X (currently Lion) and my favorite Linux distribution, Ubuntu (Sheldon Cooper dixit). You can also include Android (my wife&#8217;s smartphone) and iOS (my iPhone and our iPad)</p>
<p>Suddenly, I realized that I&#8217;d been enriched: you&#8217;re able to do many more things because your mind is really prepared for thinking instead of being conditioned by a single line of sight.</p>
<p>I observed the same improvement when I began to use Maven for Java development. I didn&#8217;t quit my IDE, but regarding compilation, the important things are the source code and bytecode. You shouldn&#8217;t think about settings, preferences, project configuration and some other artificial stuff that the IDE needs, but that is not your concern.</p>
<p>So I&#8217;ve decided to try the 3 most popular IDEs for Java currently available.</p>
<h3>Context</h3>
<p>During the last two months, I&#8217;ve been switching between projects more quickly than I was accustomed to, and that has allowed me to choose which IDE to work with.</p>
<p>The title says it&#8217;s been for fun, but I&#8217;m using them in a professional environment. Besides, the goal is not to decide which is best (it&#8217;s hard to be that rude with the others) or worst (well, that seems an easy decision, because the worst Java IDE is doubtless <strong>JDeveloper</strong>). What I want to find out is which of them is more productive for me.</p>
<p>Let&#8217;s see the first impressions.</p>
<h3>Eclipse</h3>
<p>Actually I don&#8217;t use Eclipse, but <a href="http://spring.io/tools/sts" title="Spring Tool Suite" target="_blank">Spring Tool Suite</a>, <em>&#8220;an Eclipse-based development environment that is customized for developing Spring applications&#8221;</em>. You know, Eclipse on Spring steroids.</p>
<p><u>I&#8217;ve been using Eclipse as my main IDE for years</u>, even before it was released (Visual Age for Java was the IDE that I used in my first professional project). Thus, I&#8217;m very used to writing programs with Eclipse. </p>
<p>The version that I&#8217;m currently using is 3.6.0, which is based on Eclipse 4.4 (Luna)</p>
<p>Let&#8217;s summarize my first impressions:</p>
<p>Pros:</p>
<ul>
<li>It comes in a compressed file, no native installer. That allows me to place the program in the same location on every OS (in my case, an Application folder within my user home)</li>
<li>The shortcut keys</li>
<li>Typing assistant. I love to type &#8220;main&#8221; and have an entire main method written, or &#8220;sysout&#8221; and obtain a complete log sentence.</li>
<li>Views and perspectives are the most interesting way of organizing the IDE</li>
</ul>
<p>Cons:</p>
<ul>
<li>It crashes from time to time. Annoying.</li>
<li>It seems to consume a lot of memory</li>
<li>The TC server that it includes is based on Tomcat 7, so it doesn&#8217;t recognize Servlet 3.1 projects</li>
<li>A custom way of building code. No native support for tools like maven, gradle or ant. There are appropriate plugins, but having to update Eclipse projects to reflect maven changes is somewhat annoying</li>
<li>On Ubuntu and on OS X, the icon in the Dock runs the previously installed 3.5.1 version. Maybe it&#8217;s the OS&#8217;s fault, due to the lack of a proper installation.</li>
</ul>
<h3>NetBeans</h3>
<p>I&#8217;m talking about <a href="https://netbeans.org/" title="NetBeans 8.0" target="_blank">NetBeans 8.0</a>.</p>
<p>It&#8217;s hard for me to evaluate Eclipse, because it&#8217;s the IDE I&#8217;ve used on a daily basis for many years.</p>
<p>On the other hand, it&#8217;s very easy to describe the first impressions with NetBeans.</p>
<p>The installer works fine and the application runs smoothly, but I miss the way Eclipse is organized and its shortcuts (delete line: Ctrl/Cmd+D vs. Ctrl/Cmd+E)</p>
<p>However <u>NetBeans is likely the best among the three IDEs</u> (not so hard to decide, apparently)</p>
<p>I got used to NetBeans in a few hours. Yes, hours. NetBeans has the most natural appearance for programming.</p>
<p>Actually, it includes all the options that Eclipse has, sometimes with different key combinations.</p>
<p>Besides, NetBeans is pure Java and uses external tools for building (Maven, for instance)</p>
<p>Furthermore, it comes with Tomcat and GlassFish, and several useful tools that are easy to configure with plugins.</p>
<p>Does it have flaws? Of course: sometimes it gets locked in a long-running operation and you can do nothing but wait.</p>
<p>Another tricky issue is that it uses an external tool to compile (Maven, for instance), so the program doesn&#8217;t exist until the project is built, and you don&#8217;t have the dependencies until Maven downloads them. Worse than that: if you compile successfully but later introduce a compilation error, the next execution runs the previously compiled version, because the current code can&#8217;t be compiled due to the error.</p>
<p>Eclipse shows you an alert if there are compilation errors in your project before executing it.</p>
<h3>IntelliJ IDEA</h3>
<p>This may well be the best IDE ever, it certainly seems so, but for now it makes me a little bit unproductive.</p>
<p>Eclipse is some kind of a de facto standard, and NetBeans is an easy to use program. But thanks to <a href="http://www.jetbrains.com/idea/" title="IntelliJ IDEA 13.1" target="_blank">IntelliJ IDEA 13.1</a>, the JetBrains Web Help has been the most visited page this week.</p>
<p>Besides, I don&#8217;t know what the IDE is doing: I don&#8217;t know whether the files are saved automatically, I don&#8217;t know when compilation is performed, and so on.</p>
<p>The worst thing, in my humble opinion, is that the Community Edition doesn&#8217;t include any application server. Please, at least Tomcat. But it doesn&#8217;t.</p>
<p>Regarding shortcuts, it uses Ctrl/Cmd+Y for deleting lines. Seriously? The universal shortcut for &#8220;redo&#8221;? Yep. And the rest of the shortcut keys are really hard to press (Ctrl+F4 for closing a file)</p>
<p>I will try it again later, but currently it&#8217;s not my preferred choice.</p>
<p>However, once you discover how to use it, it&#8217;s the one that runs best. It doesn&#8217;t seem to consume a lot of memory, and it never freezes or crashes.</p>
<p>On Ubuntu, I have to start it from the command line.</p>
<h3>Summary</h3>
<p>Of course the three programs are very good IDEs, but one by one:</p>
<ul>
<li>I feel a little bit lost with IntelliJ IDEA. It&#8217;s great but it&#8217;s hard to find the standard options.</li>
<li>I love Eclipse and its shortcut keys, and I don&#8217;t mind its flaws because I&#8217;m used to them after all these years. It will be my preferred IDE for a while, but NetBeans is now here.</li>
<li>I didn&#8217;t like NetBeans before, but version 8 is a great program. It doesn&#8217;t matter if you haven&#8217;t used it before; it&#8217;s a very natural program that makes Java development work the way it should. Now I like it a lot.</li>
</ul>
<p>Some final words: they are all very good and easy to use for Java debugging. And I&#8217;m glad that now I could use any of them if I had to.</p>
<p>An alternative for short projects is a text editor (Sublime Text, for instance) and a good build tool (maven or gradle, as you wish)</p>
<p>I have to try Web Based IDEs, <a href="http://nerds-central.blogspot.co.uk/2014/08/10-kickass-software-technologies.html" title="10 Kickass (software) Technologies - Gauntlet Picked Up" target="_blank">as Alexander J Turner suggested</a>.</p>
<p>That&#8217;s all for now.</p>
]]></content:encoded>
			<wfw:commentRss>http://malsolo.com/blog4java/?feed=rss2&#038;p=228</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Improving Java EE skills (including Spring)</title>
		<link>http://malsolo.com/blog4java/?p=79</link>
		<comments>http://malsolo.com/blog4java/?p=79#comments</comments>
		<pubDate>Fri, 18 Jul 2014 08:28:22 +0000</pubDate>
		<dc:creator><![CDATA[Javier (@jbbarquero)]]></dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Springsource]]></category>
		<category><![CDATA[Concurrency]]></category>
		<category><![CDATA[Java EE]]></category>
		<category><![CDATA[Spring Framework]]></category>

		<guid isPermaLink="false">http://malsolo.com/blog4java/?p=79</guid>
		<description><![CDATA[A friend of mine has requested me some help for improving his skills in Java EE and Spring Framework. An exciting question, indeed. The general context He&#8217;s working for a company since 2008. To work for a company for a &#8230; <a href="http://malsolo.com/blog4java/?p=79">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>A friend of mine has requested me some help for improving his skills in Java EE and Spring Framework.</p>
<p>An exciting question, indeed.</p>
<h4>The general context</h4>
<p>He&#8217;s been working for the same company since 2008. Working for a company for a long time has a lot of pros, but also a couple of drawbacks, one of them a little bit worrying: the risk of obsolescence.</p>
<p>If you&#8217;re using the same environment over the years, let&#8217;s say <a href="http://www-01.ibm.com/software/websphere/solutions/" title="IBM WebSphere" target="_blank">IBM products</a>, you&#8217;ll lose the chance to discover other approaches, for instance <a href="http://www.oracle.com/us/products/middleware/cloud-app-foundation/weblogic/suite/overview/index.html" title="Oracle WebLogic" target="_blank">Oracle solutions</a>, or even better, <a href="http://projects.apache.org/" title="The Apache Software Foundation Projects" target="_blank">open source projects</a>.</p>
<p>Even worse, if the company is not willing to upgrade its products (which is very reasonable if there are budget concerns) or its libraries (which is really unwise, or the direct consequence of not having a good testing process), suddenly you find that you&#8217;re out of the market.</p>
<p>This is not really a problem as long as your current technology is the best technology for your needs, but sooner or later you have to face the <a href="http://en.wikipedia.org/wiki/Technical_debt" title="Technical debt from Wikipedia" target="_blank">technical debt</a> (a really <a href="http://martinfowler.com/bliki/TechnicalDebt.html" title="Martin Fowler's TechnicalDebt" target="_blank">interesting topic</a> that <a href="http://blog.codinghorror.com/paying-down-your-technical-debt/" title="Paying Down Your Technical Debt" target="_blank">you should care about</a>). That day arrives when you&#8217;re spending more time fixing issues than adding new features to your software.</p>
<p>The problem gets worse if you try to reinvent the wheel, or if you don&#8217;t realize that the plane has already been invented. It&#8217;s the 21st century; you can fly with the appropriate machine.</p>
<h4>The particular concern</h4>
<p>As I said at the beginning of this entry, he is a <a href="http://www.oracle.com/technetwork/java/javaee/overview/index.html" title="Java EE Overview" target="_blank">Java EE</a> developer who uses <a href="https://spring.io/" title="Spring" target="_blank">Spring Framework</a> in his daily work. But he still has to program for <a href="http://www-01.ibm.com/support/docview.wss?uid=swg21570083" title="End of Support for WebSphere Application Server 6.1" target="_blank">WebSphere Application Server 6.1</a> (a.k.a. WAS 6.1, which is <a href="http://en.wikipedia.org/wiki/Java_EE_version_history#J2EE_1.4_.28November_11.2C_2003.29" title="Wikipedia J2EE 1.4" target="_blank">Java EE 1.4</a> compliant and uses <a href="http://www.oracle.com/technetwork/java/eol-135779.html" title="Oracle Java SE Support Roadmap" target="_blank">Java 5 SE</a>) with <a href="http://docs.spring.io/spring/docs/3.0.x/spring-framework-reference/html/" title="Spring Framework 3.0 Reference Documentation" target="_blank">Spring Framework 3</a> (<a href="https://spring.io/blog/2007/11/19/spring-framework-2-5-released" title="Spring Framework 2.5 released" target="_blank">2.5</a> for some projects)</p>
<p>At the time of this writing, <a href="http://www.oracle.com/us/corporate/press/1957557" title="Oracle Press Release Java EE 7" target="_blank">Java EE 7</a> is already released and the current version of <a href="http://projects.spring.io/spring-framework/" title="Spring Framework" target="_blank">Spring Framework is 4.0.6</a>. Not to mention that <a href="http://www.oracle.com/us/corporate/press/2172618" title="Oracle Announces Java 8" target="_blank">Java 8</a> is now with us.</p>
<p>Thus, he wants to get up to date and improve the way he writes code, which will also be profitable for the company he works for.</p>
<p>I appreciate his request, because I&#8217;ll have to review what I really know.</p>
<p>Well, enough introduction. In coming posts I will write about my particular thoughts on what you can do to get to know Java a little better.</p>
<p>Next, a summary of the topics that I want to talk about:</p>
<h4>Suggested topics</h4>
<ul>
<li><a href="http://www.oracle.com/technetwork/java/javase/overview/index.html" title="Java SE at a Glance" target="_blank">Java SE</a></li>
<p>The Java EE platform is built on top of the Java SE platform, and it provides a particular API for server programming. So, it&#8217;s very reasonable to have a good knowledge of Java SE.</p>
<p>In particular, I assume that you have basic knowledge of Java.</p>
<p>But if you want to improve, you need to know the new features that have been published during the last years:</p>
<ul>
<li><a href="http://docs.oracle.com/javase/1.5.0/docs/relnotes/features.html#lang" title="New Features and Enhancements J2SE 5.0" target="_blank">Java 5</a></li>
<p><a href="http://docs.oracle.com/javase/1.5.0/docs/guide/language/generics.html" title="Generics" target="_blank">Generics</a>, <a href="http://docs.oracle.com/javase/1.5.0/docs/guide/language/foreach.html" title="The For-Each Loop" target="_blank">enhanced loops</a>, <a href="http://docs.oracle.com/javase/1.5.0/docs/guide/language/autoboxing.html" title="Autoboxing" target="_blank">autoboxing</a>, <a href="http://docs.oracle.com/javase/1.5.0/docs/guide/language/enums.html" title="Enums" target="_blank">enums</a> (I love them), <a href="http://docs.oracle.com/javase/1.5.0/docs/guide/language/varargs.html" title="Varargs" target="_blank">varargs</a>, <a href="http://docs.oracle.com/javase/1.5.0/docs/guide/language/static-import.html" title="Static Import" target="_blank">static import</a>, and maybe the most important: <a href="http://docs.oracle.com/javase/1.5.0/docs/guide/language/annotations.html" title="Annotations" target="_blank">annotations</a> (they change our life as Java developers)</p>
<li><a href="http://www.oracle.com/technetwork/java/javase/features-141434.html" title="Highlights of Technology Changes in Java SE 6" target="_blank">Java 6</a></li>
<p><a href="http://www.onjava.com/pub/a/onjava/2006/08/02/jjdbc-4-enhancements-in-java-se-6.html" title="JDBC 4.0 Enhancements in Java SE 6" target="_blank">JDBC 4.0</a>, Support for the Web Services stack and XML processing, and <a href="http://www.oracle.com/technetwork/articles/javase/beta2-135158.html" title="What's New in Java SE 6" target="_blank">many more</a>. </p>
<li><a href="http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html" title="Java SE 7 Features and Enhancements" target="_blank">Java 7</a></li>
<p>Quite a few <a href="http://radar.oreilly.com/2011/09/java7-features.html" title="A look at Java 7's new features" target="_blank">handy features for developing</a>, like <a href="http://docs.oracle.com/javase/7/docs/technotes/guides/language/strings-switch.html" title="Strings in switch Statements" target="_blank">Strings in switch statements</a>, <a href="http://docs.oracle.com/javase/7/docs/technotes/guides/language/binary-literals.html" title="Binary Literals" target="_blank">Binary integral literals</a> and <a href="http://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html" title="Underscores in Numeric Literals" target="_blank">underscores in numeric literals</a>, <a href="http://docs.oracle.com/javase/7/docs/technotes/guides/language/catch-multiple.html" title="Catching Multiple Exception Types and Rethrowing Exceptions with Improved Type Checking" target="_blank">Multi-catch and more precise rethrow</a>, <a href="http://docs.oracle.com/javase/7/docs/technotes/guides/language/type-inference-generic-instance-creation.html" title="Type Inference for Generic Instance Creation" target="_blank">diamond operator</a>, <a href="http://docs.oracle.com/javase/7/docs/technotes/guides/language/try-with-resources.html" title="The try-with-resources Statement" target="_blank">try-with-resources</a>, <a href="http://docs.oracle.com/javase/7/docs/technotes/guides/language/non-reifiable-varargs.html" title="Improved Compiler Warnings and Errors When Using Non-Reifiable Formal Parameters with Varargs Methods" target="_blank">Simplified varargs method invocation</a>.</p>
</ul>
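<p>To make a few of those language additions concrete, here is a small, self-contained example (the class and method names are mine, just for illustration):</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class Java7Features {

    // Strings in switch statements (Java 7)
    static String greet(String name) {
        switch (name) {
            case "JILL":
            case "JOE":
                return "DOE family";
            default:
                return "unknown";
        }
    }

    public static void main(String[] args) throws Exception {
        // Diamond operator: type parameters inferred on the right-hand side
        List<String> names = new ArrayList<>();
        names.add("JILL");
        System.out.println(greet(names.get(0)));

        // Underscores in numeric literals and binary integral literals
        int million = 1_000_000;
        int five = 0b101;

        // Multi-catch: one handler for several unrelated exception types
        try {
            System.out.println(million / (five - 5));
        } catch (ArithmeticException | NumberFormatException e) {
            System.out.println("Caught: " + e.getMessage());
        }

        // try-with-resources: the reader is closed automatically
        try (StringReader reader = new StringReader("batch")) {
            System.out.println((char) reader.read());
        }
    }
}
```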
<p>As Java 8 has been released very recently, you can skip it for a little while.</p>
<li><a href="http://docs.oracle.com/javase/tutorial/essential/concurrency/" title="Java Concurrency" target="_blank">Java Concurrency</a></li>
<p>Why a specific section for Java SE Concurrency?</p>
<p>Because in Java SE 5 everything changed. Concurrency in Java was improved in a way that provides a more natural approach to multi-threading (I had great joy with two of the enhancements of Java 5: <a href="http://docs.oracle.com/javase/1.5.0/docs/guide/language/generics.html" title="Java Generics" target="_blank">Generics</a> and <a href="http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/Callable.html" title="Java Callable" target="_blank">Callable</a>) and at the same time it avoids the long-standing errors that the concurrency library had: the method <a href="http://docs.oracle.com/javase/8/docs/api/java/lang/Thread.html#stop--" title="Java Thread stop" target="_blank">Thread.stop()</a> (which, actually, still exists, and <a href="http://docs.oracle.com/javase/8/docs/technotes/guides/concurrency/threadPrimitiveDeprecation.html" title="Why is Thread.stop deprecated?" target="_blank">I don&#8217;t know why</a>)</p>
<p>Not only do you need to stop using Thread.start() directly (try ExecutorService instead), you also need to know about Future and the new ways to synchronize threads in order to understand the new concurrency features of Java EE 7.</p>
<p>Besides, knowing the <a href="http://docs.oracle.com/javase/tutorial/essential/concurrency/executors.html" title="Executors" target="_blank">Executor framework</a> in Java is a real improvement.</p>
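<p>The difference is easy to see in code. A minimal sketch (the class name is mine) that submits a Callable to an ExecutorService and collects the result through a Future, instead of starting raw Threads:</p>

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SumTask {

    // Submit a Callable to a thread pool; the Future lets us
    // retrieve the result (or the exception) when we need it.
    static int sumInBackground(final int from, final int to)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Integer> future = pool.submit(new Callable<Integer>() {
                public Integer call() {
                    int sum = 0;
                    for (int i = from; i <= to; i++) {
                        sum += i;
                    }
                    return sum;
                }
            });
            return future.get(); // blocks only when we ask for the result
        } finally {
            pool.shutdown();
        }
    }
}
```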
<p>There are a couple of good Java Concurrency tutorials:</p>
<ul>
<li><a href="http://www.vogella.com/tutorials/JavaConcurrency/article.html" title="Vogella's concurrency" target="_blank">Vogella&#8217;s Java concurrency (multi-threading) &#8211; Tutorial</a></li>
<li><a href="http://tutorials.jenkov.com/java-concurrency/index.html" title="Jenkov's Concurrency" target="_blank">Jenkov&#8217;s Java Concurrency / Multithreading Tutorial</a> and <a href="http://tutorials.jenkov.com/java-util-concurrent/index.html" title="Jenkov's concurrency utilities" target="_blank">Java Concurrency Utilities</a></li>
</ul>
<li><a href="http://www.oracle.com/technetwork/java/javaee/overview/index.html" title="Java EE at a Glance" target="_blank">Java EE 7</a></li>
<p>Although there are only a few <a href="http://www.oracle.com/technetwork/java/javaee/overview/compatibility-jsp-136984.html" title="Java EE Compatibility" target="_blank">Java EE 7 compatible servers</a>, it&#8217;s time to learn about the <a href="http://www.infoworld.com/slideshow/105268/11-hot-improvements-java-ee-7-220465" title="11 hot improvements to Java EE 7" target="_blank">exciting new features</a> that will allow you to create better and faster applications:</p>
<ul>
<li>HTML5 (<a href="http://docs.oracle.com/javaee/7/tutorial/doc/websocket.htm#GKJIQ5" title="Java API for WebSocket" target="_blank">WebSockets</a> and <a href="http://docs.oracle.com/javaee/7/tutorial/doc/jsonp.htm#GLRBB" title="JSON Processing" target="_blank">JSON</a>)</li>
<li><a href="http://docs.oracle.com/javaee/7/tutorial/doc/partmessaging.htm#GFIRP3" title="Messaging" target="_blank">Simplified JMS 2.0 API</a></li>
<li><a href="http://docs.oracle.com/javaee/7/tutorial/doc/batch-processing.htm#GKJIQ6" title="Batch Processing" target="_blank">Batch applications</a></li>
<li><a href="http://docs.oracle.com/javaee/7/tutorial/doc/concurrency-utilities.htm" title="Concurrency Utilities for Java EE" target="_blank">Concurrency utilities</a> (at last <a href="https://jcp.org/en/jsr/detail?id=236" title="JSR 236: Concurrency Utilities for Java EE" target="_blank">JSR 236</a> is finished)</li>
<li><a href="http://docs.oracle.com/javaee/7/tutorial/doc/partcdi.htm#GJBNR" title="Contexts and Dependency Injection for Java EE" target="_blank">Contexts and Dependency Injection</a> (CDI. Yes, it was introduced in Java EE 6, but it has been enhanced to compete with Spring&#8217;s @Autowired)</li>
<li>Java API for RESTful Web Services (JAX-RS) 2.0 (<a href="http://docs.oracle.com/javaee/7/tutorial/doc/jaxrs.htm#GIEPU" title="RESTful" target="_blank">webservices</a>, <a href="http://docs.oracle.com/javaee/7/tutorial/doc/jaxrs-client.htm#BABEIGIH" title="Accesing REST" target="_blank">clients</a> and <a href="http://docs.oracle.com/javaee/7/tutorial/doc/jaxrs-advanced.htm#GJJXE" title="JAX-RS: Advanced Topics and an Example" target="_blank">more</a>)</li>
<li><a href="http://docs.oracle.com/javaee/7/tutorial/doc/servlets.htm#BNAFD" title="Java Servlet TEchnology" target="_blank">Servlet 3.1</a></li>
</ul>
<li><a href="https://spring.io/projects" title="Spring projects" target="_blank">Spring Projects</a></li>
<p>I want to create individual blog entries for each Spring project I&#8217;ve worked with, but in the meantime, the best way to get started is to take a look at the <a href="https://spring.io/guides" title="Spring guides" target="_blank">guides</a> they provide.</p>
<li>Other technologies worth knowing</li>
<p>Since you&#8217;re using Java to write programs that solve business problems, it&#8217;s good to know the new environments that provide modern approaches to today&#8217;s challenges.</p>
<p>You should take a look at <a href="http://en.wikipedia.org/wiki/NoSQL" title="NoSQL from Wikipedia" target="_blank">NoSQL</a>; <a href="http://www.mongodb.org/" title="mongoDB" target="_blank">MongoDB</a> is the most popular option, with good <a href="http://docs.mongodb.org/manual/" title="The MongoDB 2.6 Manual" target="_blank">documentation</a>, including <a href="http://docs.mongodb.org/ecosystem/drivers/java/" title="Java MongoDB Driver" target="_blank">programming with Java</a>.</p>
<p>The next step should be <a href="http://en.wikipedia.org/wiki/Big_data" title="Big data from Wikipedia" target="_blank">Big Data</a>. <a href="http://hadoop.apache.org/" title="Apache Hadoop" target="_blank">Hadoop</a> is the project to pay attention to, but it&#8217;s so big, with so many related projects, and so complex, that I still haven&#8217;t found a good introductory tutorial.</p>
<p>Finally, <a href="http://en.wikipedia.org/wiki/Asynchronous_I/O" title="Asynchronous I/O from Wikipedia" target="_blank">Asynchronous I/O</a> is the new answer to a problem created by the new economy based on the Internet of Things: the need to handle transactions that scale from hundreds to thousands, even millions, of users, with high performance and high speed.</p>
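<p>Java itself has shipped asynchronous channels (NIO.2) since Java 7. A minimal sketch (the class name is mine) of the idea: the read is started without blocking the calling thread, and the Future completes once the data is available:</p>

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

public class AsyncRead {

    // Kick off an asynchronous read; the calling thread is free to do
    // other work until it asks the Future for the result.
    static String readAsync(Path file) throws Exception {
        AsynchronousFileChannel channel =
                AsynchronousFileChannel.open(file, StandardOpenOption.READ);
        try {
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            Future<Integer> pending = channel.read(buffer, 0);
            int bytesRead = pending.get(); // we block here only for the demo
            buffer.flip();
            byte[] bytes = new byte[bytesRead];
            buffer.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        } finally {
            channel.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("async", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAsync(tmp));
        Files.delete(tmp);
    }
}
```

<p>The frameworks below take this idea much further, building whole programming models (actors, verticles, event loops) on top of non-blocking I/O.</p>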
<p>There are several new frameworks worth exploring: <a href="http://akka.io/" title="Akka" target="_blank">Akka</a>, <a href="http://malsolo.com/blog4java/?p=35" title="Getting started with Vert.x" target="_blank">Vert.x</a>, <a href="https://github.com/reactor/reactor" title="Reactor" target="_blank">Reactor</a> (<a href="https://spring.io/blog/2013/05/13/reactor-a-foundation-for-asynchronous-applications-on-the-jvm" title="Reactor" target="_blank">a foundational framework</a> for <a href="https://spring.io/blog/2013/07/18/reactor-1-0-0-m1-a-foundation-for-asynchronous-fast-data-applications-on-the-jvm" title="Reactor 1.0.0.M1" target="_blank">asynchronous applications on the JVM</a>, <a href="https://spring.io/blog/2014/05/06/reactor-1-1-0-release-now-available" title="Reactor 1.1.0" target="_blank">by Spring</a>, though not part of the portfolio) and <a href="http://nodejs.org/" title="Node.js" target="_blank">Node.js</a> (yes, it&#8217;s JavaScript).</p>
</ul>
<p>That&#8217;s all for now. I hope my colleague finds this entry interesting, and I hope it is the first of a series of articles.</p>
<p>Let&#8217;s see.</p>
]]></content:encoded>
			<wfw:commentRss>http://malsolo.com/blog4java/?feed=rss2&#038;p=79</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting started with Vert.x</title>
		<link>http://malsolo.com/blog4java/?p=35</link>
		<comments>http://malsolo.com/blog4java/?p=35#comments</comments>
		<pubDate>Tue, 08 Jul 2014 13:29:29 +0000</pubDate>
		<dc:creator><![CDATA[Javier (@jbbarquero)]]></dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Asynchronous I/O]]></category>
		<category><![CDATA[Vert.x]]></category>

		<guid isPermaLink="false">http://malsolo.com/blog4java/?p=35</guid>
		<description><![CDATA[What is Vert.x? In my humble opinion, Vert.x is a poorly documented platform for creating server applications intended to be scalable by using an event-driven, non-blocking I/O in the JVM. The first definition for Vert.x I heard was &#8220;it&#8217;s like &#8230; <a href="http://malsolo.com/blog4java/?p=35">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<h4>What is Vert.x?</h4>
<p>In my humble opinion, Vert.x is a poorly documented platform for creating server applications intended to be scalable, using event-driven, non-blocking I/O on the JVM.</p>
<p>The first definition of Vert.x I heard was <em>&#8220;it&#8217;s like Node.js but on the JVM&#8221;</em>. (Yeah, part of my own definition above is copied from the description found at <a href="http://nodejs.org/" title="Node.js" target="_blank">Node.js</a>.)</p>
<p>OK! Let&#8217;s take a look at the <a title="Vert.x" href="http://vertx.io/" target="_blank">Vert.x site</a>:</p>
<blockquote><p>&#8220;Vert.x is a lightweight, high performance application platform for the JVM that&#8217;s designed for modern mobile, web, and enterprise applications.&#8221;</p></blockquote>
<p>Terrific! Furthermore:</p>
<ul>
<li>It&#8217;s <strong>polyglot</strong>: you can write your application components in Java, JavaScript, CoffeeScript, Ruby, Python or Groovy, and you can even mix these languages. Honestly, I don&#8217;t find this feature exciting.</li>
<li>It has a <strong>simple</strong> API for writing non-blocking network-enabled applications.</li>
<li>It&#8217;s <strong>scalable</strong> because it uses non-blocking I/O to serve many connections with minimal threads, combined with message passing to handle the application logic.</li>
<li>It provides a simple actor-like <strong>concurrency</strong> model, so that you don&#8217;t have to worry about multi-threaded programming anymore.</li>
</ul>
<p>Well! I still miss something here.</p>
<p>Taking a look at the <strong><em>Key Features</em></strong>, you&#8217;ll learn that Vert.x has an <strong>Event Bus</strong>, a kind of queue that Vert.x components, called <strong>Verticles</strong>, use to communicate with each other regardless of their programming language. It uses WebSockets and SockJS to achieve the JavaScript reach they claim.</p>
<p>Vert.x is a platform that you invoke from the command line: <code>vertx run</code> for single Verticles or <code>vertx runmod</code> for the encapsulation mechanism that Vert.x provides, the <strong>module system</strong> (modules can be shared via a <a title="Maven Central Repository" href="http://mvnrepository.com/" target="_blank">Maven repository</a> or <a title="Bintray" href="https://bintray.com/" target="_blank">Bintray</a>, and can be registered in the <a title="Vert.x Module Registry" href="http://modulereg.vertx.io/" target="_blank">module registry</a>). But it can also be embedded in a Java application.</p>
<h4>Install Vert.x</h4>
<p>Installing Vert.x is very easy:</p>
<ol>
<li>You need a Linux, OS X or Windows with JDK 1.7.0 or later installed (try <code>javac -version</code> in order to ensure that you have the JDK bin directory on your <code>PATH</code>).</li>
<li>Download the latest release of Vert.x, <a title="Vert.x Downloads" href="http://vertx.io/downloads.html" target="_blank">2.1.1 at the time of this writing</a>.</li>
<li>Decompress the downloaded file. I like to have an Applications directory in my home directory, so <code>tar -zxf ~/Applications/vert.x-2.1.1.tar.gz</code> will work fine.</li>
<li>Add the Vert.x bin directory to your PATH environment variable.</li>
</ol>
<p>Now you can check the version:</p><pre class="crayon-plain-tag">$ vertx version
2.1.1 (built 2014-06-18 14:11:03)</pre><p></p>
<h4>The first example</h4>
<p>Testing the install is as easy as writing a simple web server. This example shows the main features of Vert.x: simplicity, scalability and concurrency. With a few more examples, its polyglot nature could be shown as well.</p>
<p>Copy the following into a text editor and save it as <strong>Server.java</strong></p><pre class="crayon-plain-tag">import org.vertx.java.core.Handler;
import org.vertx.java.core.http.HttpServerRequest;
import org.vertx.java.platform.Verticle;

public class Server extends Verticle {

  public void start() {
    vertx.createHttpServer().requestHandler(new Handler&lt;HttpServerRequest&gt;() {
      public void handle(HttpServerRequest req) {
        //String file = req.path().equals(&quot;/&quot;) ? &quot;index.html&quot; : req.path();
        //req.response().sendFile(&quot;webroot/&quot; + file);
        req.response().end(&quot;Hello World!&quot;);
      }
    }).listen(8080);
  }
}</pre><p>Now run this Verticle (more on the Vert.x concepts later) by opening a console in the directory where you saved the file, and typing:</p><pre class="crayon-plain-tag">~/Documents/vert.x$ vertx run Server.java
Succeeded in deploying verticle</pre><p>To make sure the verticle has really succeeded, open a web browser and go to http://localhost:8080 (see the <code>listen(8080)</code> call in the code above if you want to change the port number)
<p>You should see &#8220;Hello World!&#8221; (without the quotes)</p>
<div id="attachment_68" style="width: 310px" class="wp-caption alignnone"><a href="http://malsolo.com/blog4java/wp-content/uploads/2014/07/vertxhello.png"><img class="size-medium wp-image-68" src="http://malsolo.com/blog4java/wp-content/uploads/2014/07/vertxhello-300x141.png" alt="Hello world from Vert.x" width="300" height="141" /></a><p class="wp-caption-text">Hello vert.x</p></div>
<p>Now you can stop the server by using Ctrl+C (Command-C in OS X)</p><pre class="crayon-plain-tag">^C
~/Documents/vert.x$</pre><p></p>
<p>Regarding polyglot support, you can find more or less the same sample at <a href="http://vertx.io/" title="Vert.x site" target="_blank">the Vert.x site</a> in <a href="http://vertx.io/#ws_js" title="JavaScript" target="_blank">JavaScript</a>, <a href="http://vertx.io/#ws_ruby" title="Ruby" target="_blank">Ruby</a>, <a href="http://vertx.io/#ws_groovy" title="Groovy" target="_blank">Groovy</a>, <a href="http://vertx.io/#ws_python" title="Python" target="_blank">Python</a> and <a href="http://vertx.io/#ws_clojure" title="Clojure" target="_blank">Clojure</a>.</p>
<p>This example also shows another feature of Vert.x: you don&#8217;t need to compile Java code. I suppose the authors wanted to offer the same convenience the other, scripting, languages have. I don&#8217;t find this option interesting, and it&#8217;s of little use for big projects, which are better distributed as modules.</p>
<h4>Next steps</h4>
<p>We have seen almost nothing of Vert.x beyond the installation and a trivial test. In later posts, I will explain the core concepts and develop a Java Vert.x application using Maven.</p>
]]></content:encoded>
			<wfw:commentRss>http://malsolo.com/blog4java/?feed=rss2&#038;p=35</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
