The hadoop-site.xml file must also be modified to contain a number of configuration settings.
It is recommended that you install Java in the same location on all machines in the cluster, so this file can be replicated to each machine without modification. The JAVA_HOME variable must be set to the base directory of your Java installation. The most interesting of these are bin/, where scripts to run the cluster are located, and conf/ where the cluster's configuration is stored.Įnter the conf/ directory and modify hadoop-env.sh. Within the hadoop-0.18.0/ directory which results, there will be several subdirectories.
Then download a Hadoop version using a web browser, wget, or curl, and then unzip the package: Windows users must install and configure cygwin as well. Distributed operation requires ssh and sshd. To install Hadoop, first download and install prerequisite software. These example instructions assume that version 0.18.0 is being used the directions will not change significantly for any other version, except by substituting the new version number where appropriate. The most recent minor version may include improved performance or new features, but may also introduce regressions that will be fixed in ensuing revisions.Īt the time of this writing, 0.18.0 is the most recent version, with 0.17.2 being the "stable" release. Production clusters should use this version.
The stable version is the highest revision number in the second most recent minor version. Within the releases page, two or three versions of Hadoop will be readily available, corresponding to the highest revision number in the most recent two or three minor version increments. Within a minor version, the most recent revision contains the most stable patches.
Hadoop instances with different minor versions may use different versions of the HDFS file formats and protocols, requiring a DFS upgrade to migrate from one to the next. The minor version represents a large set of feature improvements and enhancements. At the time of this writing (September 2008), there have been no major upgrades all Hadoop versions have their major version set to 0. Increments to the major version number represent large differences in operation or interface and possibly significant incompatible changes. Here you will find several versions of Hadoop available.
Hadoop is available for download from the project homepage at. Some Hadoop users have reported successfully running the system on Solaris.) The instructions on this page assume a command syntax and system design similar to Linux, but can be readily adapted to other systems. (Other POSIX-style operating systems such as BSD may also work. The vast majority of server deployments today are on Linux. The Hadoop documentation stresses that a Windows/cygwin installation is for development only. Thus running Hadoop under Windows requires cygwin to be installed. The various scripts used to manage Hadoop clusters are written in a UNIX shell scripting language that assumes sh- or bash-like behavior. Developers can and do run Hadoop under Windows.
OPERATING SYSTEMĪs Hadoop is written in Java, it is mostly portable between different operating systems. A bug in gcj, the GNU Compiler for Java, causes incompatibility between generated classes and Hadoop it should not be used. Sun's compiler is fine, as is ecj, the Eclipse Compiler for Java. Recent versions of Hadoop require Sun Java 1.6.Ĭompiling Java programs to run on Hadoop can be done with any number of commonly-used Java compilers. This section discusses the general platform requirements for Hadoop.