Technical Requirements
- Hadoop Ecosystem
  - Option 1: Hortonworks Virtual Machine (local or cloud installation)
  - Option 2: HDInsight Emulator (Windows only)
- Java JDK 7 or later
- Hadoop HDFS commands
FAQ
- I get “AccessControlException: Permission denied: user=root, access=WRITE” when running Hadoop
This usually means your user has no writable directory in HDFS. Create a folder in HDFS as the hdfs superuser and set your user as the owner:
  $ su hdfs -c "hadoop fs -mkdir /user/root"
  $ su hdfs -c "hadoop fs -chown root /user/root"
- Hadoop does not find my Mapper/Reducer classes (ClassNotFoundException)
Add the jar containing your main class to the Hadoop classpath:
  $ export HADOOP_CLASSPATH=/path/to/jar/myjar.jar:$HADOOP_CLASSPATH
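If the exception persists, it can help to confirm that the jar really contains the class Hadoop is looking for. A jar is a zip archive, so a minimal Python sketch of that check could look like the following. The function name `jar_contains_class` and the class name `com.example.MyMapper` are hypothetical and used only for illustration.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar at jar_path holds the compiled class.

    class_name is fully qualified, e.g. "com.example.MyMapper"
    (a hypothetical name, not taken from any real project).
    """
    # Inside a jar, com.example.MyMapper is stored as
    # com/example/MyMapper.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, `jar_contains_class("/path/to/jar/myjar.jar", "com.example.MyMapper")` returns False when the class was never packaged, which points at the build rather than at the classpath.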
- I get “ImportError” when importing the numpy module in the PySpark shell
Install easy_install and pip, then install numpy:
  $ wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py
  $ python ez_setup.py
  $ easy_install pip
  $ pip install numpy
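After installing, it is worth checking that numpy is visible to the same Python interpreter that Spark uses, since pip may have installed it for a different one. The script below is a small sketch of such a check; the helper name `is_importable` is an assumption made for this example.

```python
import importlib
import sys


def is_importable(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # Print the interpreter path so it can be compared with the one
    # the Spark shell runs.
    print(sys.executable)
    print("numpy importable: %s" % is_importable("numpy"))
```

Run it with the interpreter that Spark is configured to use. If it reports that numpy is not importable, the package was most likely installed for another Python.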
- sbt is not installed in Hortonworks
Install it as follows:
  $ curl https://bintray.com/sbt/rpm/rpm | \
      sudo tee /etc/yum.repos.d/bintray-sbt-rpm.repo
  $ sudo yum install sbt
  $ sbt -v sbtVersion   # Takes a long time
- How to compile and execute Spark Streaming programs (written in Scala)
Run the following commands inside the project template:
  $ sbt assembly   # Compiles and builds the jar for Spark
  $ spark-submit --class Tutorial target/scala-2.10/Tutorial.jar
- How to reduce Spark's level of verbosity
Replace the content of /etc/spark/2.3.2.0-2950/0/log4j.properties with:
  # Set everything to be logged to the console
  log4j.rootCategory=WARN, console
  log4j.appender.console=org.apache.log4j.ConsoleAppender
  log4j.appender.console.target=System.err
  log4j.appender.console.layout=org.apache.log4j.PatternLayout
  log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
  # Settings to quiet third party logs that are too verbose
  log4j.logger.org.eclipse.jetty=WARN
  log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
  log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
- I get “ERROR 401: Authentication credentials” when executing the Twitter streaming example
1. Ensure that you have set a valid consumer key/secret and access token/secret
2. Synchronize your system clock (Hortonworks)
  $ sudo yum install ntp
  $ sudo ntpdate ntp.ubuntu.com
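The clock matters because OAuth-signed requests carry a timestamp, so a clock that has drifted far enough can produce a 401 even with valid credentials. One way to estimate the drift is to compare the local time with the `Date` header of any HTTP response. The sketch below shows only that comparison and assumes Python 3; the function name `clock_skew_seconds` is hypothetical, and fetching the header is left out.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(http_date, local_now=None):
    """Return local time minus server time, in seconds.

    http_date is the value of an HTTP "Date" response header,
    e.g. "Tue, 15 Nov 1994 08:12:31 GMT".
    local_now defaults to the current UTC time.
    """
    server_time = parsedate_to_datetime(http_date)
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return (local_now - server_time).total_seconds()
```

A result close to zero suggests the clock is fine and the credentials should be checked first. A large positive or negative value suggests running the ntpdate step above.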