Hortonworks HDP Developer: Java
This advanced four-day course gives Java programmers a deep dive into Hadoop 2.0 application development. Students learn how to design and develop efficient and effective MapReduce applications for Hadoop 2.0 using the Hortonworks Data Platform, and how to use Hadoop 2.0 to manipulate, analyze, and perform computations on their Big Data.
Upon completion of this course, students will be able to:
- Explain Hadoop 2.0 and the Hadoop Distributed File System
- Explain the new YARN framework in Hadoop 2.0
- Develop a Java MapReduce application
- Run a MapReduce application on YARN
- Use combiners and in-map aggregation to improve the performance of a MapReduce job
- Write a custom partitioner to avoid data skew on reducers
- Perform a secondary sort by writing custom key and group comparator classes
- Recognize use cases for the various built-in input and output formats
- Write a custom input and output format for a MapReduce job
- Optimize a MapReduce job by following best practices
- Configure various aspects of a MapReduce job to optimize mappers and reducers
- Develop a custom RawComparator class
- Use the Distributed Cache
- Explain the various join techniques in Hadoop
- Perform a map-side join
- Use a Bloom filter to join two large datasets
- Perform unit tests using the MRUnit API
- Explain the basic architecture of HBase
- Write an HBase MapReduce application
- Explain use cases for Pig and Hive
- Write a simple Pig script to explore and transform big data
- Write a Pig UDF (User-Defined Function) in Java
- Execute a Hive query
- Write a Hive UDF in Java
- Use the JobControl class to create a workflow of MapReduce jobs
- Use Oozie to define and schedule workflows
WHO SHOULD ATTEND
This class is for experienced Java software engineers who need to design and develop Java MapReduce applications for Hadoop 2.0.
PREREQUISITES
This course assumes students have experience developing Java applications and using a Java IDE. Labs are completed using the Eclipse IDE and Maven. No prior Hadoop knowledge is required.
Day 1
- Understanding Hadoop and HDFS
- Writing MapReduce Applications
- Map Aggregation
Day 2
- Partitioning and Sorting
- Input and Output Formats
- Optimizing MapReduce Jobs
Day 3
- Advanced MapReduce Features
- Unit Testing
- HBase Programming
Day 4
- Pig Programming
- Hive Programming
- Defining Workflow
Lab Content
- Configuring a Hadoop 2.0 Development Environment
- Putting data into HDFS using Java
- Write a distributed grep MapReduce application
- Write an inverted index MapReduce application
- Configure and use a combiner
- Writing a custom combiner
- Writing a custom partitioner
- Globally sort output using the TotalOrderPartitioner
- Writing a MapReduce job whose data is sorted using a composite key
- Writing a custom InputFormat class
- Writing a custom OutputFormat class
- Compute a simple moving average of historical stock price data
- Use data compression
- Define a RawComparator
- Perform a map-side join
- Using a Bloom filter
- Unit testing a MapReduce job
- Import data into HBase
- Writing an HBase MapReduce job
- Writing a User-Defined Pig Function
- Writing a User-Defined Hive Function
- Defining an Oozie workflow
Is there a discount available for current students?
UMBC students and alumni, as well as students who have previously taken a public training course with UMBC Training Centers, are eligible for a 10% discount, capped at $250. Please provide a copy of your UMBC student ID, an unofficial transcript, or the name of the UMBC Training Centers course you have completed. Online courses are excluded from this offer.
What is the cancellation and refund policy?
Students will receive a refund of paid registration fees only if UMBC Training Centers receives notice of cancellation at least 10 business days before the class start date (for classes) or the exam date (for exams).