

Big Data Analytics

Hortonworks HDP Developer: Java

  • Overview

    This advanced four-day course gives Java programmers a deep dive into Hadoop 2.0 application development. Students learn how to design and develop efficient MapReduce applications for Hadoop 2.0 using the Hortonworks Data Platform (HDP), and how to use Hadoop 2.0 to manipulate, analyze, and perform computations on big data.

    Upon completion of this course, students will be able to:

    • Explain Hadoop 2.0 and the Hadoop Distributed File System
    • Explain the new YARN framework in Hadoop 2.0
    • Develop a Java MapReduce application
    • Run a MapReduce application on YARN
    • Use combiners and in-map aggregation to improve the performance of a MapReduce job
    • Write a custom partitioner to avoid data skew on reducers
    • Perform a secondary sort by writing custom key and group comparator classes
    • Recognize use cases for the various built-in input and output formats
    • Write a custom input and output format for a MapReduce job
    • Optimize a MapReduce job by following best practices
    • Configure various aspects of a MapReduce job to optimize mappers and reducers
    • Develop a custom RawComparator class
    • Use the Distributed Cache
    • Explain the various join techniques in Hadoop
    • Perform a map-side join
    • Use a Bloom filter to join two large datasets
    • Perform unit tests using the MRUnit API
    • Explain the basic architecture of HBase
    • Write an HBase MapReduce application
    • Explain use cases for Pig and Hive
    • Write a simple Pig script to explore and transform big data
    • Write a Pig UDF (User-Defined Function) in Java
    • Execute a Hive query
    • Write a Hive UDF in Java
    • Use the JobControl class to create a workflow of MapReduce jobs
    • Use Oozie to define and schedule workflows
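Two of the objectives above, in-map aggregation and partitioning, can be illustrated without a cluster. The following is a minimal plain-Java sketch that assumes a word-count job; it does not use the Hadoop API, and the class and method names are hypothetical. The partition function uses the same formula as Hadoop's default HashPartitioner, applied here to a Java String rather than a Hadoop Text key, so actual partition numbers will differ from a real job.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch (no Hadoop dependency, hypothetical names): in-mapper
// aggregation and hash partitioning for a word-count style job.
class InMapperAggregationSketch {

    // In-mapper aggregation: instead of emitting (word, 1) for every token,
    // the mapper accumulates partial counts in a local map and emits each
    // distinct word once, shrinking the data shuffled to the reducers.
    static Map<String, Integer> mapWithLocalAggregation(String line) {
        Map<String, Integer> partialCounts = new HashMap<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace yields an empty first token
            }
            partialCounts.merge(token, 1, Integer::sum);
        }
        return partialCounts;
    }

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit,
    // then take the remainder modulo the number of reduce tasks. A custom
    // partitioner replaces this when a few hot keys would skew the reducers.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = mapWithLocalAggregation("the cat and the hat");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue()
                    + "\t-> reducer " + partitionFor(e.getKey(), 3));
        }
    }
}
```

A combiner achieves a similar reduction in shuffled data but runs as a separate framework-invoked step; in-mapper aggregation trades that for extra mapper memory.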
  • Who Should Take This Course

    WHO SHOULD ATTEND

    This class is for experienced Java software engineers who need to design and develop Java MapReduce applications for Hadoop 2.0.

    PREREQUISITES

    This course assumes students have experience developing Java applications and using a Java IDE. Labs are completed using the Eclipse IDE and Maven. No prior Hadoop knowledge is required.

  • Course Outline

    Day 1

    1. Understanding Hadoop and HDFS
    2. Writing MapReduce Applications
    3. Map Aggregation

    Day 2

    1. Partitioning and Sorting
    2. Input and Output Formats
    3. Optimizing MapReduce Jobs

    Day 3

    1. Advanced MapReduce Features
    2. Unit Testing
    3. HBase Programming

    Day 4

    1. Pig Programming
    2. Hive Programming
    3. Defining Workflow

    Lab Content

    1. Configuring a Hadoop 2.0 Development Environment
    2. Putting data into HDFS using Java
    3. Write a distributed grep MapReduce application
    4. Write an inverted index MapReduce application
    5. Configure and use a combiner
    6. Writing a custom combiner
    7. Writing a custom partitioner
    8. Globally sort output using the TotalOrderPartitioner
    9. Writing a MapReduce job whose data is sorted using a composite key
    10. Writing a custom InputFormat class
    11. Writing a custom OutputFormat class
    12. Compute a simple moving average of historical stock price data
    13. Use data compression
    14. Define a RawComparator
    15. Perform a map-side join
    16. Using a Bloom filter
    17. Unit testing a MapReduce job
    18. Import data into HBase
    19. Writing an HBase MapReduce job
    20. Writing a User-Defined Pig Function
    21. Writing a User-Defined Hive Function
    22. Defining an Oozie workflow
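The inverted index lab follows the classic map, shuffle, and reduce pattern. The sketch below simulates those phases in plain Java so the data flow is visible without a cluster. It is an illustration under stated assumptions, not the lab's solution: it does not use the Hadoop API, tokenizes on non-word characters, and its class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java sketch (no Hadoop dependency, hypothetical names) of the map,
// shuffle, and reduce phases of an inverted index job.
class InvertedIndexSketch {

    // Map phase: emit a (word, docId) pair for every word in one document.
    static List<Map.Entry<String, String>> map(String docId, String text) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, docId));
            }
        }
        return out;
    }

    // Shuffle + reduce: group values by key (kept sorted, as the framework
    // sorts keys) and reduce each group to a sorted, de-duplicated list of
    // the documents containing that word.
    static Map<String, TreeSet<String>> shuffleAndReduce(
            List<Map.Entry<String, String>> mapOutput) {
        Map<String, TreeSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> pair : mapOutput) {
            index.computeIfAbsent(pair.getKey(), k -> new TreeSet<>())
                 .add(pair.getValue());
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("doc1", "Hadoop stores data in HDFS");
        docs.put("doc2", "YARN schedules Hadoop jobs");

        List<Map.Entry<String, String>> mapOutput = new ArrayList<>();
        docs.forEach((id, text) -> mapOutput.addAll(map(id, text)));

        shuffleAndReduce(mapOutput).forEach((word, postings) ->
                System.out.println(word + "\t" + String.join(",", postings)));
    }
}
```

In a real job the grouping step is performed by the framework between the map and reduce tasks; here a TreeMap stands in for it.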
  • FAQs
    Is there a discount available for current students?

    UMBC students and alumni, as well as students who have previously taken a public training course with UMBC Training Centers, are eligible for a 10% discount, capped at $250. Please provide a copy of your UMBC student ID, an unofficial transcript, or the name of the UMBC Training Centers course you have completed. Online courses are excluded from this offer.

    What is the cancellation and refund policy?

    Students will receive a refund of paid registration fees only if UMBC Training Centers receives notice of cancellation at least 10 business days before the class start date (for classes) or the exam date (for exams).
