Designing and Building Big Data Applications

Code: 3910

4 days

List Tuition: $3,295.00 USD

Course Overview


In this course, you will work through the entire process of designing and building solutions, including ingesting data, determining the appropriate file format for storage, processing the stored data, and presenting the results to the end-user in an easy-to-digest format. Go beyond MapReduce to use additional elements of the enterprise data hub and develop converged applications that are highly relevant to the business.

This course is intended for developers, engineers, and architects who want to use Hadoop and related tools to solve real-world problems.

In this course, you will learn how to:

  • Create a data set with the Kite SDK
  • Develop custom Flume components for data ingestion
  • Manage a multi-stage workflow with Oozie
  • Analyze data with Crunch
  • Write user-defined functions for Hive and Impala
  • Transform data with Morphlines
  • Index data with Cloudera Search

1. Application Architecture

  • Scenario Explanation
  • Understanding the Development Environment
  • Identifying and Collecting Input Data
  • Selecting Tools for Data Processing and Analysis
  • Presenting Results to the User

2. Defining and Using Data Sets

  • Metadata Management
  • What is Apache Avro?
  • Avro Schemas
  • Avro Schema Evolution
  • Selecting a File Format
  • Performance Considerations

3. Using the Kite SDK Data Module

  • What is the Kite SDK?
  • Fundamental Data Module Concepts
  • Creating New Data Sets Using the Kite SDK
  • Loading, Accessing, and Deleting a Data Set

4. Importing Relational Data with Apache Sqoop

  • What is Apache Sqoop?
  • Basic Imports
  • Limiting Results
  • Improving Sqoop's Performance
  • Sqoop 2

5. Capturing Data with Apache Flume

  • What is Apache Flume?
  • Basic Flume Architecture
  • Flume Sources
  • Flume Sinks
  • Flume Configuration
  • Logging Application Events to Hadoop

6. Developing Custom Flume Components

  • Flume Data Flow and Common Extension Points
  • Custom Flume Sources
  • Developing a Flume Pollable Source
  • Developing a Flume Event-Driven Source
  • Custom Flume Interceptors
  • Developing a Header-Modifying Flume Interceptor
  • Developing a Filtering Flume Interceptor
  • Writing Avro Objects with a Custom Flume Interceptor

7. Managing Workflows with Apache Oozie

  • The Need for Workflow Management
  • What is Apache Oozie?
  • Defining an Oozie Workflow
  • Validation, Packaging, and Deployment
  • Running and Tracking Workflows Using the CLI
  • Hue UI for Oozie

8. Processing Data Pipelines with Apache Crunch

  • What is Apache Crunch?
  • Understanding the Crunch Pipeline
  • Comparing Crunch to Java MapReduce
  • Working with Crunch Projects
  • Reading and Writing Data in Crunch
  • Data Collection API
  • Functions
  • Utility Classes in the Crunch API

9. Working with Tables in Apache Hive

  • What is Apache Hive?
  • Accessing Hive
  • Basic Query Syntax
  • Creating and Populating Hive Tables
  • How Hive Reads Data
  • Using the RegexSerDe in Hive

10. Developing User-Defined Functions

  • What are User-Defined Functions?
  • Implementing a User-Defined Function
  • Deploying Custom Libraries in Hive
  • Registering a User-Defined Function in Hive

11. Executing Interactive Queries with Impala

  • What is Impala?
  • Comparing Hive to Impala
  • Running Queries in Impala
  • Support for User-Defined Functions
  • Data and Metadata Management

12. Understanding Cloudera Search

  • What is Cloudera Search?
  • Search Architecture
  • Supported Document Formats

13. Indexing Data with Cloudera Search

  • Collection and Schema Management
  • Morphlines
  • Indexing Data in Batch Mode
  • Indexing Data in Near Real Time

14. Presenting Results to Users

  • Solr Query Syntax
  • Building a Search UI with Hue
  • Accessing Impala through JDBC
  • Powering a Custom Web Application with Impala and Search

Prerequisites

  • Cloudera Developer Training for Apache Hadoop or equivalent experience
  • Knowledge of Java and basic familiarity with Linux
  • Experience with SQL is helpful but not required


We provide government and government contractor discounts; please request a quote.


Hotel and travel can be included in your quote.
For an immediate response, call 1-855-515-2170; otherwise, we will provide a quote within 4 business hours. Travel must be booked 14 days before training for the rate to apply.


Don’t see the date or location you need? Contact us and let us know; we are adding dates and locations daily.