Big Data, Data Science Training – Combo Course

  • A big data Hadoop developer is responsible for coding or programming Hadoop applications. The role is similar to that of a software developer, but in the big data domain. The big data Hadoop development course prepares the student to:

    • Develop and implement Hadoop applications
    • Pre-process data using Hive and Pig
    • Translate complex functional and technical requirements into detailed designs
    • Analyze vast data stores to uncover insights
    • Maintain security and data privacy
    • Build scalable, high-performance web services for data tracking
    • Perform high-speed querying

    After completing the course, students can find jobs in:

    • Travel
    • Retail
    • Finance
    • Healthcare
    • Manufacturing
    • Life sciences
    • Telecom
    • Media & entertainment
    • Transportation & logistics
    • IT
    • Consulting
    • Government

    The course curriculum covers:

    • Introduction to Hadoop and Its Motivation
      • What is Big Data?
      • Challenges in Big Data
      • Challenges in Traditional Applications
      • New Requirements
      • What is Hadoop?
      • Brief History of Hadoop
      • Features of Hadoop
      • Hadoop vs. RDBMS
      • Overview of the Hadoop Ecosystem
      • Overview of HDFS and MapReduce
    • Hadoop Distributed File System
      • Understanding Configuration and
      • HDFS Concepts
        • Blocks and Splits
        • Input Splits
        • HDFS Splits
        • Data Replication
        • Hadoop Rack Awareness
        • Version File
        • Safe Mode
        • Namespace IDs
        • Reading and Writing in HDFS
      • High Availability of Data
      • Data Integrity
    • Hadoop Daemons
      • Master Daemons
        • NameNode
        • JobTracker
        • Secondary NameNode
      • Slave Daemons
        • DataNode
        • TaskTracker
    • Cluster architecture and block placement
    • Accessing HDFS using APIs
      • CLI Approach
      • Hands-On Exercise
    • HDFS Shell Commands
      • Hands-On Exercise
    • Setting Up an Apache Hadoop Cluster
      • Downloading Hadoop
      • Installing SSH
      • Configuring Hadoop
      • Downloading, Installing and Configuring Hive
      • Downloading, Installing and Configuring Pig
      • Downloading, Installing and Configuring Sqoop
      • Installing MySQL in the Hadoop Cluster
      • Downloading and Working with the Cloudera Image
      • Configuring Hadoop in Different Modes
        • Local Mode, Pseudo-Distributed Mode and Fully Distributed Mode
      • Running daemons on dedicated nodes
    • Daily Administrative Tasks
      • Managing Hadoop Processes
        • Starting and Stopping Processes with Init Scripts
        • Starting and Stopping Processes Manually
      • HDFS Maintenance Tasks
        • Adding a DataNode
        • Decommissioning a DataNode
        • Checking Filesystem Integrity with fsck
        • Balancing HDFS Block Data
        • Dealing with a Failed Disk
      • MapReduce Maintenance Tasks
        • Adding a TaskTracker
        • Decommissioning a TaskTracker
        • Killing a MapReduce Job
        • Killing a MapReduce Task
        • Dealing with a Blacklisted TaskTracker
    • MapReduce (Programming)
      • Developing MapReduce Programs in Local Mode
      • Developing MapReduce Programs in Pseudo-distributed Mode
      • Developing MapReduce Programs in Fully distributed mode
      • MapReduce architecture
      • MapReduce Programming Model
      • Revisiting Blocks and Input Splits
      • Common Input and Output Formats
      • MapReduce Data Types
      • Writing a MapReduce Program
        • Driver Code, Mapper Code and Reducer Code
        • Hadoop's Streaming API
    • Joining Data Sets
      • Map-Side Joins and Reduce-Side Joins

    • Data Flow in a MapReduce Application
    • Understanding ToolRunner
    • MapReduce Streaming and Pipes
    • Data Locality in MapReduce
    • Hands-On Exercise
    • Using Combiner
    • Using Distributed Cache
    • Secondary Sorting Using MapReduce
    • Passing Parameters to the Mapper and Reducer
    • Hands-On Exercise
    • Writing Custom Data Types
    • Writing a Custom Partitioner
    • Hands-On Exercise
    • Debugging MapReduce Jobs in Various Modes
    • Unit Testing MR Jobs with MRUnit
    • Logging and Other Debugging Strategies
    • Exploring Well-Known Problems Using MapReduce Applications
    • Counters
    • Skipping Bad Records
    • Rerunning failed tasks with Isolation Runner
    • Performance Tuning in MapReduce
    • Reducing Network Traffic with a Combiner
    • Partitioners
    • Using Compression
    • Reusing the JVM
    • Running with speculative execution
    • Performance Aspects
    • Techniques for Controlling the Framework Sort
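The MapReduce topics above (mapper, reducer, combiner, partitioner and the sort between them) can be illustrated with a small local simulation. This is a sketch under stated assumptions, not Hadoop code: it runs in a single Python process, follows the Streaming-style idea of lines in and key/value pairs out, and uses `zlib.crc32` as a stand-in for Hadoop's default hash partitioner (which computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`). The function names `mapper`, `combiner`, `partition`, `reducer` and `run_job` are hypothetical.

```python
"""Local, single-process sketch of a MapReduce word count.

Assumption: this only simulates map -> combine -> partition -> sort -> reduce
in memory. On a real cluster with Hadoop Streaming, the mapper and reducer
would be separate scripts reading stdin and writing tab-separated lines.
"""
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit (word, 1) for every lowercased, whitespace-separated token."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def combiner(pairs):
    """Pre-aggregate counts locally, as a combiner would on one map task."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def partition(key, num_reducers):
    """Hash partitioner: every occurrence of a key goes to the same reducer.

    zlib.crc32 is used instead of Python's hash(), which is randomized per
    process for strings, so the assignment is stable across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(sorted_pairs):
    """Sum counts for consecutive identical keys (input must be sorted by key)."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_job(splits, num_reducers=2):
    """Map and combine each split, then shuffle, sort and reduce each partition."""
    partitions = [[] for _ in range(num_reducers)]
    for split in splits:  # one simulated map task per input split
        for word, count in combiner(mapper(split)):
            partitions[partition(word, num_reducers)].append((word, count))
    results = {}
    for part in partitions:  # one simulated reduce task per partition
        for word, total in reducer(sorted(part)):
            results[word] = total
    return results


if __name__ == "__main__":
    splits = [["the quick brown fox", "the lazy dog"], ["The dog barks"]]
    print(run_job(splits))
```

Because the combiner emits partial sums with the same key/value shape as the mapper, the reducer's result does not change whether or not the combiner runs; it only reduces the number of pairs shuffled between tasks.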


    • Hive concepts
    • Hive architecture
    • Install and configure Hive on a cluster
    • Different types of tables in Hive
    • Hive library functions
    • Buckets
    • Partitions
    • Joins in Hive
      • Inner joins and outer joins
    • Hive UDF
    • Hive UDAF
    • Hive UDTF
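Partitions and buckets in Hive both decide where a row is stored. The sketch below is a simplified model, assuming each partition value maps to a `column=value` sub-directory and a row's bucket is a non-negative hash of the bucketing column modulo the bucket count. Hive's real hash function depends on the Hive version and column type; the table path `/warehouse/page_views`, the `bucket_NNNNN` file names and the helper functions are hypothetical.

```python
"""Simplified model of the layout of a partitioned, bucketed Hive table.

Assumptions (not Hive's exact implementation): partitions become
column=value sub-directories, and the hash of an integer bucketing key is
the integer itself, masked to non-negative and taken modulo the bucket count.
"""
from collections import defaultdict

TABLE_DIR = "/warehouse/page_views"  # hypothetical table location


def bucket_for(key: int, num_buckets: int) -> int:
    """Assumed bucket rule: mask to a non-negative 31-bit value, then modulo."""
    return (key & 0x7FFFFFFF) % num_buckets


def layout(rows, partition_col, bucket_col, num_buckets, table_dir=TABLE_DIR):
    """Map each row to a partition directory and a bucket file inside it."""
    files = defaultdict(list)
    for row in rows:
        part_dir = f"{table_dir}/{partition_col}={row[partition_col]}"
        bucket = bucket_for(row[bucket_col], num_buckets)
        files[f"{part_dir}/bucket_{bucket:05d}"].append(row)  # hypothetical file name
    return dict(files)


def prune(files, partition_col, value):
    """Keep only files in one partition, as partition pruning does for WHERE col = value."""
    marker = f"/{partition_col}={value}/"
    return {path: rows for path, rows in files.items() if marker in path}


if __name__ == "__main__":
    rows = [
        {"dt": "2024-01-01", "user_id": 7},
        {"dt": "2024-01-01", "user_id": 11},
        {"dt": "2024-01-02", "user_id": 7},
    ]
    files = layout(rows, "dt", "user_id", num_buckets=4)
    for path, bucket_rows in sorted(files.items()):
        print(path, len(bucket_rows))
```

The design point the model shows: a filter on the partition column lets a query skip whole directories, while rows with equal bucketing keys always land in the same bucket file, which is what bucketed joins and sampling rely on.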


    • Pig basics
    • Install and configure Pig on a cluster
    • Pig library functions
    • Pig vs. Hive
    • Write sample Pig Latin scripts
    • Modes of running Pig
      • Running in the Grunt shell
      • Running as a Java program
    • Pig UDFs


    • HBase concepts
    • HBase architecture
    • Region server architecture
    • File storage architecture
    • HBase basics
    • Column families
    • Accessing HBase commands from the JRuby shell
    • Get
    • Scans
    • HBase use cases
    • Install and configure HBase on a multi node cluster
    • Create a database, and develop and run sample applications
    • Access data stored in HBase using Java, Python and Perl clients
    • MapReduce client for accessing HBase data
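HBase's data model (rows kept sorted by row key, cells addressed by column family and qualifier, a Get for one row and a Scan over a key range whose stop row is exclusive) can be pictured with a small in-memory class. This is not the HBase client API: `MiniTable`, `put`, `get` and `scan` are hypothetical names, row keys are assumed to be strings compared lexicographically (real HBase compares raw bytes), and cell versions and timestamps are not modeled.

```python
"""In-memory model of HBase's data model (not the HBase client API).

Assumptions: row keys are strings compared lexicographically, each cell is
addressed by (row key, "family:qualifier"), and a new put overwrites the
previous value (versions/timestamps are not modeled).
"""
import bisect


class MiniTable:
    def __init__(self, families):
        self.families = set(families)  # column families are fixed when the table is created
        self.row_keys = []             # kept sorted, as HBase keeps rows sorted by key
        self.rows = {}                 # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self.rows:
            bisect.insort(self.row_keys, row_key)
            self.rows[row_key] = {}
        self.rows[row_key][column] = value

    def get(self, row_key):
        """Point lookup of a single row, similar in spirit to an HBase Get."""
        return dict(self.rows.get(row_key, {}))

    def scan(self, start_row="", stop_row=None):
        """Yield rows with start_row <= key < stop_row in key order, like a Scan."""
        i = bisect.bisect_left(self.row_keys, start_row)
        while i < len(self.row_keys):
            key = self.row_keys[i]
            if stop_row is not None and key >= stop_row:
                break
            yield key, dict(self.rows[key])
            i += 1


if __name__ == "__main__":
    table = MiniTable(["info", "stats"])
    table.put("user#002", "info:name", "Bo")
    table.put("user#001", "info:name", "Al")
    table.put("user#010", "stats:visits", "5")
    print(table.get("user#001"))
    for key, cells in table.scan("user#001", "user#010"):
        print(key, cells)
```

Because rows are sorted by key, zero-padded keys such as `user#001` keep numeric order under lexicographic comparison, which is why row-key design matters for efficient scans.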


    • Introduction to Sqoop
    • MySQL client and server installation
    • Sqoop installation
    • How to connect to a relational database using Sqoop
    • Different Sqoop commands
    • Different flavors of imports
    • Export
    • Hive imports
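When Sqoop imports a table in parallel, it splits the work on a column (the primary key, or the column given with `--split-by`) by reading that column's minimum and maximum values and dividing the range among the number of mappers set with `-m`/`--num-mappers`. The sketch below reproduces that range arithmetic; it is a simplified model, not Sqoop's own splitter code, and the table name `sometable`, the column `id` and both helper functions are placeholders.

```python
"""Simplified model of how a parallel Sqoop import divides a key range.

Assumption: an integer split column with known MIN and MAX values, divided
into roughly equal half-open ranges, one bounded SELECT per mapper.
"""


def integer_splits(min_val: int, max_val: int, num_mappers: int):
    """Divide [min_val, max_val] into half-open (lo, hi) ranges, one per mapper."""
    span = max_val - min_val
    bounds = [min_val + (span * i) // num_mappers for i in range(num_mappers)]
    bounds.append(max_val + 1)  # make the last range include max_val
    # Drop empty ranges, which appear when there are more mappers than values.
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if lo < hi]


def split_queries(table: str, column: str, min_val: int, max_val: int, num_mappers: int):
    """Build one bounded SELECT per mapper (placeholder table/column names)."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} < {hi}"
        for lo, hi in integer_splits(min_val, max_val, num_mappers)
    ]


if __name__ == "__main__":
    for query in split_queries("sometable", "id", 0, 1000, 4):
        print(query)
```

For ids 0 to 1000 and four mappers this yields the ranges (0, 250), (250, 500), (500, 750) and (750, 1001). If the split column is skewed, equal-width ranges give unequal amounts of work per mapper, which is why the choice of `--split-by` column matters.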

    Understanding Flume and Oozie

    Overview of a Live Project

    Overview of a Real-Time Implementation




This course is designed to help you clear the CCA Spark and Hadoop Developer exam, the Cloudera Certified Administrator for Apache Hadoop (CCAH) exam, the R certification exam, the Mahout certification exam, the Cloudera CCP:DS certification, the Apache HBase certification exam (CCB-400), the Apache Cassandra Professional exam and the Apache Spark certification exam, and it also includes Apache Storm training. At the end of the course there is a quiz along with project assignments; once you complete them, you will be awarded the Intellipaat Course Completion Certificate. Become in demand with Intellipaat certifications.
