DATA SCIENCE COURSE

BIG DATA HADOOP COURSE 4.5/5
1388 reviews
Business Analytics Course

DATA SCIENCE TRAINING

This Big Data Analytics program has been designed to help meet the expanding need for "Data Scientists" who are skilled in applying a unique blend of science, art and business.

ABOUT DATA SCIENCE PROGRAM

Many corporations have dramatically increased investments in their "digital enterprises" in the past few years. It has been estimated that by 2020, IT departments will be monitoring 50 times more data than they are today. This tidal wave of data is driving unprecedented demand for people with the skills required to manage these very large data sets and turn them into a competitive advantage.

These professionals are skilled in automating methods of collecting and analyzing data, and in using exploratory techniques to discover previously hidden insights that can profoundly impact the success of any business.

WHY DATA SCIENCE AT EDUPRISTINE?

Academically Rigorous: EduPristine is known for classroom training and project-based learning. This big data analytics training is no exception. The program is designed and delivered by experienced faculty and data science professionals who teach at the EduPristine campus.

Live Classes: Unlike a typical classroom learning experience, these classes are delivered using standard practices. EduPristine blends live, face-to-face classes with top-notch platform facilities.

ELIGIBILITY

Individuals with a bachelor's degree in engineering, science, maths/statistics, finance, computer science, accounting or marketing who are intrigued by statistical and analytical practices may excel in this field.

  • Basic Statistics methods used in business performance measures
  • Strong interest in data science
  • Hands-on experience on Core Java & Unix
  • Good analytical skills to grasp and apply the concepts in Hadoop

COURSE CURRICULUM

Readings

Practical Implementation

1. Introduction to Hadoop/ Spark

2. Good Data Scientist tool kit

3. How modern Big Data technologies & tools address the following problems:
  • Volume is large - Batch Analytics
  • Velocity is high - Real-Time Analytics
  • Variety in data - Unstructured or semi-structured data
  • Non-functional parameters such as cost, reliability and fault tolerance
Getting started with the fundamentals of Hadoop/Spark and setting a base to align them with batch & real-time analytics

1. Getting started with fundamentals of programming

2. Python for data processing

3. Unix for CLI Commands - Getting familiar with Unix and the CLI is the first priority

4. Map Reduce concept and understanding

5. SQL for Hive
Getting started with fundamentals of programming: Python for data processing & Unix for CLI commands

1. Cluster Specification & Hadoop Configuration

2. Basic Linux and HDFS commands

3. Command Line Interface

4. Hadoop File Systems

5. Data Flow

6. Become familiar with cloud environment

7. Set-up development environment
Introduction to big data storage and structured data ingestion: touching base on parallel programming on scalable machines

1. Parallel programming on scalable Machines: Map Reduce

2. Mastering Key-Value Pairs: Case Study

3. Distributed Computing using Map Reduce

4. The Execution Framework, Concept of Partitioners
Understanding - Map-Reduce Basics and Map-Reduce Types and Formats
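
The map, shuffle and reduce phases covered in this module can be illustrated with a minimal, single-process sketch in plain Python. This is not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input lines are assumptions made purely for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) key-value pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one key."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input; a real job would read its input splits from HDFS.
sample_lines = ["big data on Hadoop", "Big Data on Spark"]
counts = word_count(sample_lines)
print(counts)
```

In a real cluster the mapper and reducer run on different machines and the shuffle moves data over the network; the sketch only shows how key-value pairs flow between the phases.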

1. Importing Large Objects, Performing Exports, Exports - A Deeper Look. Introduction to Database Imports, Working with Imported Data

1. Data Warehousing, Management and Querying on Hadoop: Hive

2. Web Interface for analyzing data: Hadoop User Experience (HUE)

3. Querying Data

4. User Defined Functions

5. Custom Map/Reduce in Hive
Getting started with data warehousing, management and querying on Hadoop: Hive & web interface for analyzing data: Hadoop User Experience (HUE)

1. Data Flow ETL Scripting Language : Pig

2. Installing and Running Pig, Grunt

3. Pig's Data Model, Pig Latin

4. Developing & Testing Pig Latin Scripts

5. Writing Evaluation
Building the fundamentals for data warehousing, management and querying on Hadoop: Hive & web interface for analyzing data: Hadoop User Experience (HUE)

1. Recap of Hadoop

2. Opportunity for in-memory computing

3. Spark Ecosystem

4. Time comparisons with Map Reduce

5. Spark Architecture

6. Spark Context

7. Resilient Distributed Dataset
How to run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk, using Spark

1. Lightning-Fast In-Memory Cluster Computing: Spark

2. Batch Processing Historical data: Log Analysis; Ecommerce Industry

3. Interactive and Batch Mode production

4. Transformations

5. Spark Program Life Cycle

6. Closures, Accumulators and Broadcast variables

7. Project: Log Analysis (Batch processing with Spark)
Understanding log analysis with Spark and Python through a business case study, to gain hands-on experience

1. Workflow Management Tool: Oozie

2. Oozie Workflow, Actions and Control flow

3. Sqoop Action

4. Hive Action

5. HDFS Action
Introduction to the workflow management tool with hands-on examples

1. Random Read and Write Access, OLAP, NoSQL

2. Database: HBase

3. Client API - Advanced Features

4. Client API - Administrative Features

5. Available Client, Architecture

6. Map Reduce Integration, Advanced Usage

7. Advanced Indexing
Introduction to the fundamentals of random read and write access, OLAP, NoSQL database: HBase

1. MIS Reporting and ELT on Hadoop: Retail Domain

2. We will provide data sets on which participants will work as a part of the Project:
  • Load data into MySQL
  • Retail Data Analysis with Pig
  • SQOOP data into HDFS
  • Retail Data Analysis with Hive
Given a retail business scenario, this provides a run-through of the MIS reporting and ELT on Hadoop

1. Customer 360 & Genome: Banking sector

2. We will provide data sets on which participants will work as a part of the Project:
  • Data Creation
  • MySQL Data Ingestion
  • Sqoop Daily data to MySQL
  • HBase table Creation
Given a banking sector scenario, this provides a run-through on Customer 360 & Genome

1. Using Flume, Kafka, Spark Streaming and Batch Processing Using Hive & Impala
A run-through on structured data ingestion and semi-structured processing

A session on the exam preparation, pattern and the important topics to be discussed

Online - 2 hours

1. Population vs. Sample

2. Types of Data Variables and Summarizing

3. Central Tendency and Spread/Variability

4. Data Collection and Data Dictionary

5. Probability and Random Variables

6. Probability Distribution: Discrete and Continuous Distributions

7. Central Limit Theorem

8. Hypothesis Testing
Introduction to data and statistics with an insight on concepts of distribution and hypothesis testing
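
As a small illustration of the central tendency and spread measures listed above, the following sketch uses Python's standard statistics module. The sample values are hypothetical, and the standard error line is included only as a pointer to the Central Limit Theorem topic, not as part of the course material.

```python
import math
import statistics

# Hypothetical sample drawn from a larger population.
sample = [12, 15, 11, 14, 18, 15, 13]

# Central tendency
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Spread: sample standard deviation (divides by n - 1)
sample_sd = statistics.stdev(sample)

# Standard error of the mean: how much the sample mean itself is
# expected to vary from one sample to the next.
standard_error = sample_sd / math.sqrt(len(sample))

print(f"mean={mean}, median={median}, mode={mode}")
print(f"sd={sample_sd:.3f}, standard error={standard_error:.3f}")
```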

1. Correlation and Regression.

2. Multivariate Linear Regression Theory

3. Bivariate Analysis

4. ANOVA (Analysis of Variance)

5. Identify and quantify the factors responsible for the loss amount for an auto insurance company
  • Model Misspecifications
  • Economic meaning of a Regression Model
  • Bivariate Analysis
Given a multivariate linear regression case study, understanding the correlation and regression, ANOVA.

1. Identifying problems in fitting linear regression on data having a "Binary Response" variable

2. Generalized Linear Modeling (GLMs)

3. Logistic Regression Theory/Case
  • Fitting the regression using SAS language
  • Lift/Gains chart and Gini coefficient
  • K-S stat
4. Identify bank customers who will most likely default in making the payment on balance due.
Given a multivariate logistic regression case study, identifying problems in fitting linear regression on data having a "Binary Response" variable, and Generalized Linear Modeling (GLMs)

1. Models of time series

2. The Box-Jenkins model-building process: identifying the ARIMA model.
  • Forecasting future sales based on historical data for an automobile company
3. Identify bank customers who will most likely default in making the payment on balance due.
Models of time series, The Box-Jenkins model building process - ARIMA Modeling

1. Affinity analysis to understand purchase behavior

2. Understanding the Apriori algorithm

3. Analysis of observational datasets to find unforeseen relationships.

4. Analysis of output results to plan store layout, promotions and recommendations
Optimization of expenses using Market mix modeling

1. Getting an insight on Data Mining and Decision Trees

2. Introduction and practical application of CHAID analysis

3. Introduction and practical application of CART

4. Understand the usage of clustering

5. Getting an insight on various Clustering methods

6. Hands on for K-means Clustering Algorithm
Understanding the CHAID & CART Analysis and linking the same with K-means clustering

1. Develop a scoring algorithm

2. Email Score
  • Open rate
  • Click rate
  • Unsubscribe rate
3. Rank campaigns according to Email Score
Air Traffic Control For Emails
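
One possible way to combine open, click and unsubscribe rates into a single Email Score and rank campaigns by it is sketched below. The weights, the Campaign fields and the campaign figures are all hypothetical; the source does not specify how the score is computed, so treat this as one illustrative scoring scheme rather than the course's method.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    name: str
    sent: int
    opens: int
    clicks: int
    unsubscribes: int


# Hypothetical weights: clicks count more than opens, unsubscribes are penalized.
W_OPEN, W_CLICK, W_UNSUB = 1.0, 2.0, 5.0


def email_score(campaign):
    """Weighted combination of open, click and unsubscribe rates."""
    if campaign.sent == 0:
        return 0.0
    open_rate = campaign.opens / campaign.sent
    click_rate = campaign.clicks / campaign.sent
    unsub_rate = campaign.unsubscribes / campaign.sent
    return W_OPEN * open_rate + W_CLICK * click_rate - W_UNSUB * unsub_rate


def rank_campaigns(campaigns):
    """Return campaigns ordered from highest to lowest Email Score."""
    return sorted(campaigns, key=email_score, reverse=True)


# Hypothetical campaign data.
campaigns = [
    Campaign("weekly_digest", sent=1000, opens=300, clicks=50, unsubscribes=10),
    Campaign("flash_sale", sent=1000, opens=400, clicks=120, unsubscribes=30),
    Campaign("win_back", sent=1000, opens=150, clicks=20, unsubscribes=40),
]
ranked = [c.name for c in rank_campaigns(campaigns)]
print(ranked)
```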

Cross Sell Model: Propensity to cross-sell health insurance products to general insurance customers.
Market Mix Modeling: Optimization of the promotion expense using market mix modeling.
Churn Analytics: Developing a churn model to gauge the propensity of attrition among the loyal and profitable customer segment.
Email Optimization (Ecommerce Industry): Developing a system that ensures that the correct campaign reaches the relevant customers with a suitable frequency to further enhance the level of engagement across all email campaigns.
Customer Lifetime Value Analysis: Predicting customer survival along with profitability to model the lifetime value of each customer.
Telecom Model to Estimate Bill: Building a model that can suggest the right tariff plan based on the estimated bill amount.
Sentiment Analysis: Detecting the contextual polarity of text to find whether a piece of writing is positive, negative or neutral.

1. The visualization design methodology.

2. The Data Visualization Process.

3. Working with Single Data Sources.

4. Using Multiple Data Sources

5. Using Calculations in Tableau
An introduction to various data visualisation techniques, later tying them back to various scenarios

1. Solving problem statements using Apache Hive and R
Real-time analytics, unstructured data ingestion

1. Ridge Regression
  • Cost functions
  • Ridge regression equation
  • Application of Ridge regression
2. LASSO Regression
  • Cost functions
  • Lasso regression equation
  • Application of Lasso regression
Introduction to machine learning with hands-on practice in Ridge and LASSO regression
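
To make the two cost functions concrete, here is a minimal sketch for the simplest possible case: one feature, no intercept. Ridge adds a squared penalty (lam * beta**2) to the residual sum of squares, while LASSO adds an absolute-value penalty (lam * abs(beta)). The closed-form solutions below hold only for this single-feature setting with at least one non-zero x value, and the data and function names are hypothetical.

```python
import math


def rss(beta, xs, ys):
    """Residual sum of squares for the model y = beta * x."""
    return sum((y - beta * x) ** 2 for x, y in zip(xs, ys))


def ridge_cost(beta, xs, ys, lam):
    """Ridge cost: RSS plus an L2 penalty on the coefficient."""
    return rss(beta, xs, ys) + lam * beta ** 2


def lasso_cost(beta, xs, ys, lam):
    """LASSO cost: RSS plus an L1 penalty on the coefficient."""
    return rss(beta, xs, ys) + lam * abs(beta)


def ridge_fit(xs, ys, lam):
    """Minimizer of ridge_cost: shrinks the coefficient but never to exactly zero."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_fit(xs, ys, lam):
    """Minimizer of lasso_cost via soft thresholding: can set the coefficient to zero."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return math.copysign(shrunk, sxy) / sxx


# Hypothetical data generated by y = 2x.
xs, ys = [1, 2, 3], [2, 4, 6]
for lam in (0, 14, 28, 100):
    print(lam, ridge_fit(xs, ys, lam), lasso_fit(xs, ys, lam))
```

With lam = 0 both fits reduce to ordinary least squares. As lam grows, the ridge coefficient keeps shrinking toward zero, whereas the LASSO coefficient reaches exactly zero once lam / 2 exceeds the absolute value of the sum of x * y, which is why LASSO is often used for feature selection.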

1. Count Regression
  • Poisson Regression
  • Negative Binomial Regression
  • Zero-Inflated Regressions
2. Survivor Analytics
  • Time-to-event Data
  • Survival Analysis
  • Comparing Survival Curves
3. Goodness of fit

4. Model Comparisons
Getting an understanding of count regression and survivor analytics

1. Random Forest
  • Hyperparameters of Random Forest
  • Fine-tuning Random Forest and calculating its cost function
2. Neural Network, Back Propagation, Back Propagation Intuition, Gradient Checking
  • The Perceptron learning
  • The back propagation learning
  • Recurrent neural networks
  • Feed Forward Neural Network
Insight on deep machine learning through random forest and neural network

1. Cloud and Platform as a Service, such as APIs from various cloud providers

2. Suitable APIs related to NLP (Natural Language Processing)

3. Development Environment Setup

4. Developing a hands-on solution

5. More such APIs in the area of advanced analytics and AI, such as object recognition and mood detection from images, in the area of cognitive computing

Integrating advanced analytics using ML/AI APIs with machine learning

COURSE HIGHLIGHTS

Extensive Classroom Training

Get trained by topic experts with interactive learning in small batches

Online Live Instructor-Based Training

Learn concepts once again through Live Online sessions.

Complimentary Course

1. Java Essentials for Hadoop, Python and UNIX session
2. Basic Stats video recordings.

Lab Practical

200 Hrs - Virtual Lab practice (SAS Language - valid for 1 year)
3 Months – Cloud lab access to work on Hadoop Platform.

Online Materials

This course serves as an introduction to the interdisciplinary and emerging field of data science. Students will learn to combine tools and techniques from statistics, computer science, data visualization and the social sciences to solve problems using data.

Assignments & cases

Work on real-time cases from different domains.

Home Assignment & Online Live Discussion Module

Work on home assignments and discuss them with faculty every Friday in Live Online mode.

24x7 Online Access

Access to Course Material (Unlocked Excel Models, Presentations, etc.)

Doubt Solving by Experts

Write to us and get your doubts solved by our experts within 2 business days. You can also initiate a discussion by posting it on active forums.

Online Content

Download the study notes to supplement subject wise video tutorials & webinar recordings.

Certificate

A reference to get ahead in your career. At the end of the course, you will receive a Certificate of Completion.
1. Business Analytics Module
2. Big Data & Hadoop Module
3. Data Science Module

Real-world Case Studies

Get the best training in analytics by understanding real world problems and scenarios

Unlimited Download Access

Download the whole material anytime during your 1 year subscription and use it for any future reference

Data Science TRAINING VENUE AND DATES

Bangalore
Batch Starts From: 27th Aug


Asha Plaza, Second Floor, #607, 80 Feet Peripheral Road, Koramangala IV Block, Bangalore – 560095

Delhi
Batch Starts From: 14th Sep


EduPristine, #44, 2nd Floor, Regal Complex, Outer Circle, Connaught Place

Pune
Batch Starts From: 17th Sep


Semat Institute of Excellence, 2nd Floor, Tandle Heights, above Moolchand Sweets, Senapati Bapat Road, Pune. Landmark - Opp. WS Bakers

Mumbai
Batch Starts From: 30th Sep


7th Floor, 702, Raaj Chambers, Old Nagardas Road, Near Andheri Subway, Andheri East 400069

Data Science Training Highlights

Price

Rs 115000

5 months Classroom Training (Weekend Batch)

Pre-requisite Video


Business Analytics Module - Tutorial on Basic Statistic and Data, along with Intro to R Studio Software.
Big Data & Hadoop Module - Unix, Core Java & Python Language.

Online Live Instructor-Based Training

Exam Preparation Session

Classroom & Home Assignments for hands-on experience, along with discussion -


Along with classroom assignments, get hands-on experience through additional home assignments, followed by a live online discussion with faculty.

Certification –
i) Certificate of Completion / Excellence Certificate in Business Analytics (Joint Certificate from EduPristine / Dun & Bradstreet)
ii) Certificate of Completion / Excellence Certificate in Big data & Hadoop
iii) Certificate of Completion / Excellence Certificate in data science

Various domain & multiple assignments for practice purpose

Downloadable course material

Webinar Video recording for each module

Forum to Discuss with Fellow Students and Experts

Hadoop Definitive Guide For reference.

FAQs

What is Data Science?

Data science is the study of the generalizable extraction of knowledge from data, yet the key word is science. It incorporates varying elements and builds on techniques and theories from many fields, including signal processing, mathematics, probability models, machine learning, statistical learning, computer programming, data engineering, pattern recognition and learning, visualization, uncertainty modeling, data warehousing, and high performance computing with the goal of extracting meaning from data and creating data products.

Data Science is not restricted to only big data, although the fact that data is scaling up makes big data an important aspect of data science.

TESTIMONIALS

We sincerely appreciate the flexibility of teaching and customized guidance that the institute provided each of us. The intensity of the programme prepares one for high pressure situations. We are very grateful for the very valuable training and assistance provided to us by EDUPRISTINE.


The beauty of this course is to have lots of practice. If one chooses to do it seriously, he is done with the subject. Trainers are also very good and are from a good industrial background. Whenever possible they try to relate this to real world examples. I am going to attend this again for topics that I missed out. I could actually solve my real work problem using the example demonstrated in the class. Thank you Edupristine.

Disclaimer

GARP does not endorse, promote, review or warrant the accuracy of the products or services offered by EduPristine of GARP Exam related information, nor does it endorse any pass rates that may be claimed by the Exam Prep Provider. Further, GARP is not responsible for any fees or costs paid by the user to EduPristine nor is GARP responsible for any fees or costs of any person or entity providing any services to EduPristine. ERP®, FRM®, GARP® and Global Association of Risk Professionals™ are trademarks owned by the Global Association of Risk Professionals, Inc.

CFA® Institute does not endorse, promote, or warrant the accuracy or quality of the products or services offered by EduPristine. CFA® Institute, CFA® Program, CFA® Institute Investment Foundations and Chartered Financial Analyst® are trademarks owned by CFA® Institute.

Utmost care has been taken to ensure that there is no copyright violation or infringement in any of our content. Still, in case you feel that there is any copyright violation of any kind please send a mail to abuse@edupristine.com and we will rectify it.