About Edureka
Edureka is a leading e-learning platform providing live, instructor-led, interactive online training. We cater to
professionals and students across the globe in categories like Big Data & Hadoop, Business Intelligence & Analytics,
NoSQL Databases, Java & Mobile Technologies, System Engineering, Project Management and Programming.
We offer an easy and affordable learning solution that is accessible to millions of learners. With students spread
across 185 countries and regions, including the US, India, the UK, Canada, Singapore, Australia, the Middle East and Brazil,
we have built a community of over 1 million learners across the globe.
Edureka's Python Certification Training not only focuses on the fundamentals of Python, Statistics, Machine Learning
and Spark, but also helps you gain expertise in applied Data Science at scale using Python. The training is a step-by-step
guide to Python and Data Science with extensive hands-on practice.
The course is packed with activity problems, quizzes, assignments and scenarios that help you gain practical
experience in addressing automation problems that require either plain Python or Machine Learning with
Python. It starts from the basics of Statistics, such as mean, median and mode, and moves on to Data Analysis,
Regression, Classification, Clustering, Naive Bayes, Cross Validation, Label Encoding, Random Forests, Decision Trees
and Support Vector Machines, each with a supporting example and exercise to help you get into the weeds.
The Python course will also cover both basic and advanced Python concepts, such as writing Python scripts and performing
sequence and file operations. You will use libraries like Pandas, NumPy, Matplotlib, SciPy, scikit-learn and PySpark, and
master concepts like machine learning in Python, scripting, sequences, web scraping and big data analytics leveraging Apache
Spark.
Why Learn Python:
Python has been a favorite option for data scientists, who use it for building Machine Learning applications and
other scientific computations. Python cuts development time with its simple, readable syntax, and debugging
programs is a breeze with its built-in debugger.
It runs on Windows, Linux/Unix and Mac OS, and has been ported to the Java and .NET virtual machines. Python is free to
use, even for commercial products, because of its OSI-approved open source license.
It has evolved into the most preferred language for Data Analytics, and the rising search trends for Python also
indicate that it is the "Next Big Thing" and a must for professionals in the Data Analytics domain.
Objectives:
During this Python online course, our expert instructors will help you:
Module 1

Goal
Give a brief idea of what Python is and touch on the basics.

Objectives

Module 2

Goal
Learn different types of sequence structures, related operations and their usage. Also learn diverse ways of opening, reading, and writing to files.
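As a taste of what Modules 1 and 2 cover, here is a minimal sketch of sequence structures and file operations in plain Python; the file name and values are illustrative, not part of the course material:

```python
import os
import tempfile

# Sequence structures: list operations, unpacking, dict comprehension
langs = ["Python", "Java", "Scala"]
langs.append("Go")                              # lists are mutable
first, *rest = langs                            # sequence unpacking
counts = {lang: len(lang) for lang in langs}    # dict comprehension

# File operations: write lines out, then read them back
# (the file name and temp location are illustrative)
path = os.path.join(tempfile.gettempdir(), "langs.txt")
with open(path, "w") as f:
    for lang in langs:
        f.write(lang + "\n")

with open(path) as f:
    lines = [line.strip() for line in f]

print(first)            # Python
print(lines == langs)   # True
```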
Module 3 Module 4
Deep Dive Functions, Sorting, Errors and Exception, Object oriented programming in Python
Regular Expressions and Packages
Goal
Goal Understand the Object-Oriented Programming world in
Learn how to create generic python scripts, how to address Python and use of standard libraries.
errors/exceptions in code and finally how to extract/filter
content using regex. Objectives
Hands On Hands On
Functions - syntax, arguments, keyword arguments, Regular expressions - regex library, search/match
return values object, findall, sub, compile
Lambda - features, syntax, options, comparison with Classes - classes and objects, access modifiers,
functions instance and class members
Sorting - sequences, dictionaries, limitations of OOPS paradigm - Inheritance, Polymorphism and
sorting Encapsulation in Python
Errors and exceptions - types of issues, remediation
Packages and module - modules, import options, sys
path
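The hands-on themes of Modules 3 and 4 can be sketched in a few lines of plain Python; the names and data below are illustrative, not part of the course material:

```python
import re

# Functions: positional and default (keyword) arguments
def describe(name, role="consultant"):
    return f"{name} ({role})"

# Lambda: an anonymous function used as a sort key
people = ["Charlie", "Al", "Bob"]
by_length = sorted(people, key=lambda p: len(p))

# Regular expressions: findall and sub from the re library
text = "Contact: alice@example.com, bob@example.com"
emails = re.findall(r"[\w.]+@[\w.]+", text)
masked = re.sub(r"@[\w.]+", "@***", text)

# Classes: inheritance and polymorphism via overriding
class Employee:
    def __init__(self, name):
        self.name = name          # instance member

    def title(self):
        return "Employee"

class Consultant(Employee):       # inheritance
    def title(self):              # overridden method
        return "Consultant"

print(describe("Ada"))            # Ada (consultant)
print(by_length[0])               # Al
print(Consultant("Ada").title())  # Consultant
```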
Module 5

Goal
Learn how to debug, how to use databases and what a project skeleton looks like in Python.

Objectives
Debug Python scripts using pdb
Debug Python scripts using an IDE
Classify errors
Develop unit tests
Create project skeletons
Implement a database using SQLite
Perform CRUD operations on a SQLite database

Topics
Debugging
Dealing with errors
Using unit tests
Project Skeleton
Required packages
Creating the Skeleton
Project Directory
Final Directory Structure
Testing your set up
Using the skeleton
Creating a database with SQLite 3
CRUD operations
Creating a database object

Hands On
Debugging - debugging options, logging, troubleshooting
Unit testing - TDD, unit test library, assertions, automated testing
Project skeleton - industry standard, configurations, sharable libraries
RDBMS - Python for RDBMS, PEP 249, CRUD operations on SQLite

Module 6

Goal
Get familiar with the basics of statistics, different types of measures and probability distributions, and the supporting libraries in Python that assist in these operations.

Objectives
Statistics - data terminology, measurement scales, types of data
Libraries - IPython, Matplotlib
Measures, Moments, Variance, Std. Deviation using Numpy
Distributions, Probability and Bayes Theorem using Scipy
Numpy - arrays, matrices, related operations
Scipy - overview, areas of application

Topics
Data terminology
Scales of measurement
Types of data
IPython notebook installation
Numerical measures
Matplotlib introduction
Deviation and variance
Standard deviation
Covariance and correlation
Conditional probability
Bayes theorem
Distribution/Probability functions
Installing Numpy
Numpy arrays and matrices
Installing Scipy
Scipy modules and stats

Hands On
Statistics - scales of measurement, numerical measures, variance, standard deviation, covariance and correlation, probability, Bayes theorem and distribution functions
Numpy - arrays, matrices and types of operations
Scipy - stats modules, physical constants, skewness, kurtosis
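A minimal sketch combining the Module 5 and Module 6 hands-on themes: CRUD against SQLite (Python's built-in sqlite3 module implements the PEP 249 DB-API the course refers to) followed by basic statistics. The table and numbers are made up; the course itself uses Numpy and Scipy for the statistics part, while this sketch sticks to the standard library:

```python
import sqlite3
import statistics

# CRUD against an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, value REAL)")

# Create: insert some made-up rows
rows = [("a", 4.0), ("b", 7.0), ("c", 7.0), ("d", 10.0)]
conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)

# Read: pull the values back out
values = [v for (v,) in conn.execute("SELECT value FROM scores")]

# Update and Delete
conn.execute("UPDATE scores SET value = 5.0 WHERE name = 'a'")
conn.execute("DELETE FROM scores WHERE name = 'd'")

# Basic statistics on the values read earlier
mean = statistics.mean(values)       # 7.0
median = statistics.median(values)   # 7.0
stdev = statistics.pstdev(values)    # population standard deviation
print(mean, median)
```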
Module 7 Module 8
Machine Learning using Python Essentials Data Analysis and Machine Learning Deep Dive
Goal Goal
Learn in detail about Supervised and Unsupervised learning Tackle complex machine learning problems requiring
and examples for each category. classification or clustering.
Objectives Objectives
Define Machine Learning and understand Supervised At the end of this Module, you should be able to:
vs Unsupervised
Feature engineer datasets using PCA,
Apply Supervised Learning process flow, regression
Bias/Variance analysis
analysis
Apply classifications algorithms like KNN, Random
Apply Unsupervised Learning process flow, clustering
Forests, SVM etc.
Apply Linear Regression, Multivariate Regression
Apply clustering algorithms like K-Means,
Measure accuracy using Mean Squared Error, Cross
Hierarchical clustering etc.
Validation
Compute classification and clustering metrics to
Analyze data using Pandas
ascertain model accuracy
Topics Topics
Pandas - Series, Data Frames, data analysis involving Data analysis activity using live datasets from
grouping, sorting, filtering, munging, Google Finance
visualization/plotting and mesh up Encoders, vectorizers, PCA, KNN, CART, Random
Forest Ensemble, SVM, Clustering, Accuracy
measures using Metrics 6
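The course's classification exercises use scikit-learn; purely as an illustration of the KNN idea from Module 8, here is a dependency-free sketch with made-up data:

```python
from collections import Counter
import math

# Toy labeled data: (feature vector, class label); all values are made up
train = [([1.0, 1.0], "A"), ([1.2, 0.8], "A"),
         ([4.0, 4.2], "B"), ([3.8, 4.0], "B")]

def knn_predict(x, train, k=3):
    # Sort training points by Euclidean distance to x,
    # then take a majority vote over the k nearest labels.
    nearest = sorted(train, key=lambda p: math.dist(x, p[0]))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# A simple accuracy measure over a tiny held-out set
test = [([1.1, 0.9], "A"), ([4.1, 4.1], "B")]
correct = sum(knn_predict(x, train) == y for x, y in test)
accuracy = correct / len(test)
print(accuracy)  # 1.0
```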
Module 9

Scalable Machine Learning using Spark

Goal
Learn Spark basics and run machine learning models over Spark.

Objectives
At the end of this Module, you should be able to discuss:
Apache Spark - Concepts, RDD, MLlib, Data frames
Transformations, Actions, Shuffling, Persistence and Data Removal
Shared variables - accumulators and broadcast
Spark SQL and Data frames
Spark MLlib
Regression, Classification & Clustering with PySpark

Topics
Apache Spark introduction
Spark engine
Spark core API
Spark libraries
SparkContext and SparkConf
Concepts - RDD, Shuffling and Persistence
RDD transformations and actions
Shared variables - Accumulators, Broadcasts
Spark SQL and Dataframes
Spark MLlib
Regression with PySpark
Classification with PySpark
Clustering with PySpark

Hands On

Module 10

Web Scraping in Python and Project Work

Goal
Discuss powerful web scraping using Python and work through a real-world project.

Objectives
Discuss web scraping and its advantages
Discuss the steps involved in web scraping
Use the BeautifulSoup package and its functions
Scrape the IMDB webpage
Fetch streaming tweets from Twitter
Perform sentiment analysis on tweets fetched from Twitter and determine which is more popular: Ferrari or Porsche

Topics
Web scraping
Introduction to the Beautiful Soup package
How to scrape webpages
A real-world project showing scraping of data from Google Finance and IMDB

Hands On
Scraping - BeautifulSoup and its functions, pulling content using regex, restricting content using SoupStrainer
Scraping IMDB, Reddit
Tweet sentiment analysis using the Twitter API for Python
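The course teaches scraping with BeautifulSoup; as a dependency-free illustration of the same idea, the sketch below uses the standard library's html.parser on an inline HTML snippet (the markup is made up, not a real IMDB page):

```python
from html.parser import HTMLParser

# A tiny HTML snippet standing in for a fetched page
HTML = """
<ul>
  <li class="title">The Shawshank Redemption</li>
  <li class="title">The Godfather</li>
</ul>
"""

class TitleScraper(HTMLParser):
    """Collect the text of <li class="title"> elements."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "li" and ("class", "title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title and data.strip():
            self.titles.append(data.strip())

scraper = TitleScraper()
scraper.feed(HTML)
print(scraper.titles)
```

BeautifulSoup wraps this kind of parsing in a far more convenient API (e.g. selecting elements by class in one call), which is why the course uses it for the real scraping project.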
Real Time Case Study:
Challenge
AB Consultants is a company that outsources its employees as consultants to various top IT firms. Their business had been growing
quite well in the past; however, growth has recently slowed because their best and most experienced
employees have started leaving the company. To prevent this proactively, you first need to dive into the company's employee
data and find out why the best and most experienced employees are leaving.
Solution
As a Data Analyst at the company, you are required to do an analysis and find patterns explaining why the best employees are leaving so
early.
Using Python, you arrive at a forecast model to predict which employees could be leaving the company, along with a probability of
why the best and most experienced employees are leaving prematurely. This will help plan the next steps to avoid the churn.
You decide to create a script that will contain the following:
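The kind of exploratory churn analysis described above can be sketched with toy data; every record and threshold below is made up for illustration:

```python
# Toy employee records: (satisfaction_band, years_at_company, left_company)
records = [
    ("low", 5, True), ("low", 6, True), ("low", 4, True),
    ("high", 5, False), ("high", 2, False), ("low", 1, False),
    ("high", 6, True), ("high", 3, False),
]

def churn_rate(rows):
    """Fraction of employees in `rows` who left the company."""
    if not rows:
        return 0.0
    left = sum(1 for _, _, gone in rows if gone)
    return left / len(rows)

# Compare churn among experienced (4+ years) vs newer employees
experienced = [r for r in records if r[1] >= 4]
newer = [r for r in records if r[1] < 4]

print(round(churn_rate(experienced), 2))  # 0.8
print(round(churn_rate(newer), 2))        # 0.0
```

On this toy data the experienced group churns at a far higher rate, which is exactly the pattern the case study asks you to detect; on the real dataset the course moves from such frequency analysis to an actual forecast model.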
Hardware requirements:
The system requirement for the Python course is a machine with an Intel i3 processor or above, a minimum of 3 GB RAM (4 GB recommended) and
a 32-bit or 64-bit operating system.
Practicals can be executed on the virtual machine provided by Edureka, which comes with Python and PyCharm Community Edition installed. Using
PyCharm Community Edition, both the Spark and Python practicals can be executed.
Detailed step-wise installation guides are available in the LMS to help you install and set up the environment for Python.
I took courses on Big Data and Python and found them very useful. Some of the
instructors focused on getting the core concepts right. Customer service
is excellent and very responsive.
Uma Uppin
Data Engineer, Facebook
I am very much impressed with the quality of the training material and the
trainer as well. You get access to previously recorded sessions just after your
signup. Also, they are very quick to respond to any queries you have. The online
recording of the course is very useful, as you can go back and refer to it any time.
Sravan Kumar
Big Data QA Engineer, Nike Inc
Bhairav Mehta,
Senior Program Manager, Apple