
International School of Engineering

awards

Certificate of Completion
to

Shashank Mamidipelli
for the 352-hour program in

Big Data Analytics and Optimization


conducted between February 16, 2015 and July 29, 2015.

This program is certified for quality of content, assessment and pedagogy by the Language Technologies Institute (LTI)
of Carnegie Mellon University (CMU). LTI also provided assistance in curriculum development for this program.

Dated this eighteenth day of June, two thousand and nineteen.

Dr. Dakshinamurthy V Kolluru
President

Dr. Sridhar Pappu
Executive VP - Academics

01CSE03/201508/497

Program details are on the back


Mode: Classroom Teaching
Certificate Type: Certificate of Participation / Assessment-based training program / Professional certification

Topics Covered

Planning and Thinking Skills for Architecting Data Science Solutions
Why build models or use data to run a business? What kind of models are built? Where do models not work? How do you make predictions? When does big unstructured data become important?
Thinking tools: Approximations and estimations, Geometric visualization of data and models
Choosing the right models and architecting a solution: Structure and anatomy of models, Problematic data and choosing the right experimentation
Sources of errors in predictive models and techniques to minimize them
Interacting with technical and business teams; Case study

Essential Engineering Skills in Big Data Analytics
Reading from Excel, CSV and other forms; Data exploration (histograms, bar charts, box plots, line graphs and scatter graphs); Storytelling with data: The science, ggplot, bubble charts with multiple dimensions, gauge charts, tree maps, heat maps and motion charts
Data pre-processing of structured data: R, Handling missing values, Binning, Standardization, Outliers/Noise, PCA, Type conversion

Fundamentals of Probability and Statistical Methods
Probabilistic analysis of data and models, Analyzing networks and graphs: Analyzing transitions, Markov chains and unstructured data
Computing the properties of an attribute: Central tendencies (Mean, Median, Mode, Range, Variance, Standard Deviation); Expectations of a Variable; Describing an attribute: Probability distributions (Discrete and Continuous) - Bernoulli, Geometric, Binomial, Poisson and Exponential distributions; Special emphasis on Normal distribution; Central Limit Theorem; t-distribution
Describing the relationship between attributes: Covariance; Correlation; Chi-square
Inferential statistics: How to learn about the population from a sample and vice-versa, Sampling distributions, Confidence Intervals, Hypothesis Testing; ANOVA

Statistics and Probability in Decision Modeling
Regression (Linear, Multivariate Regression) in forecasting; Analyzing and interpreting regression results; Logistic Regression for classification
Trend analysis and Time Series; Cyclical and Seasonal analysis; Box-Jenkins method; Smoothing; Moving averages; Autocorrelation; ARIMA; Holt-Winters method
Bayesian analysis and Naïve Bayes classifier; Bayesian Belief Networks

Optimization and Decision Analysis
Genetic algorithms: The algorithm and the process, Representing data, Why and how do they work?
Linear Programming: Graphical analysis; Sensitivity and Duality analyses
Integer and Binary programming: Applications, Problem formulation, Solving in R
Goal programming; Data envelopment analysis
Quadratic programming

Engineering Big Data with R and Hadoop Ecosystem
Introduction: Big Data, Hadoop applications; Parallel and Distributed computing; Introduction to algorithms; Concurrent algorithms; Linux refresher; NoSQL; GFS; HDFS; CDH4-HDFS
Map Reduce: YARN
Map Reduce Applications: Text Mining, Page Rank, Graph processing
Hadoop ecosystem components: Pig, Hive, HBase, Sqoop, Mahout, Spark, H2O, Hama, Flume, Chukwa, Avro, Whirr, Hue, Oozie, Zookeeper
R-Hadoop

Text Mining, Social Network Analysis and Natural Language Processing
Introduction to text mining and text pre-processing: Write a web crawler to collect data, R, Find unique words and counts, Handling numbers, Punctuation, Stop words, Incorrect spellings, Stemming, Lemmatization and TxD computation
Unstructured vs. semi-structured data; Fundamentals of information retrieval
Properties of words; Vector space models; Creating Term-Document (TxD) matrices; Similarity measures
Low-level processes (Sentence Splitting; Tokenization; Part-of-Speech Tagging; Stemming; Chunking)
Text classification and feature selection: How to use Naïve Bayes classifier for text classification
Evaluation systems on the accuracy of text mining
Sentiment Analysis
Natural Language Analysis
Discussion of text mining tools and applications

Methods and Algorithms in Machine Learning: Unsupervised and Supervised
Rule based knowledge: Logic of rules, Evaluating rules, Rule induction and Association rules
Construction of Decision Trees through simplified examples; Choosing the "best" attribute at each non-leaf node; Entropy; Information Gain; Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with numerical variables; Other measures of randomness; Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as rules
Specialized decision trees (oblique trees)
Ensemble and Hybrid models
AdaBoost, Random Forests and Gradient boosting machines
K-Nearest Neighbor method; Wilson editing and triangulations; K-nearest neighbors in collaborative filtering, digit recognition
Motivation for Neural Networks and their applications; Perceptron and Single Layer Neural Network, and hand calculations; Learning in a Neural Net: Back propagation and conjugate gradient techniques; Application of Neural Net in Face and Digit Recognition
Deep Learning techniques
Connectivity models (hierarchical clustering); Centroid models (K-Means algorithm); Distribution models (Expectation maximization); Spectral clustering
Linear learning machines and Kernel methods in learning
VC (Vapnik-Chervonenkis) dimension; Shattering power of models
Algorithm of Support Vector Machines (SVM)

Communication, Ethical and IP Challenges for Analytics Professionals
Why is Communication important?
How to communicate effectively: Telling stories
Communication issues from daily life, with examples using audio, video, blogs, charts, email, etc.
Seeing the big picture; Paying attention to details; Seeing things from multiple perspectives
Challenges: Mix of stakeholders, Explicability of results, Visualization
Guiding Principles: Clarity, Transparency, Integrity, Humility
Framework for Effective Presentations; Examples of bad and good presentations
Writing effective technical reports
Difference between Legal and Ethical issues
Challenges in current laws, regulations and fair information practices: Data protection, Intellectual property rights, Confidentiality, Contractual liability, Competition law, Licensing of Open Source software and Open Data
How to handle legal, ethical and IP issues at an organization and an individual level
The “Ethics Check” questions
