
Case Studies in Big Data

Joshua Cook
Chapter 1

Programming Fundamentals

The Zen of Python

8 fundamental aspects of programming


1. Language - R, Python, Julia
2. Data types
• Basic data types
• Compound data types
3. Operators
4. Functions
5. Loops
6. Conditionals
7. Libraries
8. Objects
if __name__ == "__main__":
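
As a minimal sketch of how the items listed above fit together in Python, the short script below touches each of them in turn. The names `word_lengths` and `main` and the sample words are illustrative assumptions, not taken from the book.

```python
"""Illustrative sketch only: function names and sample data are assumptions."""
from collections import Counter  # library: imported from the standard library


def word_lengths(words):
    """Function: map each non-empty word to its length."""
    lengths = {}                       # compound data type: dict
    for word in words:                 # loop
        if word:                       # conditional: skip empty strings
            lengths[word] = len(word)  # basic data types: str key, int value
    return lengths


def main():
    words = ["big", "data", "", "python"]  # compound data type: list of str
    lengths = word_lengths(words)
    total = 0
    for n in lengths.values():
        total = total + n                  # operator: addition
    mean = total / len(lengths)            # operator: true division
    counts = Counter(lengths.values())     # object: an instance of Counter
    return lengths, total, mean, counts


# The guard below runs main() only when the file is executed as a script,
# not when it is imported as a module.
if __name__ == "__main__":
    print(main())
```

Keeping the work inside `main` means the same file can be imported by other code without triggering the print.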

Chapter 2

APIs

The Twitter Streamer


1. docker build and run, Dockerfile
2. Twitter credentials
3. main.py, collect_tweets
4. TwitterStreamer, __init__
5. TwitterStreamer, create_twitterator
6. TwitterStreamer, get_next_tweet
7. TwitterStreamer, insert_into_mongo
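
The outline names a `collect_tweets` function and four `TwitterStreamer` methods but does not show their bodies. The sketch below is one possible shape for them, assuming the streamer is handed an iterable of tweet dictionaries and a collection-like object with an `insert_one` method (pymongo collections provide one). The method bodies, the constructor arguments, and the `InMemoryCollection` stand-in are all hypothetical.

```python
"""Sketch of a streamer using the method names from the outline.

Everything inside the methods is an assumption; the real class presumably
wraps a Twitter API client and a MongoDB collection.
"""


class TwitterStreamer:
    def __init__(self, tweet_source, collection):
        # tweet_source: any iterable of tweet dicts (stand-in for the Twitter API)
        # collection: any object with insert_one(document)
        self.tweet_source = tweet_source
        self.collection = collection
        self.twitterator = None

    def create_twitterator(self):
        """Create the iterator that yields one tweet at a time."""
        self.twitterator = iter(self.tweet_source)

    def get_next_tweet(self):
        """Return the next tweet, or None when the source is exhausted."""
        if self.twitterator is None:
            self.create_twitterator()
        return next(self.twitterator, None)

    def insert_into_mongo(self, tweet):
        """Store one tweet document in the collection."""
        self.collection.insert_one(tweet)


def collect_tweets(streamer, limit):
    """Pull up to `limit` tweets, store each one, and return the count stored."""
    stored = 0
    while stored < limit:
        tweet = streamer.get_next_tweet()
        if tweet is None:
            break
        streamer.insert_into_mongo(tweet)
        stored += 1
    return stored


class InMemoryCollection:
    """Hypothetical stand-in for a MongoDB collection, for local runs."""

    def __init__(self):
        self.documents = []

    def insert_one(self, document):
        self.documents.append(document)


if __name__ == "__main__":
    fake_tweets = [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]
    collection = InMemoryCollection()
    streamer = TwitterStreamer(fake_tweets, collection)
    print(collect_tweets(streamer, limit=5), collection.documents)
```

Passing the source and the collection into the constructor keeps the class runnable without credentials or a database, which is convenient when working through the steps one at a time.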

Chapter 3

MongoDB

Connect to the data


databases
collections

Chapter 4

Aggregation

1. single purpose aggregation
2. aggregation pipeline
3. map-reduce function

Process Users

Intro to ETL

The DAG

1. mongo job queue
2. aggregate functions
• $project
• $match
• $sample
3. seven DAGs
• count by day
• plot points with folium
• parse text topics
• counts by day of week and hour
• unique users
• check to see if all users have been processed
• find duplicate users
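
The three stages named under aggregate functions ($project, $match, $sample) can be combined into a single aggregation pipeline. The sketch below only builds the pipeline as plain Python data, so it runs without a MongoDB server. The function name `build_sample_pipeline` is hypothetical, and the field names (`lang`, `text`, `created_at`, `user.screen_name`) assume tweets stored in the shape returned by the Twitter API.

```python
"""Sketch: an aggregation pipeline built from $match, $project and $sample.

The field names assume tweet documents stored as returned by the Twitter API;
adjust them to the actual collection.
"""


def build_sample_pipeline(lang="en", sample_size=100):
    """Return a list of stage dicts; MongoDB applies them in order."""
    return [
        # $match: keep only documents whose lang field equals `lang`
        {"$match": {"lang": lang}},
        # $project: keep a few fields and drop the _id
        {"$project": {"_id": 0, "text": 1, "created_at": 1, "user.screen_name": 1}},
        # $sample: pick `sample_size` documents at random from what remains
        {"$sample": {"size": sample_size}},
    ]


pipeline = build_sample_pipeline()

# With pymongo and a live collection, the pipeline would be run as:
#   results = collection.aggregate(pipeline)
```

Placing `$match` first lets the later stages work on fewer documents, which is the usual ordering when a filter is available.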


Distributed NLP on Twitter


1. Twitter streamer
2. connecting to MongoDB
Part # Advanced Applications of Unsupervised Learning
1. Advanced Component Analysis
1. Facebook PCA
2. Correspondence Analysis
3. Independent Component Analysis
4. Network Component Analysis
5. Kernel PCA
6. Multilinear PCA
2. Natural Language Processing
1. Bag of Words
2. TF-IDF
3. Latent Semantic Analysis
4. Latent Dirichlet Allocation
5. Topic Modeling
3. Market Basket Analysis
1. Affinity Analysis
2. Association Rule Learning
3. Rule-Based Machine Learning
4. Recommender Systems
1. Collaborative Filtering
2. Content-based Filtering
Part # Predictive Learning
5. Generalized Low Rank Models
1. In Python
2. Use Cases
6. Unsupervised Deep Learning
1. Autoencoders
2. Word2Vec
3. Skip-Gram
7. Generative Models
1. Generative Adversarial Networks
2. Restricted Boltzmann Machine
3. Magenta (music generation with tensorflow)

Chapter 5

Advanced Case Studies

Framework

Data Preparation

Implementation

Refinement

Model Evaluation

Model Justification

Presentation of Results

A Regularized Linear Model with an Augmented Data Set on the Ames Iowa Housing Data

Feature Extraction Pipelines on the Madelon Dataset

Introduction to Image Recognition with MNIST

Financial Modeling on AAPL

Semantic Search over Wikipedia Text

Introduction to Reinforcement Learning with
