
Weapons of Math Destruction
Cathy O’Neil
mathbabe.org

Monday, February 4, 13
For this talk...

• I’m focusing on predictive models
• Is this math?
• What mathematicians in industry do
• The public face of math (besides calculus)

Models are everywhere
• Google search
• Recommendations
• Credit scores
• Professional evaluations
• Ad displays
• Consumer offers and deals
What is a model?

• Something that takes data in
• Combines it with a toy model of how things are related
• Gives out a prediction
• Should come with an evaluation method
• Incredibly sensitive to manipulation
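A minimal sketch of these ingredients in Python: data comes in, the toy model assumes a straight-line relationship, a prediction comes out, and the evaluation method is mean absolute error on held-out points the model never saw. The data (hours studied vs. test score) and all function names are hypothetical, chosen only for illustration.

```python
# Toy predictive model: data in, an assumed straight-line relationship,
# a prediction out, and an explicit evaluation on held-out data.

def fit_line(xs, ys):
    """Ordinary least squares fit of y ≈ a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    return mean_y - b * mean_x, b


def predict(model, x):
    a, b = model
    return a + b * x


def mean_absolute_error(model, xs, ys):
    """Evaluation method: average size of the prediction errors."""
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: hours studied vs. test score.
train_x, train_y = [1, 2, 3, 4, 5], [52, 55, 61, 64, 68]
test_x, test_y = [6, 7], [70, 77]  # held out: never used for fitting

model = fit_line(train_x, train_y)
print("prediction for 6 hours:", round(predict(model, 6), 2))
print("held-out MAE:", round(mean_absolute_error(model, test_x, test_y), 2))
```

Held-out error is only one possible evaluation method; the point is that some explicit, checkable test of the predictions exists.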

Why should you care?

• Models are powerful
• But they are not oracles
• They rely on the trust people have in math
• The authority of the inscrutable
• The mathematician as superhuman

Why you should care

• Conflict of interest
• Not a cultural norm
• Mathematicians are generally moral
• We shouldn’t let this happen

Salient properties
• Name

• Underlying model

• Underlying assumptions

• Input/output

• Purported/political goal

• Evaluation method

• Gaming potential

• Reach
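One way to make this checklist concrete is a record with one field per property, so that an undocumented entry is visible rather than silently assumed. `ModelProfile` and its field names are hypothetical, a sketch rather than any standard format.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ModelProfile:
    """Hypothetical record: one field per salient property; None = undocumented."""
    name: str
    underlying_model: Optional[str] = None
    underlying_assumptions: Optional[str] = None
    input_output: Optional[str] = None
    purported_political_goal: Optional[str] = None
    evaluation_method: Optional[str] = None
    gaming_potential: Optional[str] = None
    reach: Optional[str] = None

    def missing(self) -> List[str]:
        """Names of the properties nobody has filled in."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical example: only the name and the model type are documented.
profile = ModelProfile(name="Example score", underlying_model="linear regression")
print(profile.missing())
```

Here `missing()` reports six undocumented properties, including `evaluation_method`, which is exactly the gap the talk warns about.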

Ex 1: VAM
• Name: Value-Added Teacher Model

• Underlying model: How much a teacher raised scores vs. expectation

• Underlying assumptions: Accounts for externalities; small error bars
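A simplified sketch of the value-added idea, assuming a single predictor: the expected score comes from a regression of this year's score on last year's, fitted across all students, and a teacher's value added is the average gap between their students' actual and expected scores. Real VAMs use many more student attributes and sub-models; the records and function names here are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y ≈ a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def value_added(records):
    """records: (teacher, prior_score, current_score) tuples.

    Expected score = regression of current on prior score over all students.
    A teacher's value added = mean of (actual - expected) over their students.
    """
    a, b = fit_line([r[1] for r in records], [r[2] for r in records])
    gaps = defaultdict(list)
    for teacher, prior, current in records:
        gaps[teacher].append(current - (a + b * prior))
    return {teacher: mean(g) for teacher, g in gaps.items()}


# Hypothetical students: (teacher, last year's score, this year's score).
records = [
    ("Teacher A", 60, 68), ("Teacher A", 70, 75), ("Teacher A", 80, 86),
    ("Teacher B", 60, 62), ("Teacher B", 70, 73), ("Teacher B", 80, 80),
]
print(value_added(records))
```

Teacher A comes out at about +2.3 points and Teacher B at about -2.3. Any error in the test scores or in the fitted expectation flows directly into these numbers.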

1) Sources of errors in VAM (short list)

• Need to score the test
• Some problems harder than others?
• Some years’ cohorts smarter than others?
• Some tests harder than others?
• Curve against a normal or an empirical distribution?
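To see why the curving choice matters, this sketch ranks the same raw score two ways: against a normal curve fitted to the class (mean and standard deviation) and against the empirical distribution of actual scores. The scores are hypothetical and deliberately skewed by one outlier.

```python
from statistics import NormalDist, mean, pstdev


def normal_percentile(score, scores):
    """Percentile assuming scores follow a normal curve fitted to the data."""
    return 100 * NormalDist(mean(scores), pstdev(scores)).cdf(score)


def empirical_percentile(score, scores):
    """Percentile as the share of actual scores at or below this one."""
    return 100 * sum(s <= score for s in scores) / len(scores)


# Hypothetical class scores, skewed by one very high outlier.
scores = [40, 55, 60, 62, 64, 65, 66, 68, 70, 98]
print("empirical:", empirical_percentile(70, scores))
print("normal curve:", round(normal_percentile(70, scores), 1))
```

A raw score of 70 lands at the 90th percentile empirically but only around the 65th under the fitted normal curve, so the modeling choice alone moves a student (and, after aggregation, a teacher) substantially.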

2) Accounting for externalities in VAM

• Account for what is “under control”
• Tests measure the middle better than the ends
• % of free school lunches is very fat-tailed
• Summer vacation loss
• “No Child Left Behind” mindset
• Punishes teachers at tough schools
The underlying model

• Linear regression with multiple sub-models
• Errors introduced here too
• 14% correlation year over year for NYC teachers
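One way to read a 14% year-over-year correlation is through a simple signal-plus-noise assumption: each teacher has a fixed true effect, and each year's score adds independent noise. Under that assumption the expected correlation between two years is signal variance / (signal variance + noise variance). The simulation uses hypothetical standard deviations (1.0 for signal, 2.5 for noise), picked so this ratio is 1/7.25 ≈ 0.14; it illustrates the mechanism and is not a model of the NYC data.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def simulate_two_years(n_teachers, signal_sd, noise_sd, seed=0):
    """Hypothetical: fixed true teacher effect plus fresh noise each year."""
    rng = random.Random(seed)
    true_effect = [rng.gauss(0, signal_sd) for _ in range(n_teachers)]
    year1 = [t + rng.gauss(0, noise_sd) for t in true_effect]
    year2 = [t + rng.gauss(0, noise_sd) for t in true_effect]
    return year1, year2


year1, year2 = simulate_two_years(20_000, signal_sd=1.0, noise_sd=2.5)
r = pearson(year1, year2)
# Expected r = signal_sd**2 / (signal_sd**2 + noise_sd**2) = 1 / 7.25, about 0.14.
print("year-over-year correlation:", round(r, 3))
```

Under this assumption, a correlation that low means most of the variation in any single year's score is noise rather than a stable property of the teacher.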

Ex 1: VAM
• Name: Value-Added Teacher Model

• Underlying model: How much a teacher raised scores vs. expectation

• Underlying assumptions: Accounts for externalities; small error bars

• Input/output: Student standardized test scores and attributes in; a single number out

• Purported/political goal: Better teachers / power, privatization

• Evaluation method: None!

• Gaming potential: Cheating, etc.; mostly gamed by administrators

• Reach: LA, NY, Chicago public school systems...

Ex 2a: Credit Scores
• Name: Credit Score

• Underlying model: Unknown, but takes into account past bill payments, etc.

• Underlying assumptions: Behavior is consistent over time

• Input/output: Regulated; open to consumers for free once per year. A single number out

• Purported/political goal: Measurement / buy-in, fear of default, quantification

• Evaluation method: Constant but not public. Example of a death spiral.

• Gaming potential: Not high

• Reach: National, possibly international

Aside: the death spiral of modeling

• Insurance: pooled risk

• Add segmentation / good-health modeling

• Lose the original goal

• People who benefit don’t see the problem directly

• The same can be said for credit card offers via credit scoring

• In general, if someone benefits, someone loses

• Systematized racism, etc.

• Philosophically, what do we want our culture to be?
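A worked example of the pooled-versus-segmented step, with invented numbers: four people, three expected to claim 1,000 a year and one expected to claim 9,000. Pooling charges everyone the average; a (hypothetically perfect) segmentation model charges each person their own expected cost.

```python
# Hypothetical pool: expected annual claims per person.
expected_claims = {"healthy_1": 1000, "healthy_2": 1000, "healthy_3": 1000, "sick": 9000}


def pooled_premiums(claims):
    """Pooled risk: everyone pays the average expected cost."""
    average = sum(claims.values()) / len(claims)
    return {person: average for person in claims}


def segmented_premiums(claims):
    """Perfect segmentation (an assumption): each person pays their own expected cost."""
    return dict(claims)


pooled = pooled_premiums(expected_claims)
segmented = segmented_premiums(expected_claims)
change = {p: segmented[p] - pooled[p] for p in expected_claims}
print(change)
print("net change:", sum(change.values()))
```

Pooled, everyone pays 3,000. Segmented, each healthy person saves 2,000 and the sick person pays 6,000 more. The total collected is unchanged, so the three winners' savings are exactly the one loser's extra cost.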

Ex 2b: E-Score
• Name: E-Score (or “buying power” score)

• Underlying model: Unknown; takes into account past Google searches, etc.

• Underlying assumptions: Behavior is consistent over time; correct identification

• Input/output: Unregulated; could use race, age, whatever

• Purported/political goal: Measurement / quantification, skimming $

• Evaluation method: Death spiral, this time not regulated.

• Gaming potential: Not high

• Reach: International

Ex 3: h-index
• Name: h-index

• Underlying model: Max N such that N papers each have at least N citations

• Underlying assumptions: Papers, citations, and their quantity are meaningful

• Input/output: Academic publishing records in; a single number out

• Purported/political goal: Measurement, self-advancement

• Evaluation method: Fields vs. not?

• Gaming potential: Highly vulnerable

• Reach: As far as the h-index reaches
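The h-index itself is easy to compute: sort citation counts in descending order and find the last rank i at which the i-th paper still has at least i citations. The sketch also illustrates the gaming point with a hypothetical record: three well-placed extra citations (for example, self-citations) raise the index from 4 to 5.

```python
def h_index(citations):
    """Largest N such that N papers each have at least N citations."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for one author's papers.
papers = [10, 8, 5, 4, 3]
# Same record plus three well-placed extra citations (one to the 4th paper, two to the 5th).
gamed = [10, 8, 5, 5, 5]
print(h_index(papers), h_index(gamed))
```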

Others

• Education: who will graduate
• Debt collectors: who will pay
• Political ads: targeting
• Health and DNA models

Modeling physics vs. people

• There’s a feedback loop in modeling
• It sometimes indicates the model is bad
• “People models” = “statistical models”

Where do we go now?

• Defend math
• First step: educate ourselves
• Referee process for public models
• Require transparent evaluation methods
• Let’s not become economists though
