Points of Accumulation in Data Sets

Lab 11: Digit Lab

By NIST (National Institute of Standards and Technology)

The US Postal Service collected handwriting samples in the 1980s to create a computerized mail-scanning system.

GOAL: Create a feature set to begin recognizing handwritten digits.

 

Part I: 1 Feature

Definition: A feature of a data set \(\Delta\) is a function \[f:\Delta\to \mathbb{R}\] That is, a feature assigns a real number to each datum.
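To make the definition concrete, here is a minimal sketch of one possible feature, written in Python. It assumes each datum is a tiny black-and-white image stored as a list of rows, with 1 for ink and 0 for blank paper. The name `ink_fraction` and the two sample drawings `ONE` and `ZERO` are hypothetical illustrations, not part of the lab materials.

```python
def ink_fraction(image):
    """A feature f: image -> real number, the fraction of pixels that carry ink."""
    total = sum(len(row) for row in image)
    inked = sum(sum(row) for row in image)
    return inked / total


# Crude 5x3 drawings of the digits 1 and 0 (made-up samples, 1 = ink).
ONE = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
ZERO = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
]

print(ink_fraction(ONE))   # 5 inked pixels out of 15
print(ink_fraction(ZERO))  # 12 inked pixels out of 15
```

Any rule that turns a digit into a single number in this way counts as a feature, whether or not it turns out to be useful for telling digits apart.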

 

TASK I:  List 5 features you can think of using when looking at handwritten digits.

  • Example: The number of "Pen Down" blots.
  • Example: The number of "Strokes".


[Figure: sample handwritten marks labeled "2 strokes", "1 stroke", and "2 strokes".]

TASK II:  Create all the digits on paper using your left hand and then your right hand (as a group).  

 

TASK III:  

  • Choose 2 features from your group's list.
  • Choose 3 digits as a group.
  • Compute the 2 features of each of the 3 digits in your handwriting sample and in the MNIST sample provided.
  • Plot them as points (feature1, feature2).
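The steps of Task III can be sketched in Python as follows. This is only an illustration under stated assumptions: digits are tiny 0/1 grids (1 = ink), the two features `ink_fraction` and `hole_count` are hypothetical choices rather than the ones your group must use, and the three 5x3 drawings are made up for the example.

```python
def ink_fraction(image):
    """Feature 1: fraction of pixels that carry ink."""
    total = sum(len(row) for row in image)
    return sum(sum(row) for row in image) / total


def hole_count(image):
    """Feature 2: number of blank regions fully enclosed by ink."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    holes = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1 or seen[r][c]:
                continue
            # Flood-fill one blank region using 4-connected neighbours.
            stack = [(r, c)]
            seen[r][c] = True
            touches_border = False
            while stack:
                y, x = stack.pop()
                if y in (0, rows - 1) or x in (0, cols - 1):
                    touches_border = True
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # A region that never reaches the image edge is a hole.
            if not touches_border:
                holes += 1
    return holes


# Made-up 5x3 drawings of three chosen digits.
samples = {
    "0": [[1, 1, 1], [1, 0, 1], [1, 0, 1], [1, 0, 1], [1, 1, 1]],
    "1": [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "8": [[1, 1, 1], [1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 1, 1]],
}

# One (feature1, feature2) point per digit, ready to plot.
points = {
    label: (ink_fraction(img), float(hole_count(img)))
    for label, img in samples.items()
}

for label, (f1, f2) in points.items():
    print(label, f1, f2)
```

The resulting pairs can be marked on graph paper by hand or passed to any plotting tool, with feature1 on the horizontal axis and feature2 on the vertical axis.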

TASK IV:  Identify the points of accumulation in your plots.  Do the individual digits stand apart?
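One rough way to check Task IV numerically is sketched below in Python. It reads "point of accumulation" informally as the place where a digit's points pile up and estimates it by the mean of those points. The data values and the helper names `centroid`, `spread`, and `stand_apart` are made up for illustration, and the separation rule is just one simple assumption among many possible ones.

```python
from math import dist

# Made-up (feature1, feature2) points for two digits, for illustration only.
points = {
    "0": [(0.78, 1.0), (0.80, 1.0), (0.82, 1.0)],
    "1": [(0.30, 0.0), (0.33, 0.0), (0.36, 0.0)],
}


def centroid(pts):
    """Mean point of a group: a stand-in for where the points accumulate."""
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def spread(pts):
    """Largest distance from the centroid to any point in the group."""
    c = centroid(pts)
    return max(dist(c, p) for p in pts)


def stand_apart(a, b):
    """True when the gap between centroids exceeds the two spreads combined."""
    return dist(centroid(a), centroid(b)) > spread(a) + spread(b)


print(centroid(points["0"]), centroid(points["1"]))
print(stand_apart(points["0"], points["1"]))
```

If two digits fail this test, their clouds of points overlap under the chosen pair of features, which suggests trying a different feature from your list.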

 

Here is a sample of a computer's feature extraction from the 60,000 handwriting samples in the full MNIST training set.

Digit Lab: Points of Accumulation

By James Wilson
