Data Comes in Shapes
July 16, 5th Elephant
Tim Poston
Chief Scientist
http://forushealth.com
http://geometeer.com
tim.poston@gmail.com
What are data?
Mostly numbers.
Are numbers only numbers?
Numbers come in patterns:
That is what ‘big data’ is all about.
Patterns are shapes.
Studying data shapes is geometry.
… but not the geometry of high school.
Studying data shapes is not the geometry of high school.
It is not replacing the 3D minds of children
with flattened (though intricate) teen imagination.
If we have three variables, we have three dimensions.
If we have n variables, we have n dimensions.
To think about n dimensions, we have two choices:
 Practice thinking in 3D
 Turn it all into algebra
We have to do both.
What does a matrix
$$\begin{pmatrix} a & b & c \\ c & d & e \\ f & g & h \end{pmatrix}$$
even mean?
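A matrix describes a transformation
by listing how a few things change:
$$
\begin{pmatrix} a & b & c \\ c & d & e \\ f & g & h \end{pmatrix}
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} =
\begin{pmatrix} a \\ c \\ f \end{pmatrix}
$$
$$
\begin{pmatrix} a & b & c \\ c & d & e \\ f & g & h \end{pmatrix}
\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} =
\begin{pmatrix} b \\ d \\ g \end{pmatrix}
$$
$$
\begin{pmatrix} a & b & c \\ c & d & e \\ f & g & h \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} =
\begin{pmatrix} c \\ e \\ h \end{pmatrix}
$$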
A matrix
$$\begin{pmatrix} a & b & c \\ c & d & e \\ f & g & h \end{pmatrix}$$
is just a list of where (1,0,0), (0,1,0) and (0,0,1) go.
Remember that, and you can always clarify how the algebra works.
Remember that, and you can always clarify how the code should work.
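A minimal sketch of that idea in plain Python. The helper names `apply` and `from_images` and the numbers in `M` are illustrative assumptions, not anything from the slides. It checks that the columns of a matrix are exactly where the three basis vectors go, and that every other vector follows from those three images.

```python
def apply(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def from_images(images):
    """Build the matrix whose columns are the images of the basis vectors."""
    return [list(row) for row in zip(*images)]


# An arbitrary example matrix (hypothetical numbers).
M = [[2, 0, 1],
     [1, 3, 0],
     [0, 1, 4]]

basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Where each basis vector goes: these are the columns of M.
images = [apply(M, e) for e in basis]

# Knowing the three images is enough to rebuild the whole matrix.
rebuilt = from_images(images)

# Any other vector goes to the matching combination of those images.
x, y, z = 1, 2, 3
combined = [x * p + y * q + z * r for p, q, r in zip(*images)]
direct = apply(M, [x, y, z])
```

Here `rebuilt` equals `M`, and `combined` equals `direct`, which is the whole content of "a matrix is a list of where the basis vectors go".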
Principal component analysis (PCA)
just finds a rotation (matrix) so that the data points
lie as close as possible to coordinate axes.
In n dimensions.
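A minimal 2D sketch of that rotation, in plain Python. It assumes only two variables, so the rotation is a single angle that can be read off the covariances in closed form. In n dimensions one would normally use an eigenvalue or SVD routine from a library instead. The function names are illustrative.

```python
import math


def principal_angle(points):
    """Return (centroid, theta): theta is the angle of the main data axis."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Direction of greatest variance of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), theta


def rotate_to_axes(points):
    """Centre the points, then rotate by -theta so the main axis is the x-axis."""
    (mx, my), theta = principal_angle(points)
    c, s = math.cos(theta), math.sin(theta)
    return [((x - mx) * c + (y - my) * s,
             -(x - mx) * s + (y - my) * c) for x, y in points]


# Points lying exactly on the diagonal y = x (hypothetical data).
data = [(0, 0), (1, 1), (2, 2), (3, 3)]
rotated = rotate_to_axes(data)
```

After the rotation, every point in `rotated` has a second coordinate of zero up to rounding: the data lie along one coordinate axis.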
The simplex method (“Linear Programming”) looks at
points constrained by inequalities
$a_1 x_1 + a_2 x_2 + \dots + a_n x_n + c \ge 0$
which just means
‘lying on one side of a line/plane/hyperplane, in 2D/3D/nD’.
The points that satisfy all of them form a convex
polygon/polyhedron/polytope.
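A small sketch of the "one side of a hyperplane" test in plain Python. The representation of each constraint as a pair `(a, c)` and the unit-square example are assumptions made for illustration.

```python
def side(a, c, x):
    """Value of a1*x1 + ... + an*xn + c; zero or more means the allowed side."""
    return sum(ai * xi for ai, xi in zip(a, x)) + c


def in_polytope(constraints, x):
    """True if the point x satisfies every inequality, in any dimension."""
    return all(side(a, c, x) >= 0 for a, c in constraints)


# The unit square 0 <= x <= 1, 0 <= y <= 1, written as four half-planes.
unit_square = [
    ((1, 0), 0),    #  x     >= 0
    ((-1, 0), 1),   # -x + 1 >= 0
    ((0, 1), 0),    #  y     >= 0
    ((0, -1), 1),   # -y + 1 >= 0
]

inside = in_polytope(unit_square, (0.5, 0.5))
outside = in_polytope(unit_square, (1.5, 0.5))
on_corner = in_polytope(unit_square, (1, 1))
```

The same two functions work unchanged for three or n variables, since each constraint is only a dot product plus a constant.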
The simplex method (“Linear Programming”) looks at
a convex polytope, and seeks
the highest point.
Find a genuine corner
(any corner).
Go up the most vertical edge,
till you meet another face.
Do that again. And again.
And again. And again. And reach the top.
All the matrix ‘pivoting’, degenerate case handling, etc.,
is just implementing that.
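The climb itself can be sketched in 2D, where the polytope is a convex polygon. This is only the geometric picture, not the tableau algorithm. It assumes the corners are already known and listed in order around the polygon, which a real simplex implementation never has and has to discover by pivoting. The function name and the example polygon are illustrative.

```python
import math


def climb_polygon(vertices, up, start=0):
    """Walk edge by edge to the corner that is highest in direction `up`.

    `vertices` must list the corners of a convex polygon in order.
    Returns the top corner and the indices of the corners visited.
    """
    def height(v):
        return v[0] * up[0] + v[1] * up[1]

    def steepness(i, j):
        # Height gained per unit length along the edge from corner i to j.
        gain = height(vertices[j]) - height(vertices[i])
        return gain / math.dist(vertices[i], vertices[j])

    n = len(vertices)
    i = start
    path = [i]
    while True:
        neighbours = [(i - 1) % n, (i + 1) % n]
        best = max(neighbours, key=lambda j: steepness(i, j))
        if steepness(i, best) <= 0:
            # No edge goes up: on a convex polygon this corner is the top.
            return vertices[i], path
        i = best
        path.append(i)


# A convex pentagon (hypothetical data); "up" is the y direction.
polygon = [(0, 0), (4, 0), (5, 3), (2, 5), (0, 3)]
top, path = climb_polygon(polygon, up=(0, 1))
```

Starting from corner 0, the walk takes the steepest rising edge twice and stops at `(2, 5)`. Each step strictly gains height, so the loop cannot cycle on a polygon. The degenerate cases that need care arise in the algebraic version.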
Descriptions (like this from Wikipedia)
tend to skimp on the geometry.
What is a support line / plane / hyperplane?
How do you find one? (Very like simplex method.)
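One hedged way to make the question concrete: for a finite set of points and a chosen direction u, a support line (or plane, or hyperplane) is u·x = h, where h is the largest value of u·p over the points. Every point then lies on one side, and at least one point touches it. The sketch below finds it by brute force, which is the same "seek the highest point" problem that the simplex method solves for a polytope given by inequalities. The function name and the data are illustrative.

```python
def support_hyperplane(points, u):
    """Return (h, touching) for the support hyperplane u . x = h.

    All points satisfy u . p <= h; `touching` lists the points that reach h.
    Works in any dimension, as long as u and the points have the same length.
    """
    heights = [sum(ui * pi for ui, pi in zip(u, p)) for p in points]
    h = max(heights)
    touching = [p for p, height in zip(points, heights) if height == h]
    return h, touching


# Four points in the plane (hypothetical data).
cloud = [(0, 0), (2, 1), (1, 3), (3, 3)]

h_up, touch_up = support_hyperplane(cloud, (0, 1))        # support line y = 3
h_right, touch_right = support_hyperplane(cloud, (1, 0))  # support line x = 3
```

In the upward direction the support line rests on two points at once, so it contains a whole edge of the convex hull. In the rightward direction it touches a single corner.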
Geometry organises what algebra needs to do.
Algebra (often linear) organises what code needs to do.
Planning code needs algebra, which needs geometry.
Some bugs come from coding wrong.
Some bugs come from coding the wrong algebra.
Some bugs come from algebraising the wrong geometry.
Try to think at all levels!
Thank you!
