CCLS Internship Presentation
A presentation about the work my fellow intern and I did during our summer at Columbia University's Center for Computational Learning Systems.

Published in: Technology
Transcript

  • 1. Data Mining and Feeder Attribute Analysis
    • Interns: Charles Naut and Lawrence Ng
    • Columbia University’s Center for Computational Learning Systems
  • 2. Transformer Attribute Time Series
    • Seven years' worth of data from Con Edison databases
    • Attributes for transformers: temperature, phase voltage, and phase load
    • B phase load for one week - 168 data points (one point per hour)
    Hour   Load
    1      18.9
    2      18.6
    …      …
    60     40.9
    61     41.9
    …      …
    127    19.3
    128    19.2
  • 3. Piecewise Aggregate Approximation (PAA) and Symbolic Aggregate Approximation (SAX)
    • PAA [1]
      • Time series is divided into equally sized frames
      • Value of a frame is the average of data falling in that frame
      • Reduces dimensionality of time series
      • Allows for lower bounding of Euclidean distance
    • SAX [2]
      • Discretizes time series data
      • PAA values are given symbols based on calculated breakpoints
      • Retains reduced dimensionality of PAA
      • Allows for lower bounding of Euclidean Distance
    [1] E. Keogh, J. Lin & A. Fu (2005). HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence. In Proc. of the 5th IEEE International Conference on Data Mining (ICDM 2005), pp. 226-233, Houston, Texas, Nov 27-30, 2005.
    [2] B. Yi & C. Faloutsos (2000). Fast time sequence indexing for arbitrary Lp norms. In Proc. of the 26th Int'l Conf. on Very Large Databases, pp. 385-394.
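The PAA step described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the interns' implementation: the function name `paa`, the sample loads, and the requirement that the series length divide evenly into frames are all assumptions made for the example.

```python
from statistics import mean


def paa(series, n_frames):
    """Piecewise Aggregate Approximation: one mean value per equal-sized frame."""
    n = len(series)
    if n % n_frames != 0:
        # Simplifying assumption for this sketch only.
        raise ValueError("len(series) must be divisible by n_frames")
    frame_len = n // n_frames
    return [mean(series[i:i + frame_len]) for i in range(0, n, frame_len)]


# Hypothetical hourly loads (not Con Edison data): 8 points reduced to 4 frames.
loads = [18.0, 20.0, 30.0, 34.0, 40.0, 44.0, 26.0, 22.0]
frames = paa(loads, 4)
print(frames)  # [19.0, 32.0, 42.0, 24.0]
```

Each output value stands in for a whole frame of the input, which is how PAA reduces the dimensionality of the series.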
  • 4. SAX Conversion Process
    • Steps:
      • Obtain the raw time series
      • Normalize time series
      • Convert to PAA format
      • Convert to a SAX string
    • Result: cbbbddaa
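The conversion steps above can be sketched end to end as follows. This is a hedged illustration under stated assumptions: z-normalization, breakpoints taken from equal-probability regions of the standard normal distribution (the usual choice in the SAX literature), and an alphabet of four symbols, which matches the letters a to d seen in the slide's strings. The function names and the synthetic week of data are hypothetical.

```python
from bisect import bisect_right
from math import pi, sin
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale to zero mean and unit standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, n_frames):
    """Mean of each frame; assumes len(series) is divisible by n_frames."""
    frame_len = len(series) // n_frames
    return [mean(series[i * frame_len:(i + 1) * frame_len]) for i in range(n_frames)]


def breakpoints(alphabet_size):
    """Cut points that split the standard normal into equal-probability regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, n_frames, alphabet_size=4):
    """Raw series -> normalized -> PAA -> string of symbols 'a', 'b', ..."""
    cuts = breakpoints(alphabet_size)
    frames = paa(znormalize(series), n_frames)
    return "".join(chr(ord("a") + bisect_right(cuts, v)) for v in frames)


print(sax([1, 1, 2, 2, 3, 3, 4, 4], 4))  # abcd

# Synthetic week of hourly values (168 points) reduced to an 8-symbol word.
week = [20 + 10 * sin(2 * pi * h / 24) + 0.05 * h for h in range(168)]
word = sax(week, 8)
print(word)
```

With 168 hourly points and 8 frames, each symbol summarizes 21 hours of data, which is consistent with the 8-character result shown on the slide.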
  • 5. SAX Goals
    • Detect abnormalities among time series data from transformers by comparing differences in SAX strings of baseline data to SAX strings of live data
    • Predict when a transformer will fail by using dynamic (time series) data to indicate how stressed it is
    • B phase loads for two feeders
    • Top: caaadddb - Normal
    • Bottom: cbbbddaa - Failure
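One way to compare a baseline SAX string with a live one is the MINDIST measure from the SAX literature, which lower-bounds the Euclidean distance between the original series. The slides do not say which comparison the interns used, so the sketch below is an assumption; the function names, the alphabet size of 4, and the series length of 168 (one week of hourly points) are chosen to match the earlier slides.

```python
from math import sqrt
from statistics import NormalDist


def symbol_distance_table(alphabet_size):
    """Distance between symbols: 0 for equal or adjacent symbols,
    otherwise the gap between the breakpoints that separate them."""
    nd = NormalDist()
    cuts = [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    table = [[0.0] * alphabet_size for _ in range(alphabet_size)]
    for r in range(alphabet_size):
        for c in range(alphabet_size):
            if abs(r - c) > 1:
                table[r][c] = cuts[max(r, c) - 1] - cuts[min(r, c)]
    return table


def mindist(word_a, word_b, series_len, alphabet_size=4):
    """MINDIST between two equal-length SAX words of series with series_len points."""
    if len(word_a) != len(word_b):
        raise ValueError("SAX words must have the same length")
    table = symbol_distance_table(alphabet_size)
    total = sum(
        table[ord(x) - ord("a")][ord(y) - ord("a")] ** 2
        for x, y in zip(word_a, word_b)
    )
    return sqrt(series_len / len(word_a)) * sqrt(total)


normal, failure = "caaadddb", "cbbbddaa"
print(round(mindist(normal, failure, 168), 2))  # 6.18
```

Adjacent symbols contribute nothing to MINDIST, so for these two strings only the seventh position (d against a) adds to the distance. A per-position comparison may therefore be a useful complement when the changes of interest are small.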
  • 6. Tarzan and Hot SAX
    • Methods for finding time series discords
    • Tarzan [3]
      • Detects novel time series patterns
      • Novelty based on expected pattern frequency
    • Hot SAX [4]
      • Finds patterns most unlike others
      • Aids in clustering and discovery of motifs
    [3] S. Lonardi, J. Lin, E. Keogh & B. Chiu (2007). Efficient Discovery of Unusual Patterns in Time Series. Special Issue of New Generation Computing Journal. To appear.
    [4] E. Keogh, J. Lin & A. Fu (2005). HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence. In Proc. of the 5th IEEE International Conference on Data Mining (ICDM 2005), pp. 226-233, Houston, Texas, Nov 27-30, 2005.
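To make the idea of a discord concrete, here is a brute-force search for the subsequence whose nearest non-overlapping neighbor is farthest away. This is deliberately not the Hot SAX algorithm: Hot SAX, as described in the cited paper, uses SAX-based heuristics to order the same search and abandon comparisons early. The function names and the toy series are hypothetical.

```python
from math import sqrt


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_discord(series, window):
    """Return (start, distance) of the subsequence whose nearest
    non-overlapping neighbor is farthest away. Quadratic in series length."""
    best_pos, best_dist = None, -1.0
    n_sub = len(series) - window + 1
    for i in range(n_sub):
        nearest = float("inf")
        for j in range(n_sub):
            if abs(i - j) >= window:  # skip overlapping (trivial) matches
                d = euclidean(series[i:i + window], series[j:j + window])
                nearest = min(nearest, d)
        if nearest != float("inf") and nearest > best_dist:
            best_pos, best_dist = i, nearest
    return best_pos, best_dist


# Toy periodic series with a single spike at index 9 (not real transformer data).
series = [0, 1, 0, 1, 0, 1, 0, 1, 0, 5, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
pos, dist = brute_force_discord(series, 4)
print(pos, dist)
```

The reported window covers the spike, because every other window has an exact repeat elsewhere in the series.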
  • 7. Feeder Attribute Analysis
    • Determine the similarity of feeders and create a dendrogram displaying the information
    • Paired feeders make it possible to study “treatments”, such as Hipots, statistically
    • Build on SQL queries previously written
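The similarity analysis behind a dendrogram can be sketched as agglomerative clustering over feeder attribute vectors. The slides do not state the distance measure or linkage rule that was used, so the Euclidean distance, the single-linkage rule, the feeder names, and the two numeric attributes below are all assumptions for illustration.

```python
from math import dist  # Python 3.8+


def agglomerate(points):
    """Single-linkage agglomerative clustering.
    Returns the merge history (left, right, distance) a dendrogram would draw."""
    clusters = {name: [name] for name in points}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                # Single linkage: distance between the closest members.
                d = min(
                    dist(points[x], points[y])
                    for x in clusters[a]
                    for y in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges


# Hypothetical feeders with two made-up, already-scaled attributes each.
feeders = {
    "F1": (1.0, 2.0),
    "F2": (1.0, 3.0),
    "F3": (6.0, 2.0),
    "F4": (6.0, 4.0),
}
merges = agglomerate(feeders)
for left, right, d in merges:
    print(left, right, d)
```

The first merges give natural candidate pairs (here F1 with F2 and F3 with F4). Real attributes would likely need scaling to comparable ranges before distances are computed.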
  • 8. Dendrogram of Distances Between Feeders Based on Feeder Attributes
  • 9. Pairing of Feeders
    • List of feeders and companions based on increasing coverage area
  • 10. Accomplishments
    • ABF dynamic attribute on feeder outages
    • SAX strings based on time series from transformer RMS data
    • Matlab implementation of Hot SAX algorithm
    • Introduction of a new method for machine learning on feeders and their components
    • Dendrogram showing the distances between feeders based on feeder attributes
    • SQL queries creating feeder pairs
  • 11. Growth
    • Gained experience with: Matlab, SQL, Python, R, Unix, and Microsoft Office
    • Acquired new knowledge of SAX, machine learning, data mining, pattern recognition, databases, suffix trees, and time series
    • Learned about the Con Edison Distribution System and its relevance to our work
    • Developed good work habits and communication skills
  • 12. Special Thanks
    • Albert Boulanger
    • Ansaf Salleb-Aouissi
    • Phillip Gross
    • Roger Anderson
    • Leon Bukhman
    • Eugene Klitenik
