Getting To Know Any Dataset
In 4 Lines Of Python
A Dataset And a Person Walk Into a Bar
What's the Problem?
Time Consuming
It's Manual and Not Fun
Error Prone
Many Skills Required
Data Profiling
Definition
Data Profiling Steps
1. Managing the input.
2. Performing the computation.
3. Managing the output.
Note: Pandas can read many dataset types into memory,
such as Excel, Feather, CSV, Parquet, databases, and more.
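The three profiling steps can be sketched in a few lines of plain pandas. This is only an illustrative sketch, not the deck's own demo: the inline CSV, the `profile` helper, and the choice of summary columns are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a real file on disk.
RAW_CSV = """name,age,city
Ann,34,Oslo
Bob,,Paris
Cid,29,Oslo
Dee,41,
"""


def profile(source):
    """Return the loaded frame and a small per-column summary (hypothetical helper)."""
    # 1. Managing the input: load the data into a DataFrame.
    df = pd.read_csv(source)
    # 2. Performing the computation: a few per-column statistics.
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # distinct non-missing values
    })
    return df, summary


df, summary = profile(io.StringIO(RAW_CSV))
# 3. Managing the output: here the summary is simply printed as text.
print(summary.to_string())
```

With the pandas-profiling package installed, steps 2 and 3 would typically be replaced by something like `ProfileReport(df).to_file("report.html")`, which writes an HTML report instead of a text table.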
No Silver Bullet
• Pandas profiling shines on non-nested datasets; nested datasets
can be handled by flattening them first.
• Pandas requires a lot of RAM; sampling the data can reduce this.
• The input must load successfully before it can be profiled.
• Pandas profiling doesn't support all profiling tasks. :(
• Pandas profiling treats ordinal columns as categorical. :(
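Some of the workarounds above can be sketched with plain pandas: flattening nested records, sampling rows, and recording the order of an ordinal column. The records and column names below are invented for illustration, and whether a profiling tool makes use of the declared order is not something this sketch shows.

```python
import pandas as pd

# Hypothetical nested records, e.g. parsed from a JSON export.
records = [
    {"id": 1, "user": {"name": "Ann", "address": {"city": "Oslo"}}, "size": "small"},
    {"id": 2, "user": {"name": "Bob", "address": {"city": "Paris"}}, "size": "large"},
    {"id": 3, "user": {"name": "Cid", "address": {"city": "Oslo"}}, "size": "medium"},
    {"id": 4, "user": {"name": "Dee", "address": {"city": "Rome"}}, "size": "small"},
]

# Flattening: nested dicts become dotted column names such as "user.address.city".
flat = pd.json_normalize(records)

# Sampling: keep half of the rows; a fixed seed keeps the sample reproducible.
sample = flat.sample(frac=0.5, random_state=0)

# Ordinal column: declare the category order explicitly so that
# comparisons, min and max follow small < medium < large.
flat["size"] = pd.Categorical(
    flat["size"], categories=["small", "medium", "large"], ordered=True
)

print(sorted(flat.columns))
print(len(sample), "of", len(flat), "rows sampled")
print(flat["size"].min(), "<", flat["size"].max())
```

For files too large to load at all, `pd.read_csv` also accepts `nrows` and `chunksize`, so a sample can be taken without reading the whole file into memory.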
Honorable Mention
Packages
pydqc:
• Data summary report for a table.
• Summarizes the difference between two data tables (useful for
comparing a training set with a test set, comparing the same
data table from two different snapshot dates, etc.)
missingno:
• Missing data visualization module for Python.
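The ideas behind these two packages, summarizing missing data and comparing two tables, can be approximated in plain pandas. The sketch below does not use the pydqc or missingno APIs; the `train` and `test` frames and the `missing_share` helper are made up for the example.

```python
import pandas as pd

# Hypothetical training and test tables with the same columns.
train = pd.DataFrame({
    "age": [34, None, 29, 41],
    "city": ["Oslo", "Paris", None, None],
})
test = pd.DataFrame({
    "age": [30, 52, None],
    "city": ["Oslo", "Rome", "Rome"],
})


def missing_share(df):
    """Fraction of missing values per column (hypothetical helper)."""
    return df.isna().mean()


# Side-by-side comparison of the missing-value share in each table.
comparison = pd.DataFrame({
    "train_missing": missing_share(train),
    "test_missing": missing_share(test),
})
comparison["diff"] = comparison["test_missing"] - comparison["train_missing"]

print(comparison.to_string())
```

A large value in the `diff` column hints that the two tables were collected or cleaned differently, which is the kind of discrepancy a table-comparison report is meant to surface.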
Data Profiling
In Theory
Additional Resources

Editor's Notes

  • #5-#9, #11, #12 cowsay -f dragon "Terminal Is Fun And Colorful" | lolcat