What is the state of business intelligence tools in Python in 2012? How is Python used for data processing and analysis? Different approaches for business data and scientific data.
Video: https://vimeo.com/53063944
■ Extracting data from the original sources
■ Quality assuring and cleaning data
■ Conforming the labels and measures in the data to achieve consistency across the original sources
■ Delivering data in a physical format that can be used by query tools, report writers, and dashboards.
Source: Ralph Kimball – The Data Warehouse ETL Toolkit
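The four steps above can be sketched with the standard library alone. This is a minimal illustration, not code from the talk: the sample CSV, the `SECTOR_LABELS` mapping, the `spending` table and all function names are hypothetical, and an in-memory SQLite database stands in for the delivery target.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: the same sector is spelled differently per source.
RAW_CSV = """sector,amount
Health , 100
health,250
EDU,
Education,75
"""

# Hypothetical mapping used to conform labels across sources.
SECTOR_LABELS = {"health": "Health", "edu": "Education", "education": "Education"}


def extract(text):
    """Extract: read rows from the original source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: strip whitespace and drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        sector = row["sector"].strip()
        amount = row["amount"].strip()
        if amount:
            cleaned.append({"sector": sector, "amount": float(amount)})
    return cleaned


def conform(rows):
    """Conform: map source-specific labels onto one shared vocabulary."""
    return [
        {"sector": SECTOR_LABELS[row["sector"].lower()], "amount": row["amount"]}
        for row in rows
    ]


def deliver(rows, connection):
    """Deliver: store rows in a table that query tools can read."""
    connection.execute("CREATE TABLE spending (sector TEXT, amount REAL)")
    connection.executemany(
        "INSERT INTO spending (sector, amount) VALUES (:sector, :amount)", rows
    )
    connection.commit()


connection = sqlite3.connect(":memory:")
deliver(conform(clean(extract(RAW_CSV))), connection)
totals = connection.execute(
    "SELECT sector, SUM(amount) FROM spending GROUP BY sector ORDER BY sector"
).fetchall()
```

Keeping each step as a separate function makes it possible to test and replace the steps independently; a real pipeline would read from files, databases or APIs instead of an inline string.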
Diagram: Source Systems (structured documents, databases, APIs) → Temporary Staging Area → Staging Area (L0, staging) → Operational Data Store (L1, relational) → Datamarts (L2, dimensional)
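clone schema:
    # copy every column definition from the source table to the target
    for column in src_table.columns:
        target_table.append_column(column.copy())
    target_table.create()

copy data:
    # assumes both tables are bound to an engine (implicit execution)
    insert = target_table.insert()
    for row in src_table.select().execute():
        insert.execute(row)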
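text file (CSV) to table:
    reader = csv.reader(file_stream)
    # the first row holds the column names
    columns = next(reader)
    for column in columns:
        table.append_column(Column(column, String))
    table.create()
    insert = table.insert()
    for row in reader:
        # pair each value with its column name
        insert.execute(dict(zip(columns, row)))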
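The snippets above rely on a database toolkit and on tables that already exist. As a self-contained variant of the same two ideas (load a CSV file into a table whose columns come from the header, then clone a table and copy its rows), here is a sketch that swaps in the standard-library `sqlite3` module. The helper names `quote`, `csv_to_table` and `clone_table` and the table names are hypothetical.

```python
import csv
import io
import sqlite3


def quote(name):
    """Quote an identifier for SQLite (doubles any embedded quote)."""
    return '"%s"' % name.replace('"', '""')


def csv_to_table(connection, file_stream, table):
    """Create `table` with one TEXT column per header field and load the rows."""
    reader = csv.reader(file_stream)
    columns = next(reader)  # the first row holds the column names
    column_defs = ", ".join("%s TEXT" % quote(c) for c in columns)
    connection.execute("CREATE TABLE %s (%s)" % (quote(table), column_defs))
    placeholders = ", ".join("?" for _ in columns)
    connection.executemany(
        "INSERT INTO %s VALUES (%s)" % (quote(table), placeholders), reader
    )


def clone_table(connection, source, target):
    """Clone the column names and types of `source`, then copy its rows."""
    info = connection.execute("PRAGMA table_info(%s)" % quote(source)).fetchall()
    # each info row is (cid, name, type, notnull, default, pk)
    column_defs = ", ".join("%s %s" % (quote(col[1]), col[2]) for col in info)
    connection.execute("CREATE TABLE %s (%s)" % (quote(target), column_defs))
    connection.execute(
        "INSERT INTO %s SELECT * FROM %s" % (quote(target), quote(source))
    )


connection = sqlite3.connect(":memory:")
source_file = io.StringIO("sector,amount\nHealth,100\nEducation,75\n")
csv_to_table(connection, source_file, "staging")
clone_table(connection, "staging", "staging_copy")
copied = connection.execute(
    "SELECT * FROM staging_copy ORDER BY rowid"
).fetchall()
```

All values stay text at this stage, which matches the idea of a staging area: types and labels are conformed in a later step.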
Simple T from ETL
Diagram: Sources | Extraction, Transformation, Loading | Data Analysis and Presentation; underneath: Data Governance; Technologies and Utilities
for row in result.table_rows("sector"):
    row.record["amount_sum"]
    row.label
    row.key
whole cube
    cell = Cell(cube)
    browser.aggregate(cell)
    # result: Total

    browser.aggregate(cell, drilldown=["date"])
    # result: 2006 2007 2008 2009 2010

    cut = PointCut("date", [2010])
    cell = cell.slice(cut)
    browser.aggregate(cell, drilldown=["date"])
    # result: Jan Feb Mar Apr May ...
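To make the three operations above concrete (aggregate the whole cube, drill down by a dimension, slice with a point cut), here is a plain-Python sketch over a list of fact rows. It is not the API of the library shown on the slides: `FACTS`, `aggregate` and `slice_facts` are hypothetical names, and the date dimension is flattened into separate `year` and `month` keys for simplicity.

```python
from collections import defaultdict

# Hypothetical fact rows; a real cube would read these from a datamart.
FACTS = [
    {"year": 2009, "month": 12, "sector": "Health", "amount": 10.0},
    {"year": 2010, "month": 1, "sector": "Health", "amount": 20.0},
    {"year": 2010, "month": 1, "sector": "Education", "amount": 5.0},
    {"year": 2010, "month": 2, "sector": "Health", "amount": 7.0},
]


def aggregate(facts, drilldown=None):
    """Sum `amount` over all facts, or per value of the `drilldown` key."""
    if drilldown is None:
        return sum(fact["amount"] for fact in facts)
    totals = defaultdict(float)
    for fact in facts:
        totals[fact[drilldown]] += fact["amount"]
    return dict(totals)


def slice_facts(facts, key, value):
    """Keep only the facts where `key` equals `value` (a point cut)."""
    return [fact for fact in facts if fact[key] == value]


total = aggregate(FACTS)                        # whole cube: one total
by_year = aggregate(FACTS, drilldown="year")    # one total per year
facts_2010 = slice_facts(FACTS, "year", 2010)   # cut at year 2010
by_month = aggregate(facts_2010, drilldown="month")  # next level: months
```

Slicing first and then drilling down is what moves the result from yearly totals to monthly totals within the selected year.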
just the Language
■ saves maintenance resources
■ shortens development time
■ saves you from going insane
Diagram (repeated with annotations): Source Systems (structured documents, databases, APIs) → Temporary Staging Area → Staging Area (L0, staging) → Operational Data Store (L1, relational) → Datamarts (L2, dimensional); annotated "faster" and "advanced"
Diagram (repeated with annotation): Sources | Extraction, Transformation, Loading | Data Analysis and Presentation; underneath: Data Governance; Technologies and Utilities; annotated "understandable, maintainable"