Big Data Analysis with Crate and Python

Analysing large datasets with the Crate datastore, using either the bare Crate Python client or SQLAlchemy.

Published in: Data & Analytics
Transcript of "Big Data Analysis with Crate and Python"

  1. Big Data Analysis with Crate and Python
     Matthias Wahl, developer @ crate.io
     Email: matthias@crate.io
  2. Crate
     shared nothing massively scalable datastore
     standing on the shoulders of giants
  3. Crate
     get it at: https://crate.io/download
     # bash -c "$(curl -L try.crate.io)"
  4. Crate
     automatic sharding and replication
     (semi-) structured models
     single table only
     SQL query language
  5. Crate
     all common SQL types (and more)
     powerful aggregations ('GROUP BY')
     linear scalability: data and query execution are distributed
     basic arithmetic (next release, 0.39)
  6. Crate
  7. Aggregation Execution
     SELECT
       station_name,
       max(temp),
       avg(temp),
       min(temp),
       count(distinct date)
     FROM
       weather_de
     WHERE
       temp != -999
     GROUP BY station_name
     ORDER BY station_name ASC;
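To see what the query on slide 7 computes without a running Crate cluster, here is a minimal sketch that runs the same statement against an in-memory SQLite table. SQLite is swapped in purely for illustration; the rows are made up, and reading `-999` as a "missing measurement" marker is an assumption, since the slides do not explain the filter.

```python
import sqlite3

# In-memory SQLite stands in for Crate here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather_de (station_name TEXT, date INTEGER, temp REAL)"
)

# Hypothetical sample rows; -999 is assumed to mark a missing temperature.
conn.executemany(
    "INSERT INTO weather_de VALUES (?, ?, ?)",
    [
        ("Konstanz", 1, 10.0),
        ("Konstanz", 2, 14.0),
        ("Konstanz", 3, -999),
        ("Hamburg", 1, 8.0),
        ("Hamburg", 1, 6.0),
    ],
)

# The statement from slide 7, unchanged apart from whitespace.
rows = conn.execute(
    """
    SELECT station_name, max(temp), avg(temp), min(temp), count(DISTINCT date)
    FROM weather_de
    WHERE temp != -999
    GROUP BY station_name
    ORDER BY station_name ASC
    """
).fetchall()

for row in rows:
    print(row)
```

Each output row holds one station with its maximum, average and minimum temperature and the number of distinct dates that had a usable reading.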
  8. Aggregation Execution
     [diagram: nodes H, M M M, R R R] Request, collect
  9. Aggregation Execution
     [diagram: nodes H, M M M, R R R] collect, hash based distribution
  10. Aggregation Execution
      [diagram: nodes H, M M M, R R R] group results
  11. Aggregation Execution
      [diagram: nodes H, M M M, R R R] final reduce, Response
  12. Aggregation Execution
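The four steps on slides 8 to 11 (collect, hash based distribution, group results, final reduce) can be sketched in plain Python. This is a toy, single-process illustration and not Crate's implementation: the slides label the nodes only as H, M and R, so reading them as handler, mapper and reducer is an assumption, and all function names and sample rows below are hypothetical.

```python
import zlib
from collections import defaultdict

MISSING = -999  # assumed sentinel, taken from the WHERE clause on slide 7


def collect(shard):
    """Mapper step: keep only the rows that pass the WHERE filter."""
    return [(station, temp) for station, temp in shard if temp != MISSING]


def distribute(rows, n_reducers):
    """Route every row to a reducer chosen by a hash of its group key."""
    buckets = [[] for _ in range(n_reducers)]
    for station, temp in rows:
        idx = zlib.crc32(station.encode("utf-8")) % n_reducers
        buckets[idx].append((station, temp))
    return buckets


def group(rows):
    """Reducer step: aggregate all rows that share a group key."""
    temps = defaultdict(list)
    for station, temp in rows:
        temps[station].append(temp)
    return {s: (max(t), sum(t) / len(t), min(t)) for s, t in temps.items()}


def execute(shards, n_reducers=3):
    """Handler step: run the pipeline, then merge and sort reducer output."""
    reducer_inputs = [[] for _ in range(n_reducers)]
    for shard in shards:
        for idx, bucket in enumerate(distribute(collect(shard), n_reducers)):
            reducer_inputs[idx].extend(bucket)
    merged = {}
    for partial in (group(rows) for rows in reducer_inputs):
        merged.update(partial)  # keys never overlap between reducers
    return sorted((s, *agg) for s, agg in merged.items())


# Hypothetical data: three shards holding (station_name, temp) rows.
shards = [
    [("A", 10.0), ("B", -999)],
    [("A", 20.0), ("B", 5.0)],
    [("B", 7.0)],
]
result = execute(shards)
print(result)
```

Because rows are routed by a hash of `station_name`, every row of a given station reaches the same reducer, so each reducer can finish its groups on its own and the final step only has to merge and sort.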
  13. Using the python client
      >>> from crate.client.http import Client
      >>> client = Client(["127.0.0.1:4200"])
      >>> response = client.sql("select * from weather_de limit 1")
      >>> print(response)
      {
        u'duration': 659,
        u'rowcount': 1,
        u'rows': [
          [1303365600000, 82.0, None, None, None, 0, u'954', 54.1667, 7.45,
           u'UFS Deutsche Bucht', 60.0, 10.9, 100, 5.2]
        ],
        u'cols': [u'date', ...]
      }
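The response on slide 13 keeps column names (`cols`) and values (`rows`) in separate lists. A small helper can pair them up so that each row becomes a dictionary. This is a sketch under assumptions: `rows_as_dicts` is a hypothetical helper, not part of the Crate client, and the sample response below is made up to mirror the shape printed on the slide.

```python
def rows_as_dicts(response):
    """Pair each row with the column names from the same response."""
    cols = response["cols"]
    return [dict(zip(cols, row)) for row in response["rows"]]


# Hypothetical response with the same keys as the one shown on slide 13.
sample_response = {
    "duration": 3,
    "rowcount": 1,
    "cols": ["station_name", "temp"],
    "rows": [["Example Station", 12.5]],
}

records = rows_as_dicts(sample_response)
print(records)
```

With a live cluster, the dictionary returned by `client.sql(...)` could be passed to the helper in place of `sample_response`.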
  14. Using SQLAlchemy
      >>> import sqlalchemy as sa
      >>> from sqlalchemy.ext.declarative import declarative_base
      >>> from sqlalchemy.orm import sessionmaker
      >>> engine = sa.create_engine("crate://localhost:4200")
      >>> Base = declarative_base()
      >>> # Assumed: the slides use DBSession later but never define it.
      >>> DBSession = sessionmaker(bind=engine)()
  15. Using SQLAlchemy
      >>> from sqlalchemy import Column, DateTime, Float, Integer, String
      >>> class Weather(Base):
      ...
      ...     __tablename__ = 'weather_de'
      ...
      ...     station_id = Column('station_id', String, primary_key=True)
      ...     station_name = Column('station_name', String)
      ...     station_lat = Column('station_lat', Float)
      ...     station_long = Column('station_lon', Float)
      ...     station_height = Column('station_height', Integer)
      ...     date = Column('date', DateTime, primary_key=True)
      ...     temp = Column('temp', Float)
      ...     humility = Column(Float)
      ...     sunshine_hours = Column(Float)
      ...     wind_speed = Column(Float)
      ...     wind_direction = Column(Integer)
      ...     rainfall_fallen = Column(Integer)
      ...     rainfall_height = Column(Float)
      ...     rainfall_form = Column(Integer)
  16. Using SQLAlchemy
      >>> from sqlalchemy import func
      >>> res = (
      ...     DBSession.query(Weather.station_name, func.avg(Weather.temp))
      ...     .group_by(Weather.station_name)
      ...     .order_by(Weather.station_name)
      ...     .limit(10)
      ...     .all()
      ... )
      Equivalent SQL:
      SELECT station_name, avg(temp) FROM weather_de GROUP BY station_name ORDER BY station_name LIMIT 10;
  17. Using SQLAlchemy
      # Average sunshine hours
      from sqlalchemy.sql import func
      DBSession.query(func.avg(Weather.sunshine_hours)).scalar()
      # Average sunshine hours in Konstanz
      DBSession.query(func.avg(Weather.sunshine_hours)).filter(Weather.station_name == 'Konstanz').scalar()
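The queries from slides 16 and 17 can be tried end to end without a Crate cluster. The sketch below assumes SQLAlchemy 1.4 or newer and swaps in an in-memory SQLite engine for the `crate://` one; the model is cut down to the columns the queries touch, and the rows are invented sample data.

```python
import datetime as dt

import sqlalchemy as sa
from sqlalchemy import func
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Weather(Base):
    """Reduced version of the model on slide 15 (only the columns used here)."""

    __tablename__ = "weather_de"

    station_id = sa.Column(sa.String, primary_key=True)
    date = sa.Column(sa.DateTime, primary_key=True)
    station_name = sa.Column(sa.String)
    temp = sa.Column(sa.Float)
    sunshine_hours = sa.Column(sa.Float)


# In-memory SQLite replaces "crate://localhost:4200" (assumption for the demo).
engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)
DBSession = sessionmaker(bind=engine)()

# Hypothetical sample rows.
DBSession.add_all([
    Weather(station_id="1", date=dt.datetime(2014, 1, 1),
            station_name="Konstanz", temp=10.0, sunshine_hours=4.0),
    Weather(station_id="1", date=dt.datetime(2014, 1, 2),
            station_name="Konstanz", temp=14.0, sunshine_hours=6.0),
    Weather(station_id="2", date=dt.datetime(2014, 1, 1),
            station_name="Hamburg", temp=8.0, sunshine_hours=2.0),
])
DBSession.commit()

# Slide 17: average sunshine hours, overall and for one station.
avg_all = DBSession.query(func.avg(Weather.sunshine_hours)).scalar()
avg_konstanz = (
    DBSession.query(func.avg(Weather.sunshine_hours))
    .filter(Weather.station_name == "Konstanz")
    .scalar()
)

# Slide 16: average temperature per station, ordered by name.
per_station = (
    DBSession.query(Weather.station_name, func.avg(Weather.temp))
    .group_by(Weather.station_name)
    .order_by(Weather.station_name)
    .limit(10)
    .all()
)
print(avg_all, avg_konstanz, [tuple(row) for row in per_station])
```

Only the engine URL is specific to the backend here, so the same query code should carry over once the Crate SQLAlchemy dialect is installed and the engine points at a Crate node.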
  18. Feature Requests
      I’m no data scientist
  19. Feature Requests
      Please tell us what you would like to see in Crate.
      I’m no data scientist
  20. CRATE
      Thank you
      web: https://crate.io/
      github: https://github.com/crate
      twitter: @cratedata
      IRC: #crate
      stackoverflow tag: cratedata