Big Data Analysis with Crate and Python

Analysing huge datasets with the Crate datastore, using either the bare Crate Python client or SQLAlchemy.

Published in: Data & Analytics

  1. Big Data Analysis with Crate and Python. Matthias Wahl, developer @ crate.io. Email: matthias@crate.io
  2. Crate: a shared-nothing, massively scalable datastore, standing on the shoulders of giants
  3. Crate: get it at https://crate.io/download
         # bash -c "$(curl -L try.crate.io)"
  4. Crate: automatic sharding and replication; (semi-)structured models; single table only; SQL query language
  5. Crate: all common SQL types (and more); powerful aggregations ('GROUP BY'); linear scalability: data and query execution are distributed; basic arithmetic (next release, 0.39)
  6. Crate
  7. Aggregation Execution
         SELECT station_name, max(temp), avg(temp), min(temp), count(distinct date)
         FROM weather_de
         WHERE temp != -999
         GROUP BY station_name
         ORDER BY station_name ASC;
  8. Aggregation Execution (diagram: nodes H, M, M, M, R, R, R; labels: Request, collect)
  9. Aggregation Execution (diagram: nodes H, M, M, M, R, R, R; labels: collect, hash-based distribution)
  10. Aggregation Execution (diagram: nodes H, M, M, M, R, R, R; label: group results)
  11. Aggregation Execution (diagram: nodes H, M, M, M, R, R, R; labels: final reduce, Response)
  12. Aggregation Execution
  13. Using the Python client
          >>> from crate.client.http import Client
          >>> client = Client(["127.0.0.1:4200"])
          >>> response = client.sql("select * from weather_de limit 1")
          >>> print(response)
          {
            u'duration': 659,
            u'rowcount': 1,
            u'rows': [
              [1303365600000, 82.0, None, None, None, 0, u'954', 54.1667, 7.45,
               u'UFS Deutsche Bucht', 60.0, 10.9, 100, 5.2]
            ],
            u'cols': [u'date', ...]
          }
  14. Using SQLAlchemy
          >>> import sqlalchemy as sa
          >>> from sqlalchemy.ext.declarative import declarative_base
          >>> from sqlalchemy.orm import sessionmaker
          >>> engine = sa.create_engine("crate://localhost:4200")
          >>> Base = declarative_base()
  15. Using SQLAlchemy
          >>> # Column and the type classes are not imported on the earlier slide
          >>> from sqlalchemy import Column, String, Float, Integer, DateTime
          >>> class Weather(Base):
          ...     __tablename__ = 'weather_de'
          ...     station_id = Column('station_id', String, primary_key=True)
          ...     station_name = Column('station_name', String)
          ...     station_lat = Column('station_lat', Float)
          ...     station_long = Column('station_lon', Float)
          ...     station_height = Column('station_height', Integer)
          ...     date = Column('date', DateTime, primary_key=True)
          ...     temp = Column('temp', Float)
          ...     humility = Column(Float)
          ...     sunshine_hours = Column(Float)
          ...     wind_speed = Column(Float)
          ...     wind_direction = Column(Integer)
          ...     rainfall_fallen = Column(Integer)
          ...     rainfall_height = Column(Float)
          ...     rainfall_form = Column(Integer)
  16. Using SQLAlchemy
          >>> from sqlalchemy import func
          >>> # Session built from the engine and sessionmaker of slide 14
          >>> DBSession = sessionmaker(bind=engine)()
          >>> res = (DBSession.query(
          ...            Weather.station_name,
          ...            func.avg(Weather.temp))
          ...        .group_by(Weather.station_name)
          ...        .order_by(Weather.station_name)
          ...        .limit(10).all())

      Equivalent SQL:
          SELECT station_name, avg(temp) FROM weather_de GROUP BY station_name ORDER BY station_name LIMIT 10;
  17. Using SQLAlchemy
          # Average sunshine hours
          from sqlalchemy.sql import func
          DBSession.query(func.avg(Weather.sunshine_hours)).scalar()

          # Average sunshine hours in Konstanz
          DBSession.query(func.avg(Weather.sunshine_hours)).filter(Weather.station_name == 'Konstanz').scalar()
  18. Feature Requests: I’m no data scientist
  19. Feature Requests: Please tell us what you would like to see in Crate. I’m no data scientist
  20. CRATE: Thank you. web: https://crate.io/, github: https://github.com/crate, twitter: @cratedata, IRC: #crate, stackoverflow tag: cratedata
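
The phases named on the Aggregation Execution slides (collect, hash-based distribution, group results, final reduce) can be imitated in a few lines of plain Python. This is a toy sketch under stated assumptions, not Crate's implementation: the sample rows, the three "shards", the function names and the choice of CRC32 as the hash are all invented for illustration, and only max, avg and min from the slide's query are covered (count(distinct date) is left out).

```python
import zlib
from collections import defaultdict

# Hypothetical sample rows (station_name, temp), split across three shards.
SHARDS = [
    [("Konstanz", 10.0), ("Berlin", 4.0), ("Konstanz", -999)],
    [("Berlin", 6.0), ("Hamburg", 8.0)],
    [("Konstanz", 14.0), ("Hamburg", 12.0), ("Berlin", 5.0)],
]
NUM_REDUCERS = 3  # assumption: one reducer per node in the diagram


def collect(shard):
    """Collect phase: filter rows and pre-aggregate per station on one shard."""
    partial = {}
    for station, temp in shard:
        if temp == -999:  # mirrors WHERE temp != -999
            continue
        total, count, high, low = partial.get(
            station, (0.0, 0, float("-inf"), float("inf")))
        partial[station] = (total + temp, count + 1,
                            max(high, temp), min(low, temp))
    return partial


def reducer_for(station):
    """Hash-based distribution: one station always maps to one reducer."""
    return zlib.crc32(station.encode("utf-8")) % NUM_REDUCERS


def aggregate(shards):
    # 1. collect partial aggregates on every shard
    partials = [collect(shard) for shard in shards]

    # 2. send each partial aggregate to the reducer chosen by its key's hash
    reducers = [defaultdict(list) for _ in range(NUM_REDUCERS)]
    for partial in partials:
        for station, acc in partial.items():
            reducers[reducer_for(station)][station].append(acc)

    # 3. group results: each reducer merges the partials it received
    grouped = []
    for bucket in reducers:
        for station, accs in bucket.items():
            total = sum(a[0] for a in accs)
            count = sum(a[1] for a in accs)
            grouped.append((station,
                            max(a[2] for a in accs),
                            total / count,
                            min(a[3] for a in accs)))

    # 4. final reduce: merge reducer outputs, ORDER BY station_name ASC
    return sorted(grouped)


if __name__ == "__main__":
    for row in aggregate(SHARDS):
        print(row)
```

Because every partial aggregate for one station ends up on the same reducer, each group can be finished without the reducers talking to each other. Carrying sum and count rather than a per-shard average is what keeps avg(temp) correct after merging.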
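
The response printed on the Python client slide is a dict with 'cols' and 'rows' keys, where each row is a plain list in column order. For analysis it is often handier to pair the two. The helpers below are a minimal sketch, assuming only that response shape; the function names and the shortened sample response (with made-up values and only two columns) are hypothetical and not part of the crate client.

```python
def rows_as_dicts(response):
    """Pair each row with the column names from the same response."""
    cols = response["cols"]
    return [dict(zip(cols, row)) for row in response["rows"]]


def mean_of(records, column):
    """Average a column client-side, skipping missing (None) values."""
    values = [r[column] for r in records if r[column] is not None]
    return sum(values) / len(values) if values else None


# Hypothetical, shortened response in the shape shown on the slide.
sample_response = {
    "duration": 1,
    "rowcount": 3,
    "cols": ["station_name", "temp"],
    "rows": [
        ["UFS Deutsche Bucht", 5.0],
        ["Konstanz", None],
        ["Konstanz", 11.0],
    ],
}

if __name__ == "__main__":
    records = rows_as_dicts(sample_response)
    print(records[0])
    print(mean_of(records, "temp"))
```

For anything beyond a quick look, pushing the aggregation into the SQL statement (as on the aggregation slides) keeps the work on the cluster instead of the client.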
