Big Data Analysis with Crate and Python
 


Analysing any huge dataset with the help of the Crate datastore, using either the bare Crate Python client or SQLAlchemy.


© All Rights Reserved


    Presentation Transcript

    • Big Data Analysis with Crate and Python. Matthias Wahl, developer @ crate.io. Email: matthias@crate.io
    • Crate: a shared-nothing, massively scalable datastore, standing on the shoulders of giants
    • Crate: get it at https://crate.io/download or install it with: bash -c "$(curl -L try.crate.io)"
    • Crate: automatic sharding and replication; (semi-)structured models; queries on a single table only; SQL query language
    • Crate: all common SQL types (and more); powerful aggregations (GROUP BY); linear scalability, since both data and query execution are distributed; basic arithmetic (next release, 0.39)
    • Crate
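The slides say that Crate shards and replicates tables automatically and spreads data over the cluster. The following is a minimal sketch of the general idea, assuming a fixed shard count and a simple round-robin placement: a row's routing value (for example its primary key) is hashed to pick a shard, and the primary and replica copies of each shard are placed on different nodes. `shard_for`, `place_shards`, the node names and the use of crc32 are all hypothetical; Crate's real routing hash and shard allocation differ and are more involved.

```python
import zlib

# Hypothetical cluster layout, chosen only for illustration.
NUM_SHARDS = 4
NODES = ["node-a", "node-b", "node-c"]


def shard_for(routing_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing value (e.g. a primary key) to a shard number.

    crc32 is a stand-in hash; it is NOT the hash function Crate uses.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


def place_shards(num_shards: int, nodes: list, replicas: int = 1) -> dict:
    """Assign each shard's primary and replica copies to distinct nodes, round-robin."""
    if replicas + 1 > len(nodes):
        raise ValueError("need at least as many nodes as copies per shard")
    placement = {}
    for shard in range(num_shards):
        # Copy 0 is the primary; copies 1..replicas go to the following nodes.
        placement[shard] = [nodes[(shard + i) % len(nodes)] for i in range(replicas + 1)]
    return placement


if __name__ == "__main__":
    layout = place_shards(NUM_SHARDS, NODES, replicas=1)
    for station_id in ["954", "2712", "3379"]:
        shard = shard_for(station_id)
        print(station_id, "-> shard", shard, "on", layout[shard])
```

Because a replica never shares a node with its primary in this layout, losing one node still leaves a copy of every shard, which is the property replication is meant to provide.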
    • Aggregation Execution: SELECT station_name, max(temp), avg(temp), min(temp), count(distinct date) FROM weather_de WHERE temp != -999 GROUP BY station_name ORDER BY station_name ASC;
    • Aggregation Execution (diagram: handler node H, mapper nodes M, reducer nodes R), shown in four steps:
        1. Request: collect
        2. hash-based distribution
        3. group results
        4. final reduce, Response
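The four steps on the Aggregation Execution slides (collect, hash-based distribution, group results, final reduce) can be mimicked in plain Python for the example query. This is a toy model, not Crate's implementation: the sample rows, the three "shards", the crc32-based routing and all function names are invented for illustration. It shows why, in this sketch, nodes exchange partial states rather than finished values: avg needs a (sum, count) pair and count(distinct date) needs the set of dates, otherwise results coming from different nodes could not be merged correctly.

```python
import zlib
from collections import defaultdict

# Made-up sample rows: (station_name, date, temp); -999 marks a missing value.
ROWS = [
    ("Konstanz", 1, 10.0), ("Konstanz", 2, 14.0), ("Konstanz", 2, -999),
    ("Arkona", 1, 8.0), ("Arkona", 3, 6.0),
    ("UFS Deutsche Bucht", 1, 9.0),
]
# Hypothetical split of the table into three shards, one per mapper node.
SHARDS = [ROWS[0:2], ROWS[2:4], ROWS[4:]]


def new_state():
    # Mergeable partial state: avg needs sum and count,
    # count(distinct date) needs the set of dates seen so far.
    return {"max": float("-inf"), "min": float("inf"), "sum": 0.0, "n": 0, "dates": set()}


def merge_into(dst, src):
    """Merge the partial state src into dst in place."""
    dst["max"] = max(dst["max"], src["max"])
    dst["min"] = min(dst["min"], src["min"])
    dst["sum"] += src["sum"]
    dst["n"] += src["n"]
    dst["dates"] |= src["dates"]


def collect(rows):
    """Mapper: apply the WHERE clause and pre-aggregate local rows per station."""
    partial = defaultdict(new_state)
    for station, date, temp in rows:
        if temp == -999:  # WHERE temp != -999
            continue
        merge_into(partial[station],
                   {"max": temp, "min": temp, "sum": temp, "n": 1, "dates": {date}})
    return partial


def distributed_group_by(shards, num_reducers=3):
    # 1. collect: every mapper pre-aggregates its own shard
    partials = [collect(rows) for rows in shards]
    # 2. hash-based distribution: all partials for one station reach the same reducer
    buckets = [defaultdict(new_state) for _ in range(num_reducers)]
    for partial in partials:
        for station, state in partial.items():
            reducer = zlib.crc32(station.encode("utf-8")) % num_reducers
            merge_into(buckets[reducer][station], state)
    # 3. group results: each reducer turns its merged states into final values
    results = [
        (station, s["max"], s["sum"] / s["n"], s["min"], len(s["dates"]))
        for bucket in buckets
        for station, s in bucket.items()
    ]
    # 4. final reduce on the handler: ORDER BY station_name ASC
    return sorted(results, key=lambda row: row[0])


if __name__ == "__main__":
    for row in distributed_group_by(SHARDS):
        print(row)
```

Running the same function on a single shard with a single reducer gives the same rows, which is a quick way to check that the partial states merge correctly.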
    • Using the Python client
        >>> from crate.client.http import Client
        >>> client = Client(["127.0.0.1:4200"])
        >>> response = client.sql("select * from weather_de limit 1")
        >>> print(response)
        {u'duration': 659, u'rowcount': 1, u'rows': [[1303365600000, 82.0, None, None, None, 0, u'954', 54.1667, 7.45, u'UFS Deutsche Bucht', 60.0, 10.9, 100, 5.2]], u'cols': [u'date', ...]}
    • Using SQLAlchemy
        >>> import sqlalchemy as sa
        >>> from sqlalchemy.ext.declarative import declarative_base
        >>> from sqlalchemy.orm import sessionmaker
        >>> engine = sa.create_engine("crate://localhost:4200")
        >>> Base = declarative_base()
        >>> DBSession = sessionmaker(bind=engine)()
    • Using SQLAlchemy
        >>> from sqlalchemy import Column, String, Float, Integer, DateTime
        >>> class Weather(Base):
        ...     __tablename__ = 'weather_de'
        ...     station_id = Column('station_id', String, primary_key=True)
        ...     station_name = Column('station_name', String)
        ...     station_lat = Column('station_lat', Float)
        ...     station_long = Column('station_lon', Float)
        ...     station_height = Column('station_height', Integer)
        ...     date = Column('date', DateTime, primary_key=True)
        ...     temp = Column('temp', Float)
        ...     humidity = Column(Float)
        ...     sunshine_hours = Column(Float)
        ...     wind_speed = Column(Float)
        ...     wind_direction = Column(Integer)
        ...     rainfall_fallen = Column(Integer)
        ...     rainfall_height = Column(Float)
        ...     rainfall_form = Column(Integer)
    • Using SQLAlchemy
        >>> from sqlalchemy import func
        >>> res = (DBSession.query(Weather.station_name, func.avg(Weather.temp))
        ...        .group_by(Weather.station_name)
        ...        .order_by(Weather.station_name)
        ...        .limit(10)
        ...        .all())
      Equivalent SQL: SELECT station_name, avg(temp) FROM weather_de GROUP BY station_name ORDER BY station_name LIMIT 10;
    • Using SQLAlchemy
        # Average sunshine hours
        from sqlalchemy.sql import func
        DBSession.query(func.avg(Weather.sunshine_hours)).scalar()
        # Average sunshine hours in Konstanz
        DBSession.query(func.avg(Weather.sunshine_hours)).filter(Weather.station_name == 'Konstanz').scalar()
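The queries on the last two SQLAlchemy slides are plain aggregations, and `Query.scalar()` returns the first column of the single result row. To show what they compute without a running Crate cluster, the sketch below swaps in Python's built-in `sqlite3` module with a few made-up rows. sqlite is not Crate and the table contents are hypothetical, so only the standard SQL semantics carry over; for instance, avg() skips NULL values, which matters because rows can contain None, as the sample client response shows.

```python
import sqlite3

# sqlite3 stands in for Crate here; only standard SQL semantics are illustrated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather_de (station_name TEXT, temp REAL, sunshine_hours REAL)")
conn.executemany(
    "INSERT INTO weather_de VALUES (?, ?, ?)",
    [
        ("Konstanz", 10.0, 6.0),
        ("Konstanz", 14.0, None),  # NULL sunshine_hours is skipped by avg()
        ("Konstanz", 12.0, 8.0),
        ("Arkona", 8.0, 4.0),
    ],
)

# Same shape as the GROUP BY query generated by SQLAlchemy on the slide.
per_station = conn.execute(
    "SELECT station_name, avg(temp) FROM weather_de "
    "GROUP BY station_name ORDER BY station_name LIMIT 10"
).fetchall()

# Equivalents of .scalar(): take the first column of the single result row.
overall = conn.execute("SELECT avg(sunshine_hours) FROM weather_de").fetchone()[0]
konstanz = conn.execute(
    "SELECT avg(sunshine_hours) FROM weather_de WHERE station_name = ?",
    ("Konstanz",),
).fetchone()[0]

print(per_station, overall, konstanz)
```

The station name is passed as a bound parameter rather than pasted into the SQL string, which is also what SQLAlchemy does with the `filter(Weather.station_name == 'Konstanz')` expression.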
    • Feature Requests: I'm no data scientist. Please tell us what you would like to see in Crate.
    • Crate: Thank you! Web: https://crate.io/, GitHub: https://github.com/crate, Twitter: @cratedata, IRC: #crate, Stack Overflow tag: cratedata