Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Conference 2014

At teowaki we are using Google BigQuery to get analytics from our API usage. Learn how you can benefit from BigQuery to have Big Data analytics.

Big Data Analytics
with Google BigQuery
javier ramirez
@supercoco9

REST API
+
AngularJS web as
an API client
javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

big data is doing a full scan
over 330 million rows, matching
them against a regexp, and
getting the result (223 million
rows) in just 5 seconds
Javier Ramirez
impressionable teowaki founder

Apache Hadoop
Apache Cassandra
Apache Spark
Apache Storm
Amazon Redshift

bigdata is cool but...
expensive cluster
hard to set up and monitor
not interactive enough

Our choice:
Google BigQuery
Data analysis as a service
http://developers.google.com/bigquery

Based on Dremel
Specifically designed for
interactive queries over
petabytes of real-time data

What Dremel is used for in Google
• Analysis of crawled web documents.
• Tracking install data for applications on Android Market.
• Crash reporting for Google products.
• OCR results from Google Books.
• Spam analysis.
• Debugging of map tiles on Google Maps.
• Tablet migrations in managed Bigtable instances.
• Results of tests run on Google’s distributed build system.
• Disk I/O statistics for hundreds of thousands of disks.
• Resource monitoring for jobs run in Google’s data centers.
• Symbols and dependencies in Google’s codebase.

in BigQuery
everything is
a full-scan*
*Over a ridiculously fast distributed filesystem.
Dremel design goal: 1 TB/sec. It was exceeded.
BigQuery delivers ~50 Gb/sec.

Columnar
storage
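
A minimal sketch of why columnar storage matters for a full-scan engine, assuming a toy in-memory table. The column names and values are made up for illustration and this is not how BigQuery stores data internally; it only counts how many values each layout has to touch to answer a query over a single column.

```python
# Toy table: three rows, three columns (all values are made up).
rows = [
    {"title": "Madrid", "wp_namespace": 0, "num_characters": 1200},
    {"title": "Talk:Madrid", "wp_namespace": 1, "num_characters": 300},
    {"title": "1984", "wp_namespace": 0, "num_characters": 5400},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def values_read_row_store(rows, wanted):
    """A row store reads whole rows, whatever columns were requested."""
    return sum(len(row) for row in rows)


def values_read_column_store(columns, wanted):
    """A column store reads only the requested columns."""
    return sum(len(columns[name]) for name in wanted)


columns = to_columns(rows)
print(values_read_row_store(rows, ["title"]))        # 9
print(values_read_column_store(columns, ["title"]))  # 3
```

The same effect shows up in the pricing slide, where a full scan over one column processes far less data than a full scan of the table.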

highly distributed
execution using a tree
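
The tree idea can be sketched in a few lines, assuming a simple count query. Leaf workers scan their own shard and return a partial count, and intermediate nodes only add partial results together. The function names, the `fanout` parameter and the sample shards are hypothetical; the real serving tree is far more involved.

```python
import re


def leaf_count(shard, pattern):
    """Leaf node: scan one shard and return a partial count."""
    return sum(1 for title in shard if pattern.search(title))


def tree_count(shards, pattern, fanout=2):
    """Combine partial counts level by level until one total is left."""
    partials = [leaf_count(shard, pattern) for shard in shards]
    while len(partials) > 1:
        # Each intermediate node sums the results of `fanout` children.
        partials = [
            sum(partials[i:i + fanout])
            for i in range(0, len(partials), fanout)
        ]
    return partials[0] if partials else 0


# Made-up shards of titles; count the titles that contain a digit.
shards = [["a1", "b"], ["c2", "d3"], ["e"], ["f4"]]
print(tree_count(shards, re.compile(r"[0-9]")))  # 4
```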

loading data
You can feed flat CSV-like
files or nested JSON objects
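
A sketch of preparing both kinds of input with the Python standard library, assuming tab-delimited gzip files for flat data and one JSON object per line for nested data. The record fields (`path`, `status`, `user`) are hypothetical and only illustrate the two shapes.

```python
import csv
import gzip
import io
import json


def to_tsv_gz(records, fields):
    """Serialize flat records as gzip-compressed, tab-delimited text."""
    text = io.StringIO()
    writer = csv.writer(text, delimiter="\t", lineterminator="\n")
    for record in records:
        writer.writerow([record[field] for field in fields])
    return gzip.compress(text.getvalue().encode("utf-8"))


def to_ndjson(records):
    """Serialize (possibly nested) records as one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


# Hypothetical API usage records.
flat = [{"path": "/api/links", "status": 200}]
nested = [{"path": "/api/links", "user": {"id": 7, "team": "x"}}]

payload = to_tsv_gz(flat, ["path", "status"])
print(gzip.decompress(payload).decode("utf-8"), end="")
print(to_ndjson(nested), end="")
```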

bq cli
bq load --nosynchronous_mode \
  --encoding UTF-8 \
  --field_delimiter 'tab' \
  --max_bad_records 100 \
  --source_format CSV \
  api.stats 20131014T11-42-05Z.gz

web console screenshot

analytical SQL functions.
correlations.
window functions.
views.
JSON fields.
timestamped tables.

Things you always wanted to
try but were too scared to
select count(*) from
publicdata:samples.wikipedia
where REGEXP_MATCH(title, "[0-9]*")
AND wp_namespace = 0;
223,163,387
Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)
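
A small Python sketch of the same filter-and-count over a handful of made-up rows. It assumes REGEXP_MATCH behaves like an unanchored search, which is what Python's `re.search` does; that is an assumption about the query engine, not something the slide states. Under that assumption the pattern `[0-9]*` accepts the empty string, so it matches every title and the count is simply the number of rows in namespace 0; `[0-9]+` would restrict it to titles that contain a digit.

```python
import re

PATTERN = re.compile(r"[0-9]*")


def count_matches(rows, pattern=PATTERN):
    """Count namespace-0 rows whose title matches the pattern anywhere."""
    return sum(
        1
        for title, wp_namespace in rows
        if wp_namespace == 0 and pattern.search(title)
    )


# Made-up (title, wp_namespace) rows.
rows = [("1984", 0), ("Madrid", 0), ("Talk:1984", 1)]
print(count_matches(rows))                          # 2
print(count_matches(rows, re.compile(r"[0-9]+")))   # 1
```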

Global Database of Events,
Language and Tone
quarter billion rows
30 years
updated daily
http://gdeltproject.org/data.html#googlebigquery

SELECT Year, Actor1Name, Actor2Name, Count FROM (
  SELECT Actor1Name, Actor2Name, Year,
    COUNT(*) Count,
    RANK() OVER(PARTITION BY YEAR ORDER BY Count DESC) rank
  FROM
    (SELECT Actor1Name, Actor2Name, Year
     FROM [gdelt-bq:full.events]
     WHERE Actor1Name < Actor2Name
       AND Actor1CountryCode != '' AND Actor2CountryCode != ''
       AND Actor1CountryCode != Actor2CountryCode),
    (SELECT Actor2Name Actor1Name, Actor1Name Actor2Name, Year
     FROM [gdelt-bq:full.events]
     WHERE Actor1Name > Actor2Name
       AND Actor1CountryCode != '' AND Actor2CountryCode != ''
       AND Actor1CountryCode != Actor2CountryCode),
  WHERE Actor1Name IS NOT null
    AND Actor2Name IS NOT null
  GROUP EACH BY 1, 2, 3
  HAVING Count > 100
)
WHERE rank=1
ORDER BY Year
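
The logic of the query can be sketched in plain Python over a few made-up events: order each actor pair so that (A, B) and (B, A) are counted together, count pairs per year, keep pairs above a threshold, and pick the most frequent pair of each year. The function name, the `min_count` parameter and the sample data are hypothetical. Unlike RANK(), this sketch keeps only one pair per year when there is a tie.

```python
from collections import Counter


def top_pair_per_year(events, min_count=100):
    """events: iterable of (year, actor1, actor2, country1, country2)."""
    counts = Counter()
    for year, a1, a2, c1, c2 in events:
        # Skip missing names or countries, same-country and same-name pairs.
        if not a1 or not a2 or not c1 or not c2 or c1 == c2 or a1 == a2:
            continue
        first, second = sorted((a1, a2))  # (A, B) and (B, A) count together
        counts[(year, first, second)] += 1

    best = {}
    for (year, first, second), n in counts.items():
        if n > min_count and (year not in best or n > best[year][2]):
            best[year] = (first, second, n)
    return [(year, *best[year]) for year in sorted(best)]


# Made-up events; the threshold is lowered to fit the tiny sample.
events = [
    (1979, "USA", "RUSSIA", "USA", "RUS"),
    (1979, "RUSSIA", "USA", "RUS", "USA"),
    (1979, "USA", "CHINA", "USA", "CHN"),
    (1980, "IRAN", "IRAQ", "IRN", "IRQ"),
    (1980, "IRAQ", "IRAN", "IRQ", "IRN"),
]
print(top_pair_per_year(events, min_count=1))
# [(1979, 'RUSSIA', 'USA', 2), (1980, 'IRAN', 'IRAQ', 2)]
```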

Automation with Apps Script
Read from BigQuery
Create a spreadsheet on Drive
E-mail it every day as a PDF

Analysing weather information
Finding patterns in e-commerce
Match online/offline behaviour
Log analysis
Analysing inventory/booking data
...

bigquery pricing
$80 per stored TB
  1,000,000 rows => $0.02288 / month
$35 per processed TB
  1 full scan ~ 240 MB
  1 count = 0 MB
  1 full scan over 1 column ~ 13 MB
  10 GB => $0.35 / month
*the 1st TB processed every month is free of charge
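
The arithmetic behind these figures, as a minimal sketch. It assumes decimal units (1 TB = 1000 GB) and ignores the free first TB; the function and constant names are made up for illustration. With those assumptions, 10 GB processed comes out at $0.35, and the 9.13 GB regexp query shown earlier at about 32 cents.

```python
STORAGE_USD_PER_TB_MONTH = 80.0  # price per stored TB, from the slide
QUERY_USD_PER_TB = 35.0          # price per processed TB, from the slide
GB_PER_TB = 1000.0               # assumption: decimal units


def query_cost_usd(gb_processed):
    """Cost of the data processed by queries, ignoring the free first TB."""
    return gb_processed / GB_PER_TB * QUERY_USD_PER_TB


def storage_cost_usd(gb_stored):
    """Monthly cost of keeping gb_stored gigabytes in storage."""
    return gb_stored / GB_PER_TB * STORAGE_USD_PER_TB_MONTH


print(round(query_cost_usd(10), 2))    # 0.35
print(round(query_cost_usd(9.13), 2))  # 0.32
```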

Find related links at
https://teowaki.com/teams/javier-community/link-categories/bigquery-talk
Thanks
Javier Ramírez
@supercoco9


More from javier ramirez (20)

¿Se puede vivir del open source? T3chfest

QuestDB: The building blocks of a fast open-source time-series database

Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...

Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...

Deduplicating and analysing time-series data with Apache Beam and QuestDB

Your Database Cannot Do this (well)

Your Timestamps Deserve Better than a Generic Database

Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...

QuestDB-Community-Call-20220728

Processing and analysing streaming data with Python. Pycon Italy 2022

QuestDB: ingesting a million time series per second on a single instance. Big...

Servicios e infraestructura de AWS y la próxima región en Aragón

Primeros pasos en desarrollo serverless

How AWS is reinventing the cloud

Analitica de datos en tiempo real con Apache Flink y Apache BEAM

Getting started with streaming analytics

Getting started with streaming analytics: Setting up a pipeline

Getting started with streaming analytics: Deep Dive

Getting started with streaming analytics: streaming basics (1 of 3)

Monitorización de seguridad y detección de amenazas con AWS

Recently uploaded

Globus Compute Introduction - GlobusWorld 2024

Globus

De mooiste recreatieve routes ontdekken met RouteYou en FME

Jelle | Nordend

TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR

Tier1 app

Even though at surface level ‘java.lang.OutOfMemoryError’ appears as one single error; underlyingly there are 9 types of OutOfMemoryError. Each type of OutOfMemoryError has different causes, diagnosis approaches and solutions. This session equips you with the knowledge, tools, and techniques needed to troubleshoot and conquer OutOfMemoryError in all its forms, ensuring smoother, more efficient Java applications.

Globus Compute wth IRI Workflows - GlobusWorld 2024

Globus

As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.

Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...

Hivelance Technology

Cryptocurrency trading bots are computer programs designed to automate buying, selling, and managing cryptocurrency transactions. These bots utilize advanced algorithms and machine learning techniques to analyze market data, identify trading opportunities, and execute trades on behalf of their users. By automating the decision-making process, crypto trading bots can react to market changes faster than human traders Hivelance, a leading provider of cryptocurrency trading bot development services, stands out as the premier choice for crypto traders and developers. Hivelance boasts a team of seasoned cryptocurrency experts and software engineers who deeply understand the crypto market and the latest trends in automated trading, Hivelance leverages the latest technologies and tools in the industry, including advanced AI and machine learning algorithms, to create highly efficient and adaptable crypto trading bots

First Steps with Globus Compute Multi-User Endpoints

Globus

In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.

Software Testing Exam imp Ques Notes.pdf

MayankTawar1

Why React Native as a Strategic Advantage for Startup Innovation.pdf

ayushiqss

Do you know that React Native is being increasingly adopted by startups as well as big companies in the mobile app development industry? Big names like Facebook, Instagram, and Pinterest have already integrated this robust open-source framework. In fact, according to a report by Statista, the number of React Native developers has been steadily increasing over the years, reaching an estimated 1.9 million by the end of 2024. This means that the demand for this framework in the job market has been growing making it a valuable skill. But what makes React Native so popular for mobile application development? It offers excellent cross-platform capabilities among other benefits. This way, with React Native, developers can write code once and run it on both iOS and Android devices thus saving time and resources leading to shorter development cycles hence faster time-to-market for your app. Let’s take the example of a startup, which wanted to release their app on both iOS and Android at once. Through the use of React Native they managed to create an app and bring it into the market within a very short period. This helped them gain an advantage over their competitors because they had access to a large user base who were able to generate revenue quickly for them.

Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...

Shahin Sheidaei

Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.

Providing Globus Services to Users of JASMIN for Environmental Data Analysis

Globus

JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.

Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...

Globus

Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.

WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation

WSO2

Large Language Models and the End of Programming

Matt Welsh

Explore Modern SharePoint Templates for 2024

Sharepoint Designs

Cracking the code review at SpringIO 2024

Paco van Beckhoven

Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production. Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process? In this session we will cover: - The Art of Effective Code Reviews - Streamlining the Review Process - Elevating Reviews with Automated Tools By the end of this presentation, you'll have the knowledge on how to organize and improve your code review proces

Visitor Management System in India- Vizman.app

NaapbooksPrivateLimi

Your Digital Assistant. Making complex approach simple. Straightforward process saves time. No more waiting to connect with people that matter to you. Safety first is not a cliché - Securely protect information in cloud storage to prevent any third party from accessing data. Would you rather make your visitors feel burdened by making them wait? Or choose VizMan for a stress-free experience? VizMan is an automated visitor management system that works for any industries not limited to factories, societies, government institutes, and warehouses. A new age contactless way of logging information of visitors, employees, packages, and vehicles. VizMan is a digital logbook so it deters unnecessary use of paper or space since there is no requirement of bundles of registers that is left to collect dust in a corner of a room. Visitor’s essential details, helps in scheduling meetings for visitors and employees, and assists in supervising the attendance of the employees. With VizMan, visitors don’t need to wait for hours in long queues. VizMan handles visitors with the value they deserve because we know time is important to you. Feasible Features One Subscription, Four Modules – Admin, Employee, Receptionist, and Gatekeeper ensures confidentiality and prevents data from being manipulated User Friendly – can be easily used on Android, iOS, and Web Interface Multiple Accessibility – Log in through any device from any place at any time One app for all industries – a Visitor Management System that works for any organisation. 
Stress-free Sign-up Visitor is registered and checked-in by the Receptionist Host gets a notification, where they opt to Approve the meeting Host notifies the Receptionist of the end of the meeting Visitor is checked-out by the Receptionist Host enters notes and remarks of the meeting Customizable Components Scheduling Meetings – Host can invite visitors for meetings and also approve, reject and reschedule meetings Single/Bulk invites – Invitations can be sent individually to a visitor or collectively to many visitors VIP Visitors – Additional security of data for VIP visitors to avoid misuse of information Courier Management – Keeps a check on deliveries like commodities being delivered in and out of establishments Alerts & Notifications – Get notified on SMS, email, and application Parking Management – Manage availability of parking space Individual log-in – Every user has their own log-in id Visitor/Meeting Analytics – Evaluate notes and remarks of the meeting stored in the system Visitor Management System is a secure and user friendly database manager that records, filters, tracks the visitors to your organization. "Secure Your Premises with VizMan (VMS) – Get It Now"

Accelerate Enterprise Software Engineering with Platformless

WSO2

Key takeaways: Challenges of building platforms and the benefits of platformless. Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience. How Choreo enables the platformless experience. How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo. Demo of an end-to-end app built and deployed on Choreo.

BoxLang: Review our Visionary Licenses of 2024

Ortus Solutions, Corp

In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...

Juraj Vysvader

top nidhi software solution freedownload

vrstrong314

This presentation emphasizes the importance of data security and legal compliance for Nidhi companies in India. It highlights how online Nidhi software solutions, like Vector Nidhi Software, offer advanced features tailored to these needs. Key aspects include encryption, access controls, and audit trails to ensure data security. The software complies with regulatory guidelines from the MCA and RBI and adheres to Nidhi Rules, 2014. With customizable, user-friendly interfaces and real-time features, these Nidhi software solutions enhance efficiency, support growth, and provide exceptional member services. The presentation concludes with contact information for further inquiries.

Recently uploaded (20)

Globus Compute Introduction - GlobusWorld 2024

De mooiste recreatieve routes ontdekken met RouteYou en FME

TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR

Globus Compute wth IRI Workflows - GlobusWorld 2024

Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...

First Steps with Globus Compute Multi-User Endpoints

Software Testing Exam imp Ques Notes.pdf

Why React Native as a Strategic Advantage for Startup Innovation.pdf

Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...

Providing Globus Services to Users of JASMIN for Environmental Data Analysis

Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...

WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation

Large Language Models and the End of Programming

Explore Modern SharePoint Templates for 2024

Cracking the code review at SpringIO 2024

Visitor Management System in India- Vizman.app

Accelerate Enterprise Software Engineering with Platformless

BoxLang: Review our Visionary Licenses of 2024

In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...

top nidhi software solution freedownload

Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Conference 2014

1. Big Data Analytics with Google BigQuery javier ramirez @supercoco9

2. REST API + AngularJS web as an API client javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013

3.

4. "bigdata is doing a full scan of 330MM rows, matching them against a regexp, and getting the result (223MM rows) in just 5 seconds" (Javier Ramirez, impressionable teowaki founder)

5. Apache Hadoop, Apache Cassandra, Apache Spark, Apache Storm, Amazon Redshift

6. bigdata is cool but... expensive cluster; hard to set up and monitor; not interactive enough

7. Our choice: Google BigQuery. Data analysis as a service. http://developers.google.com/bigquery

8. Based on Dremel. Specifically designed for interactive queries over petabytes of real-time data

9. What Dremel is used for in Google
• Analysis of crawled web documents.
• Tracking install data for applications on Android Market.
• Crash reporting for Google products.
• OCR results from Google Books.
• Spam analysis.
• Debugging of map tiles on Google Maps.
• Tablet migrations in managed Bigtable instances.
• Results of tests run on Google’s distributed build system.
• Disk I/O statistics for hundreds of thousands of disks.
• Resource monitoring for jobs run in Google’s data centers.
• Symbols and dependencies in Google’s codebase.

10. in BigQuery everything is a full-scan*
*Over a ridiculously fast distributed filesystem. Dremel design goal: 1TB/sec. It was exceeded. BigQuery delivers ~ 50Gb/Sec.

11. Columnar storage

12. highly distributed execution using a tree
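
The idea of tree-shaped execution can be sketched in a few lines of Python. This is only an illustration of the general pattern (leaf workers scan their own shard, intermediate levels merge partial results), not Dremel's or BigQuery's actual implementation; the function names, the `fanout` parameter and the sample shards are all hypothetical.

```python
import re
from typing import List


def leaf_count(rows: List[str], pattern: str) -> int:
    """Leaf worker: full-scan its own shard and return a partial count."""
    regex = re.compile(pattern)
    return sum(1 for title in rows if regex.search(title))


def tree_count(shards: List[List[str]], pattern: str, fanout: int = 2) -> int:
    """Merge partial counts level by level, as the inner nodes of a tree would."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    # Every leaf scans its shard independently (in a real system, in parallel).
    partials = [leaf_count(shard, pattern) for shard in shards]
    # Each pass of the loop is one level of the tree: groups of `fanout`
    # partial results are summed into a single value for the parent node.
    while len(partials) > 1:
        partials = [sum(partials[i:i + fanout])
                    for i in range(0, len(partials), fanout)]
    return partials[0] if partials else 0


if __name__ == "__main__":
    # Hypothetical sample data, split into four shards.
    shards = [["Route 66", "Madrid"], ["1984", "Kiev"], ["Apollo 11"], ["Dremel"]]
    print(tree_count(shards, r"[0-9]+"))
```

Because a count can be merged by simple addition, no node ever needs to see more than a handful of partial results, whatever the size of the table.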

13. loading data: You can feed flat CSV-like files or nested JSON objects

14. bq cli
    bq load --nosynchronous_mode --encoding UTF-8 --field_delimiter 'tab' --max_bad_records 100 --source_format CSV api.stats 20131014T11-42-05Z.gz

15. web console screenshot

16. analytical SQL functions. correlations. window functions. views. JSON fields. timestamped tables.

17. Things you always wanted to try but were too scared to
    select count(*) from publicdata:samples.wikipedia
    where REGEXP_MATCH(title, "[0-9]*") AND wp_namespace = 0;
    223,163,387
    Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)

18.

19. Global Database of Events, Language and Tone: quarter billion rows, 30 years, updated daily. http://gdeltproject.org/data.html#googlebigquery

20. SELECT Year, Actor1Name, Actor2Name, Count FROM (
      SELECT Actor1Name, Actor2Name, Year, COUNT(*) Count,
             RANK() OVER(PARTITION BY YEAR ORDER BY Count DESC) rank
      FROM
        (SELECT Actor1Name, Actor2Name, Year
         FROM [gdelt-bq:full.events]
         WHERE Actor1Name < Actor2Name
           and Actor1CountryCode != '' and Actor2CountryCode != ''
           and Actor1CountryCode!=Actor2CountryCode),
        (SELECT Actor2Name Actor1Name, Actor1Name Actor2Name, Year
         FROM [gdelt-bq:full.events]
         WHERE Actor1Name > Actor2Name
           and Actor1CountryCode != '' and Actor2CountryCode != ''
           and Actor1CountryCode!=Actor2CountryCode),
      WHERE Actor1Name IS NOT null AND Actor2Name IS NOT null
      GROUP EACH BY 1, 2, 3
      HAVING Count > 100
    )
    WHERE rank=1
    ORDER BY Year

21.

22.

23. Automation with Apps Script: read from bigquery; create a spreadsheet on Drive; e-mail it every day as a PDF

24.

25. what is it being used for?

26.

27.

28. Analysing weather information; finding patterns in e-commerce; matching online/offline behaviour; log analysis; analysing inventory/booking data; ...

29. bigquery pricing
    $80 per stored TB
      1000000 rows => $0.02288 / month
    $35 per processed TB
      1 full scan ~ 240 MB
      1 count = 0 MB
      1 full scan over 1 column ~ 13 MB
      10 GB => $0.35 / month
    *the 1st TB processed every month is free of charge
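
The arithmetic behind these figures can be checked with a short Python sketch. It uses only the two prices quoted on the slide (2014 figures, not current BigQuery pricing) and assumes decimal units (1 TB = 1000 GB), which is the convention that reproduces both the "10 GB => $0.35" line and the 32¢ reported for the 9.13 GB query on slide 17. The function names are hypothetical, and the free first TB per month is deliberately ignored, as it is in the slide's own example.

```python
# Prices quoted on the slide (USD, 2014). Not current BigQuery pricing.
STORAGE_USD_PER_TB_MONTH = 80.0
QUERY_USD_PER_TB = 35.0
GB_PER_TB = 1000.0  # assumption: decimal units, which matches the slide's numbers


def storage_cost_usd(stored_gb: float) -> float:
    """Monthly cost of keeping `stored_gb` gigabytes stored."""
    return stored_gb / GB_PER_TB * STORAGE_USD_PER_TB_MONTH


def query_cost_usd(processed_gb: float) -> float:
    """Cost of processing `processed_gb` gigabytes, ignoring the free first TB."""
    return processed_gb / GB_PER_TB * QUERY_USD_PER_TB


if __name__ == "__main__":
    print(f"10 GB processed:   ${query_cost_usd(10):.2f}")    # slide 29 example
    print(f"9.13 GB processed: ${query_cost_usd(9.13):.2f}")  # slide 17 query
```

Since the bill depends on bytes processed rather than rows returned, selecting fewer columns is the most direct way to lower the cost of a query.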

30. Find related links at https://teowaki.com/teams/javier-community/link-categories/bigquery-talk Thanks Javier Ramírez @supercoco9

Editor's Notes

Nobody doubts that your API is technically very good, but...
The obvious conclusion: this is going to be a big data problem. The trouble is that we knew nothing about big data. We had heard of map/reduce, Hadoop, Cassandra... but we were missing the details.
Two years ago this was impossible. We live in the future. To be able to run queries over our data, the first step is to extract it from our system, which in our case means extracting the information from user requests as they happen.
Apache Drill is the open source equivalent. It does not run as a service. BigQuery is a REST wrapper on top of Dremel, usable from any platform that can speak REST. APIs are available for different languages. Inserts only! No deletes or updates. Often used together with map/reduce or Hadoop. In-place analysis, with no preloading, no indexes, and no need to plan queries in advance.
next: full scan regexp
Column data is of uniform type; therefore, there are some opportunities for storage size optimizations available in column-oriented data that are not available in row-oriented data. Also less I/O. In addition, Dremel provides a tree structure for running the queries.
Batch and real time, both for data input (files or stream) and for output (interactive or batch). You pay for what you use.
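
A small Python sketch can make both points concrete. It is only an illustration under stated assumptions: the sample rows are made up, `num_characters` is a hypothetical extra column, and run-length encoding is used here as one well-known example of a columnar storage optimization, not as a description of the encoding Dremel actually uses.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

# Row-oriented layout: every record keeps all of its fields together.
rows: List[Dict[str, Any]] = [
    {"title": "Madrid", "wp_namespace": 0, "num_characters": 5200},
    {"title": "Kiev", "wp_namespace": 0, "num_characters": 4100},
    {"title": "Talk:Kiev", "wp_namespace": 1, "num_characters": 300},
    {"title": "1984", "wp_namespace": 0, "num_characters": 7800},
]

# Column-oriented layout: one list of uniformly typed values per column.
columns: Dict[str, List[Any]] = {key: [row[key] for row in rows] for key in rows[0]}


def count_namespace(cols: Dict[str, List[Any]], namespace: int) -> int:
    """Reads only the wp_namespace column; the other columns are never touched."""
    return sum(1 for value in cols["wp_namespace"] if value == namespace)


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse runs of equal values, which a uniformly typed column makes easy."""
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    print(count_namespace(columns, 0))
    print(run_length_encode(columns["wp_namespace"]))
```

The count touches a single list of small integers instead of every full record, which is where the reduced I/O comes from.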
Data can be loaded from a flat file (TSV to avoid quoting problems) or as JSON if you need structure. Import from the web console, REST, or the command line. Compressed files can be imported. Data can also be imported in real time in streaming mode. NEXT: BIGQUERY CONCEPTS
Web console, REST API, command line. Notice the validate button to avoid expenses.
next: full scan regexp
total 313,797,035
global database of events, language and tone quarter billion rows 30 years updated daily
