Introduction to Google BigQuery. Slides used at the first GDG Cloud meetup in Brussels, about big data on Google Cloud Platform. (http://www.meetup.com/GDG-Cloud-Belgium/events/228206131)
The 'macro view' on Big Query:
We started with an overview, some typical uses and moved to project hierarchy, access control and security.
In the end we touch about tools and demos.
Introduction to our Datawarehouse solutions called BigQuery.
The Google Cloud Platform products are based on our internal systems which are powering Google AdWords, Search, YouTube and our leading research in the field of real-time data analysis.
You can get access ($300 for 60 days) to our free trial through google.com/cloud
Basic concepts, best practices, pricing of using BigQuery the analytic data platform at petabyte scale from Google Cloud Platform. There is a lot things to learn about this tool and its features such as BI engine and AI Platform.
An short introduction on Big Query. With this presentation you'll quickly discover :
How load data in BigQuery
How to build dashboard using BigQuery
How to work with BigQuery
and, at last but not least, we've added some best practices
We hope you'll enjoy this presentation and that it will help you to start exploring this wonderful solution. Don't hesitate to send us your feedbacks or questions
Introduction to Google BigQuery. Slides used at the first GDG Cloud meetup in Brussels, about big data on Google Cloud Platform. (http://www.meetup.com/GDG-Cloud-Belgium/events/228206131)
The 'macro view' on Big Query:
We started with an overview, some typical uses and moved to project hierarchy, access control and security.
In the end we touch about tools and demos.
Introduction to our Datawarehouse solutions called BigQuery.
The Google Cloud Platform products are based on our internal systems which are powering Google AdWords, Search, YouTube and our leading research in the field of real-time data analysis.
You can get access ($300 for 60 days) to our free trial through google.com/cloud
Basic concepts, best practices, pricing of using BigQuery the analytic data platform at petabyte scale from Google Cloud Platform. There is a lot things to learn about this tool and its features such as BI engine and AI Platform.
An short introduction on Big Query. With this presentation you'll quickly discover :
How load data in BigQuery
How to build dashboard using BigQuery
How to work with BigQuery
and, at last but not least, we've added some best practices
We hope you'll enjoy this presentation and that it will help you to start exploring this wonderful solution. Don't hesitate to send us your feedbacks or questions
in this presentation we go through the differences and similarities between Redshift and BigQuery. It was presented during the Athens Big Data meetup May 2017.
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...Amazon Web Services
Come to this session to learn how Amazon DynamoDB was built as the hyper-scale database for internet-scale applications. In January 2012, Amazon launched DynamoDB, a cloud-based NoSQL database service designed from the ground up to support extreme scale, with the security, availability, performance, and manageability needed to run mission-critical workloads. This session discloses for the first time the underpinnings of DynamoDB, and how we run a fully managed nonrelational database used by more than 100,000 customers. We cover the underlying technical aspects of how an application works with DynamoDB for authentication, metadata, storage nodes, streams, backup, and global replication.
Cassandra Data Modeling - Practical Considerations @ Netflixnkorla1share
Cassandra community has consistently requested that we cover C* schema design concepts. This presentation goes in depth on the following topics:
- Schema design
- Best Practices
- Capacity Planning
- Real World Examples
Better than you think: Handling JSON data in ClickHouseAltinity Ltd
Robert Hodges shows how ClickHouse, a relational database with tables, can offer high-performance analysis of JSON data. This talk provides a cookbook of schema design, indexing, data loading, and query tricks we gave learned over years of helping users build analytical apps for servicds logs, observability data, financial transactions, and other types of semi-structured data. Robert Hodges is CEO of Altinity and a certified database geek.
https://altinity.com
https://www.meetup.com/San-Francisco-Bay-Area-ClickHouse-Meetup
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
At Google Cloud Platform, we're combining the Apache Spark and Hadoop ecosystem with our software and hardware innovations. We want to make these awesome tools easier, faster, and more cost-effective, from 3 to 30,000 cores. This presentation will showcase how Google Cloud Platform is innovating with the goal of bringing the Hadoop ecosystem to everyone.
Bio: "I love data because it surrounds us - everything is data. I also love open source software, because it shows what is possible when people come together to solve common problems with technology. While they are awesome on their own, I am passionate about combining the power of open source software with the potential unlimited uses of data. That's why I joined Google. I am a product manager for Google Cloud Platform and manage Cloud Dataproc and Apache Beam (incubating). I've previously spent time hanging out at Disney and Amazon. Beyond Google, love data, amateur radio, Disneyland, photography, running and Legos."
MongoDB WiredTiger Internals: Journey To TransactionsMydbops
MongoDB has adapted transaction feature (ACID Properties) in MongoDB 4.0. This talk focuses on the internals of how MongoDB adapted the ACID properties with Weird Tiger Engine. Weird tiger offers more future possibilities for MongoDB. This tech talk was presented at Mydbops Database Meetup on 27-04-2019 by Manosh Malai Senior Devops/NoSQL Consultant with Mydbops and Ranjith Database Administrator with Mydbops.
This presentation describes how to configure and leverage ProxySQL with
AWS Aurora,
Azure Database for MySQL
and CloudSQL for MySQL.
It details the various benefits, configuration, and monitoring.
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Yongho Ha
요즘 Hadoop 보다 더 뜨고 있는 Spark.
그 Spark의 핵심을 이해하기 위해서는 핵심 자료구조인 Resilient Distributed Datasets (RDD)를 이해하는 것이 필요합니다.
RDD가 어떻게 동작하는지, 원 논문을 리뷰하며 살펴보도록 합시다.
http://www.cs.berkeley.edu/~matei/papers/2012/sigmod_shark_demo.pdf
Adventures with the ClickHouse ReplacingMergeTree EngineAltinity Ltd
Presentation on ReplacingMergeTree by Robert Hodges of Altinity at the 14 December 2022 SF Bay Area ClickHouse Meetup (https://www.meetup.com/san-francisco-bay-area-clickhouse-meetup/events/289605843/)
How Discord Migrated Trillions of Messages from Cassandra to ScyllaDBScyllaDB
Learn why and how Discord’s persistence team recently completed their most ambitious migration yet: moving their massive set of trillions of messages from Cassandra to ScyllaDB. Bo Ingram, Senior Software Engineer at Discord, provides a technical look, including:
- Their reasons for moving from Apache Cassandra to ScyllaDB
- Their strategy for migrating trillions of messages
- How they designed a new storage topology – using a hybrid-RAID1 architecture – for extremely low latency on GCP
- The role of their existing Rust messages service, new Rust data service library, and new Rust data migrator in this project
- What they’ve achieved so far, lessons learned, and what they’re tackling next
All about Zookeeper and ClickHouse Keeper.pdfAltinity Ltd
ClickHouse clusters depend on ZooKeeper to handle replication and distributed DDL commands. In this Altinity webinar, we’ll explain why ZooKeeper is necessary, how it works, and introduce the new built-in replacement named ClickHouse Keeper. You’ll learn practical tips to care for ZooKeeper in sickness and health. You’ll also learn how/when to use ClickHouse Keeper. We will share our recommendations for keeping that happy as well.
in this presentation we go through the differences and similarities between Redshift and BigQuery. It was presented during the Athens Big Data meetup May 2017.
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...Amazon Web Services
Come to this session to learn how Amazon DynamoDB was built as the hyper-scale database for internet-scale applications. In January 2012, Amazon launched DynamoDB, a cloud-based NoSQL database service designed from the ground up to support extreme scale, with the security, availability, performance, and manageability needed to run mission-critical workloads. This session discloses for the first time the underpinnings of DynamoDB, and how we run a fully managed nonrelational database used by more than 100,000 customers. We cover the underlying technical aspects of how an application works with DynamoDB for authentication, metadata, storage nodes, streams, backup, and global replication.
Cassandra Data Modeling - Practical Considerations @ Netflixnkorla1share
Cassandra community has consistently requested that we cover C* schema design concepts. This presentation goes in depth on the following topics:
- Schema design
- Best Practices
- Capacity Planning
- Real World Examples
Better than you think: Handling JSON data in ClickHouseAltinity Ltd
Robert Hodges shows how ClickHouse, a relational database with tables, can offer high-performance analysis of JSON data. This talk provides a cookbook of schema design, indexing, data loading, and query tricks we gave learned over years of helping users build analytical apps for servicds logs, observability data, financial transactions, and other types of semi-structured data. Robert Hodges is CEO of Altinity and a certified database geek.
https://altinity.com
https://www.meetup.com/San-Francisco-Bay-Area-ClickHouse-Meetup
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
At Google Cloud Platform, we're combining the Apache Spark and Hadoop ecosystem with our software and hardware innovations. We want to make these awesome tools easier, faster, and more cost-effective, from 3 to 30,000 cores. This presentation will showcase how Google Cloud Platform is innovating with the goal of bringing the Hadoop ecosystem to everyone.
Bio: "I love data because it surrounds us - everything is data. I also love open source software, because it shows what is possible when people come together to solve common problems with technology. While they are awesome on their own, I am passionate about combining the power of open source software with the potential unlimited uses of data. That's why I joined Google. I am a product manager for Google Cloud Platform and manage Cloud Dataproc and Apache Beam (incubating). I've previously spent time hanging out at Disney and Amazon. Beyond Google, love data, amateur radio, Disneyland, photography, running and Legos."
MongoDB WiredTiger Internals: Journey To TransactionsMydbops
MongoDB has adapted transaction feature (ACID Properties) in MongoDB 4.0. This talk focuses on the internals of how MongoDB adapted the ACID properties with Weird Tiger Engine. Weird tiger offers more future possibilities for MongoDB. This tech talk was presented at Mydbops Database Meetup on 27-04-2019 by Manosh Malai Senior Devops/NoSQL Consultant with Mydbops and Ranjith Database Administrator with Mydbops.
This presentation describes how to configure and leverage ProxySQL with
AWS Aurora,
Azure Database for MySQL
and CloudSQL for MySQL.
It details the various benefits, configuration, and monitoring.
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Yongho Ha
요즘 Hadoop 보다 더 뜨고 있는 Spark.
그 Spark의 핵심을 이해하기 위해서는 핵심 자료구조인 Resilient Distributed Datasets (RDD)를 이해하는 것이 필요합니다.
RDD가 어떻게 동작하는지, 원 논문을 리뷰하며 살펴보도록 합시다.
http://www.cs.berkeley.edu/~matei/papers/2012/sigmod_shark_demo.pdf
Adventures with the ClickHouse ReplacingMergeTree EngineAltinity Ltd
Presentation on ReplacingMergeTree by Robert Hodges of Altinity at the 14 December 2022 SF Bay Area ClickHouse Meetup (https://www.meetup.com/san-francisco-bay-area-clickhouse-meetup/events/289605843/)
How Discord Migrated Trillions of Messages from Cassandra to ScyllaDBScyllaDB
Learn why and how Discord’s persistence team recently completed their most ambitious migration yet: moving their massive set of trillions of messages from Cassandra to ScyllaDB. Bo Ingram, Senior Software Engineer at Discord, provides a technical look, including:
- Their reasons for moving from Apache Cassandra to ScyllaDB
- Their strategy for migrating trillions of messages
- How they designed a new storage topology – using a hybrid-RAID1 architecture – for extremely low latency on GCP
- The role of their existing Rust messages service, new Rust data service library, and new Rust data migrator in this project
- What they’ve achieved so far, lessons learned, and what they’re tackling next
All about Zookeeper and ClickHouse Keeper.pdfAltinity Ltd
ClickHouse clusters depend on ZooKeeper to handle replication and distributed DDL commands. In this Altinity webinar, we’ll explain why ZooKeeper is necessary, how it works, and introduce the new built-in replacement named ClickHouse Keeper. You’ll learn practical tips to care for ZooKeeper in sickness and health. You’ll also learn how/when to use ClickHouse Keeper. We will share our recommendations for keeping that happy as well.
API analytics with Redis and Google Bigquery. NoSQL matters editionjavier ramirez
At teowaki we have a system for API use analytics using Redis as a fast intermediate store and bigquery as a big data backend. As a result, we can launch aggregated queries on our traffic/usage data in a few seconds and we can try and find for usage patterns that wouldn’t be obvious otherwise. In this session I will speak of the alternatives we evaluated and how we are using Redis and Bigquery to solve our problem.
From Hadoop, through HIVE, Spark, YARN, searching for the holy grail for low latency SQL dialect like big data query. I talk about OLTP and OLAP, row stores and column stores, and finally arrive to BigQuery. Demo on the web console.
Get more from Analytics with Google BigQuery - Javier Ramirez - Datawaki- BBVACIjavier ramirez
Talk about the integration of Google Analytics and BigQuery, delivered at Dare2Data event (BBVACI). The video is available at https://www.youtube.com/watch?v=ZdMJf0btAbc
My Talk at GCPUG-Taiwan on 2015/5/8.
You use BigQuery with SQL, but the internal work of BigQuery is very different from traditional Relational Database systems you may familiar with.
One of the way to understand how BigQuery works is to see it from the cost you pay for BigQuery. Knowing how to save money while using BigQuery is to know how BigQuery works to some extent.
In this session, let’s talk about practical knowledge (saving money) and exciting technology (how BigQuery works)!
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012Big Data Spain
Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/crunching-data-with-google-bigquery/jordan-tigani
BigQuery is Google's columnar, massively parallel data querying solution. This talk explores using it as an ad-hoc reporting solution and the limitations present in May 2013.
Based out of Bangalore, Thoughtapult was incorporated in July 2013 with a highly qualified team averaging more than 25+ years of experience across different industry verticals and leading Organizations through all stages of a business. With expertise in business strategy, sales, marketing, technology, project management, process & operations, monetizing new business and finance, our team comes with the vision of Fostering Innovation, Excellence & Growth towards the Realization of your Dreams
About
A Professional Mentoring, Advisory, Business Consulting & Investment Banking Firm focused on providing mentoring and management solutions to startups and young entrepreneurs across industries.
Mission
“Ignite & Mentor Young Business Minds to build an idea from its inception or journey, nurture and grow it, to a size that would have impact in the space that it was created to be in. Creating Value for their customer eco systems and Wealth for their stake holders”
Vision “Ignite – Emerge – Transform - Excel”
MindSphere differentiates itself as a company that doesn’t just provide solutions to its clients’ immediate problems but believes in engaging with the client and enabling it as a self-sustaining entity. We partner with our clients in all its meaning of partnership and define, redefine and fine-tune a business into one that is capable of finding solutions to all its challenges of the future. This happens when we work with you towards gaining fresh perspective to the challenges and also finding an insight of all your capabilities and hence, creating a system that learns and thrives in all situations, with or without a mentor
Founded by Sanjay Prasad a Serial Entrepreneur with 27+ rich years of experience and Four successful ventures from inception to successful acquisition to his credit
His last venture MindRiver which was in the IT Services space grew from a 4 People startup to a 400+ Strong enterprise that was acquired by Acropetal within 7 years of its inception and grew share holder wealth by 23 times.
Specialties
Mentoring for Startup's & Young Entrepreneurs, Management Consulting, Business Consulting, Strategic Consulting, Advisory & Investment Banking
RoIT Consulting Company Services PresentationRoIT Consulting
This presentation provides overview of services provided by RoIT Consulting, management consulting company specializing in advisory in change management, business development and finance, as well as development of solutions in information technology (including ERP systems) and business intelligence.
GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIsPatrick Chanezon
Google is expanding our storage products by introducing Google Storage for Developers. It offers a RESTful API for storing and accessing data at Google. Developers can take advantage of the performance and reliability of Google's storage infrastructure, as well as the advanced security and sharing capabilities. We will demonstrate key functionality of the product as well as customer use cases. Google relies heavily on data analysis and has developed many tools to understand large datasets. Two of these tools are now available on a limited sign-up basis to developers: (1) BigQuery: interactive analysis of very large data sets and (2) Prediction API: make informed predictions from your data. We will demonstrate their use and give instructions on how to get access.
The number of startup ecosystems worldwide is currently booming. There are many opportunities but competition is extremely intense. Here is a guide to successful startups and how the right accelerator can boost you ahead of the rest.
Big Data Analytics with Google BigQuery, by Javier Ramirez, datawaki, at Span...javier ramirez
In this talk I explain why we decided to use BigQuery as our analytics backend in datawaki.com and teowaki.com
I describe the architecture of BigQuery, what you can do with it, and how other people are using it
Talk delivered at span conference (spanconf.io) in London, 2014
Bigdata for small pockets, by Javier Ramirez from teowaki. RubyC Kiev 2014javier ramirez
This is the story of how https://teowaki.com added bigdata analytics using a very economic approach.
Bigdata is amazing. You can get insights from your users, find interesting patterns and have lots of geek fun. Problem is big data usually means many servers, a complex set up, intensive monitoring and a steep learning curve. All those things cost money. If you don't have the money, you are losing all the fun. In my talk I will show you how you can use Redis, Google Bigquery and Apps Script to manage big data from your application for under $1 per month. Don't you feel like running a RegExp over 300 million rows in just 5 seconds?
At teowaki we have a system for API usage analytics, with Redis as a fast intermediate store and bigquery as a big data backend. As a result, we can launch aggregated queries on our traffic/usage data in just a few seconds and we can try and find for usage patterns that wouldn’t be obvious otherwise.
In this session I will talk about how we entered the Big Data world, which alternatives we evaluated, and how we are using Redis and Bigquery to solve our problem.
API Analytics with Redis and Bigquery. NoSQLmatters Cologne '14 edition. Javi...javier ramirez
At teowaki we have a system for API usage analytics, with Redis as a fast intermediate store and bigquery as a big data backend. As a result, we can launch aggregated queries on our traffic/usage data in just a few seconds and we can try and find for usage patterns that wouldn’t be obvious otherwise.In this session I will talk about how we entered the Big Data world, which alternatives we evaluated, and how we are using Redis and Bigquery to solve our problem.
Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teo...javier ramirez
Big data is amazing. You can get insights from your users, find interesting patterns and have lots of geek fun. Problem is big data usually means many servers, a complex set up, intensive monitoring and a steep learning curve. All those things cost money. If you don’t have the money, you are losing all the fun.
In my talk I show you how you can use Google BigQuery to manage big data from your application using a hosted solution. And you can start with less than $1 per month.
Link to the full talk - https://youtu.be/2Rf5t2Eh6IQ
https://go.dok.community/slack
https://dok.community
ABSTRACT OF THE TALK
This talk will provide a high-level overview of Kubernetes, Helm charts and how they can be used to deploy Apache Druid clusters of any size.
We'll review how Kubernetes functionality enables resilience and self-healing, historical tiers through node group affinity, middle manager scaling through Kubernetes autoscaling to optimize ingestion capacity and some of the gotchas along the way.
BIO
Sergio Ferragut is a database veteran turned Developer Advocate at Imply. His experience includes 16 years at Teradata in professional services and engineering roles.
He has direct experience in building analytics applications spanning the retail, supply chain, pricing optimization and IoT spaces.
Sergio has worked at multiple technology start-ups including APL and Splice Machine where he helped guide product design and field messaging.
Fun with ruby and redis, arrrrcamp edition, javier_ramirez, teowakijavier ramirez
In this talk I make an introduction to Redis, then I explain how some big names (twitter, pinterest...) are using it, then I describe some pitfalls, then I explain how we are using redis at teowaki
Por que deberias haberle pedido redis a los reyes magosjavier ramirez
Charla sobre Redis impartida en el grupo de Ruby Zaragoza explicando qué es Redis, para qué se usa en sitios como Twitter o Pinterest y cómo se está usando en teowaki
Apresentado na React Conf Brasil, em São Paulo, 7 de Outubro de 2017 #reactconfbr
Programador Nutella, cheguei na WEB quando HTML era 5, mas React era mato, escrevo coisas em Javascript para resolver problemas que você provavelmente nem sabe que tem e em outras linguagens para resolver problemas que eu tenho.
https://github.com/mtmr0x
@mtmr0x
- Patrocínio: Pipefy, Globo.com, Meteor, Apollo, Taller, Fullcircle, Quanto, Udacity, Cubos, Segware, Entria
- Apoio: Concrete, Rung, LuizaLabs, Movile, Rivendel, GreenMile, STQ, Hi Platform
- Promoção: InfoQ, DevNaEstrada, CodamosClub, JS Ladies, NodeBR, Training Center, BrazilJS, Tableless, GeekHunter
- Afterparty: An English Thing
Similar to Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Conference 2014 (20)
Hubo un tiempo en el que casi cualquier componente de software requería pagar una licencia. Afortunadamente, hoy en día gracias al software libre y de código abierto, se puede desarrollar prácticamente cualquier aplicación usando componentes gratuitos.
Pero, si el software es gratis, ¿Quién lo desarrolla? ¿Trabaja la comunidad de software libre de forma altruista? ¿Se puede desarrollar software libre de forma profesional? De hecho, hay quien dice que el código abierto tal y como lo conocimos ya no existe, y que lo que hay hoy en día es otra cosa.
En esta charla hablaré de cómo se puede monetizar el código libre, y de algunos posibles conflictos que puedes encontrarte en el camino.
Además, te contaré cómo hacemos desde QuestDB para desarrollar una base de datos de código abierto y mantener un equipo estable viviendo de ello. Comentaré también algunas situaciones problemáticas a las que proyectos muy destacados se han enfrentado, o que se enfrentan a día de hoy.
QuestDB: The building blocks of a fast open-source time-series databasejavier ramirez
(talk delivered at OSA CON 23)
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed.
We will learn how it deals with data ingestion, and which SQL extensions it implements for working with time-series efficiently.
We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or data deduplication.
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...javier ramirez
QuestDB es una base de datos open source de alto rendimiento. Mucha gente nos comentaba que les gustaría usarla como servicio, sin tener que gestionar las máquinas. Así que nos pusimos manos a la obra para desarrollar una solución que nos permitiese lanzar instancias de QuestDB con provisionado, monitorización, seguridad o actualizaciones totalmente gestionadas.
Unos cuantos clusters de Kubernetes más tarde, conseguimos lanzar nuestra oferta de QuestDB Cloud. Esta charla es la historia de cómo llegamos ahí. Hablaré de herramientas como Calico, Karpenter, CoreDNS, Telegraf, Prometheus, Loki o Grafana, pero también de retos como autenticación, facturación, multi-nube, o de a qué tienes que decir que no para poder sobrevivir en la nube.
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...javier ramirez
How would you build a database to support sustained ingestion of several hundreds of thousands rows per second while running near real-time queries on top?
In this session I will go over some of the technical decisions and trade-offs we applied when building QuestDB, an open source time-series database developed mainly in JAVA, and how we can achieve over four million row writes per second on a single instance without blocking or slowing down the reads. There will be code and demos, of course.
We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Deduplicating and analysing time-series data with Apache Beam and QuestDBjavier ramirez
Time series data pipelines tend to prioritise speed and freshness over completeness and integrity. In such scenarios, it is very common to ingest duplicate data, which may be fine for many analytical use cases, but is very inconvenient for others.
There are many open source databases built specifically for the speed and query semantics of time series, and most of them lack automatic deduplication of events in near real-time. One such database is QuestDB, which requires a manual batch process to deduplicate ingested data.
In this talk, we will see how we can successfully use Apache Beam to deduplicate streaming time series, which can then be analysed by a time series database.
Relational databases were created a long time ago for a simpler world. Even if they are still awesome tools for generic workloads, there are some things they cannot do well.
In this session I will speak about purpose-built databases that you can use for specific business scenarios. We will see the type of queries you can run on a Graph database, a Document Database, and a Time-Series database. We will then see how a relational database could also be used for the same use cases, just in a much more complex way.
Your Timestamps Deserve Better than a Generic Databasejavier ramirez
If you are storing records with a timestamp in your database, it is very likely a time series database can make your life easier.
However, time series databases are still the great unknown for a large part of the tech community.
In this talk, I will show you what use cases they are good for, what they give you that you cannot get from a traditional database, and when it is a good idea (and when it is not) to use them.
For the demos, we will be using QuestDB, the fastest open-source time series database.
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
En esta sesión voy a contar las decisiones técnicas que tomamos al desarrollar QuestDB, una base de datos Open Source para series temporales compatible con Postgres, y cómo conseguimos escribir más de cuatro millones de filas por segundo sin bloquear o enlentecer las consultas.
Hablaré de cosas como (zero) Garbage Collection, vectorización de instrucciones usando SIMD, reescribir en lugar de reutilizar para arañar microsegundos, aprovecharse de los avances en procesadores, discos duros y sistemas operativos, como por ejemplo el soporte de io_uring, o del balance entre experiencia de usuario y rendimiento cuando se plantean nuevas funcionalidades.
Processing and analysing streaming data with Python. Pycon Italy 2022javier ramirez
Data used to be a batch thing, but more and more we get unbounded streams of data, fast or slow, that we need to process and analyse in near real time.
In this talk I’ll show you how you can use Apache Flink and QuestDB to build reliable streaming data pipelines that can grow as much as you need.
QuestDB: ingesting a million time series per second on a single instance. Big...javier ramirez
In this session I will show you the technical decisions we made when building QuestDB, the open source, Postgres compatible, time-series database, and how we can achieve a million row writes per second without blocking or slowing down the reads.
Servicios e infraestructura de AWS y la próxima región en Aragónjavier ramirez
AWS está montando una región de infraestructura en Aragón. Vale, pero ¿Qué significa eso? ¿Es tan diferente de un centro de datos convencional o de otros proveedores de nube? (Spoiler: Sí). En esta sesión te cuento por qué. Hay video en https://catedrasamcadt.unizar.es/noticias/el-momento-tecnologico-actual-contado-por-trabajadores-de-amazon-web-services/
¿Qué es eso del desarrollo sin servidores? ¿Qué lenguajes puedo utilizar? ¿Cómo hago cosas como autenticación, o guardar en base de datos, o enviar notificaciones? ¿Esto escala? A todas estas preguntas, y a alguna más, intentaré dar respuesta en esta sesión, donde haré una pequeña demo de montar una app muy sencilla y desplegarla en la nube sin preocuparnos de gestionar infraestructura. Charla realizada por primera vez para AlcarriaConf 2021
AWS launched publicly on March 2006 with just one service, starting the age of the public cloud. You might think after 15 years everything in cloud has already been invented, but that's simply not the case.
In this session I want to show you how AWS is reinventing the cloud in areas like computing, machine learning, databases and analytics, or cloud infrastructure.
Analitica de datos en tiempo real con Apache Flink y Apache BEAMjavier ramirez
Trabajar en tiempo real con datos que se mueven muy rápido no es trivial, sobre todo con volúmenes de datos elevados. Apache Flink y Apache BEAM están específicamente diseñadas para ese caso de uso. En esta charla te contaré los retos de la analítica en tiempo real, cuál es la arquitectura de Apache Flink, qué es Apace BEAM, y cómo usan estas herramientas empresas para hacer desde procesos triviales hasta gestionar billones de eventos al día con latencias de milisegundos. Por supuesto, haremos una demo :)
In this webinar we explain which are some of the problems of streaming analytics, and why they are different to batch/big data analytics. Then we go into introducing some basic streaming concepts, like event queues, event processors, event vs processing time, and delivery guarantees. We end this first part of the series presenting a few of the most common open source components for streaming (Kafka, Spark, Flink, Cassandra, or ElasticSearch) and we mention the different options you have to run them on AWS.
Getting started with streaming analytics: Setting up a pipelinejavier ramirez
In this session I will show you how to create a simple streaming analytics pipeline, first using open source tools and developing locally, then moving to a VM, then moving to fully managed AWS services. The session will serve as an introduction to some details of Apache Kafka, Apache Flink, ElasticSearch, Amazon Managed Streaming for Kafka, Kinesis Data Analytics, and Amazon ElasticSearch. It will be an almost slideless presentation, as I will spent most of the time at the command line and the IDE.
Getting started with streaming analytics: Deep Divejavier ramirez
Now that we know how to create simple streaming analytics pipelines, it is time to learn something more interesting. In this session I will show you how to add Complex Event Processing to your Apache Flink (or Kinesis Data Analytics) application using JAVA. For those of you that prefer SQL, I will show you how to run streaming analytics using only SQL.
Getting started with streaming analytics: streaming basics (1 of 3)javier ramirez
In this webinar we explain which are some of the problems of streaming analytics, and why they are different to batch/big data analytics. Then we go into introducing some basic streaming concepts, like event queues, event processors, event vs processing time, and delivery guarantees. We end this first part of the series presenting a few of the most common open source components for streaming (Kafka, Spark, Flink, Cassandra, or ElasticSearch) and we mention the different options you have to run them on AWS.
Monitorización de seguridad y detección de amenazas con AWSjavier ramirez
La seguridad es nuestra prioridad número uno. Cuando despliegas tu infraestructura y aplicaciones en la nube, hay que tener en cuenta que muchas de las prácticas de seguridad son iguales a las que se llevan a cabo tradicionalmente cuando trabajas on-premises, pero hay otros mecanismos que son específicos a AWS y que te ayudan a operar de forma segura.
En este webinar, vamos a explicarte las bases de la monitorización de seguridad y de la detección de amenazas, y veremos cómo servicios como Amazon GuardDuty y AWS Security Hub te ayudan a tener una visión completa, te permiten cumplir con tus requisitos de compliance, y te permiten detectar amenazas en tus cargas de trabajo.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTier1 app
Even though at surface level ‘java.lang.OutOfMemoryError’ appears as one single error; underlyingly there are 9 types of OutOfMemoryError. Each type of OutOfMemoryError has different causes, diagnosis approaches and solutions. This session equips you with the knowledge, tools, and techniques needed to troubleshoot and conquer OutOfMemoryError in all its forms, ensuring smoother, more efficient Java applications.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Hivelance Technology
Cryptocurrency trading bots are computer programs designed to automate buying, selling, and managing cryptocurrency transactions. These bots utilize advanced algorithms and machine learning techniques to analyze market data, identify trading opportunities, and execute trades on behalf of their users. By automating the decision-making process, crypto trading bots can react to market changes faster than human traders
Hivelance, a leading provider of cryptocurrency trading bot development services, stands out as the premier choice for crypto traders and developers. Hivelance boasts a team of seasoned cryptocurrency experts and software engineers who deeply understand the crypto market and the latest trends in automated trading, Hivelance leverages the latest technologies and tools in the industry, including advanced AI and machine learning algorithms, to create highly efficient and adaptable crypto trading bots
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Why React Native as a Strategic Advantage for Startup Innovation.pdfayushiqss
Do you know that React Native is being increasingly adopted by startups as well as big companies in the mobile app development industry? Big names like Facebook, Instagram, and Pinterest have already integrated this robust open-source framework.
In fact, according to a report by Statista, the number of React Native developers has been steadily increasing over the years, reaching an estimated 1.9 million by the end of 2024. This means that the demand for this framework in the job market has been growing making it a valuable skill.
But what makes React Native so popular for mobile application development? It offers excellent cross-platform capabilities among other benefits. This way, with React Native, developers can write code once and run it on both iOS and Android devices thus saving time and resources leading to shorter development cycles hence faster time-to-market for your app.
Let’s take the example of a startup, which wanted to release their app on both iOS and Android at once. Through the use of React Native they managed to create an app and bring it into the market within a very short period. This helped them gain an advantage over their competitors because they had access to a large user base who were able to generate revenue quickly for them.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Large Language Models and the End of ProgrammingMatt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Modern design is crucial in today's digital environment, and this is especially true for SharePoint intranets. The design of these digital hubs is critical to user engagement and productivity enhancement. They are the cornerstone of internal collaboration and interaction within enterprises.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge on how to organize and improve your code review proces
Your Digital Assistant.
Making complex approach simple. Straightforward process saves time. No more waiting to connect with people that matter to you. Safety first is not a cliché - Securely protect information in cloud storage to prevent any third party from accessing data.
Would you rather make your visitors feel burdened by making them wait? Or choose VizMan for a stress-free experience? VizMan is an automated visitor management system that works for any industries not limited to factories, societies, government institutes, and warehouses. A new age contactless way of logging information of visitors, employees, packages, and vehicles. VizMan is a digital logbook so it deters unnecessary use of paper or space since there is no requirement of bundles of registers that is left to collect dust in a corner of a room. Visitor’s essential details, helps in scheduling meetings for visitors and employees, and assists in supervising the attendance of the employees. With VizMan, visitors don’t need to wait for hours in long queues. VizMan handles visitors with the value they deserve because we know time is important to you.
Feasible Features
One Subscription, Four Modules – Admin, Employee, Receptionist, and Gatekeeper ensures confidentiality and prevents data from being manipulated
User Friendly – can be easily used on Android, iOS, and Web Interface
Multiple Accessibility – Log in through any device from any place at any time
One app for all industries – a Visitor Management System that works for any organisation.
Stress-free Sign-up
Visitor is registered and checked-in by the Receptionist
Host gets a notification, where they opt to Approve the meeting
Host notifies the Receptionist of the end of the meeting
Visitor is checked-out by the Receptionist
Host enters notes and remarks of the meeting
Customizable Components
Scheduling Meetings – Host can invite visitors for meetings and also approve, reject and reschedule meetings
Single/Bulk invites – Invitations can be sent individually to a visitor or collectively to many visitors
VIP Visitors – Additional security of data for VIP visitors to avoid misuse of information
Courier Management – Keeps a check on deliveries like commodities being delivered in and out of establishments
Alerts & Notifications – Get notified on SMS, email, and application
Parking Management – Manage availability of parking space
Individual log-in – Every user has their own log-in id
Visitor/Meeting Analytics – Evaluate notes and remarks of the meeting stored in the system
Visitor Management System is a secure and user friendly database manager that records, filters, tracks the visitors to your organization.
"Secure Your Premises with VizMan (VMS) – Get It Now"
Accelerate Enterprise Software Engineering with PlatformlessWSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I didn't get rich from it but it did have 63K downloads (powered possible tens of thousands of websites).
top nidhi software solution freedownloadvrstrong314
This presentation emphasizes the importance of data security and legal compliance for Nidhi companies in India. It highlights how online Nidhi software solutions, like Vector Nidhi Software, offer advanced features tailored to these needs. Key aspects include encryption, access controls, and audit trails to ensure data security. The software complies with regulatory guidelines from the MCA and RBI and adheres to Nidhi Rules, 2014. With customizable, user-friendly interfaces and real-time features, these Nidhi software solutions enhance efficiency, support growth, and provide exceptional member services. The presentation concludes with contact information for further inquiries.
4. bigdata is doing a fullscan
to 330MM rows, matching
them against a regexp, and
getting the result (223MM
rows) in just 5 seconds
javier ramirez @supercoco9 https://teowaki.com
Javier Ramirez
impresionable teowaki founder
6. bigdata is cool but...
expensive cluster
hard to set up and monitor
not interactive enough
7. Our choice:
Google BigQuery
Data analysis as a service
http://developers.google.com/bigquery
javier ramirez @supercoco9 https://teowaki.com
8. Based on Dremel
Specifically designed for
interactive queries over
petabytes of real-time data
javier ramirez @supercoco9 https://teowaki.com
9. What Dremel is used for in Google
• Analysis of crawled web documents.
• Tracking install data for applications on Android Market.
• Crash reporting for Google products.
• OCR results from Google Books.
• Spam analysis.
• Debugging of map tiles on Google Maps.
• Tablet migrations in managed Bigtable instances.
• Results of tests run on Google’s distributed build system.
• Disk I/O statistics for hundreds of thousands of disks.
• Resource monitoring for jobs run in Google’s data centers.
• Symbols and dependencies in Google’s codebase.
10. in BigQuery
everything is
a full-scan*
*Over a ridiculously fast distributed filesystem.
Dremel design goal: 1TB/sec. It was exceeded
BigQuery delivers ~ 50Gb/Sec.
javier ramirez @supercoco9 https://teowaki.com
17. Things you always wanted to
try but were too scared to
select count(*) from
publicdata:samples.wikipedia
where REGEXP_MATCH(title, "[0-9]*")
AND wp_namespace = 0;
223,163,387
Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)
javier ramirez @supercoco9 https://teowaki.com
18.
19. Global Database of Events,
Language and Tone
quarter billion rows
30 years
updated daily
http://gdeltproject.org/data.html#googlebigquery
20. SELECT Year, Actor1Name, Actor2Name, Count FROM (
SELECT Actor1Name, Actor2Name, Year,
COUNT(*) Count, RANK() OVER(PARTITION BY YEAR ORDER BY
Count DESC) rank
FROM
(SELECT Actor1Name, Actor2Name, Year FROM
[gdelt-bq:full.events] WHERE Actor1Name < Actor2Name
and Actor1CountryCode != '' and Actor2CountryCode != ''
and Actor1CountryCode!=Actor2CountryCode),
(SELECT Actor2Name Actor1Name, Actor1Name Actor2Name,
Year FROM [gdelt-bq:full.events] WHERE
Actor1Name > Actor2Name and Actor1CountryCode != '' and
Actor2CountryCode != '' and
Actor1CountryCode!=Actor2CountryCode),
WHERE Actor1Name IS NOT null
AND Actor2Name IS NOT null
GROUP EACH BY 1, 2, 3
HAVING Count > 100
)
WHERE rank=1
ORDER BY Year
23. Automation with Apps Script
Read from bigquery
Create a spreadsheet on Drive
E-mail it everyday as a PDF
javier ramirez @supercoco9 https://teowaki.com
28. Analysing weather information
Finding patterns in e-commerce
Match online/offline behaviour
Log analysys
Analysing inventory/booking data
...
29. bigquery pricing
$80 per stored TB
1000000 rows => $0.02288 / month
$35 per processed TB
1 full scan ~ 240 MB
1 count = 0 MB
1 full scan over 1 column ~ 13 MB
10 GB => $0.35 / month
*the 1st TB processed every month is free of charge
javier ramirez @supercoco9 https://teowaki.com
30. Find related links at
https://teowaki.com/teams/javier-community/link-categories/bigquery-talk
Thanks
Javier Ramírez
@supercoco9
Editor's Notes
nadie duda de que tu api sea técnicamente muy buena, pero...
conclusión obvia
esto va a ser un problema de big data
el problema es que nosotros no sabíamos de big data. Nos sonaba map/reduce, hadoop, cassandra.. pero nos faltaban datos
esto hace dos años era imposible. vivimos en el futuro
para poder ejecutar consultas sobre nuestros datos, el primer punto es extraerlos de nuestro sistema, que en nuestro caso significa extraer la información de las peticiones del usuario conforme van pasando
Apache Drill es el equivalente en open source. No funciona como servicio.
bigquery es un recubrimiento REST encima de dremel. Usable desde cualquier plataforma que permita REST. Apis disponibles para diferentes lenguajes
Solamente para inserciones! no borrados o updates.A menudo junto Map/reduce o hadoop. Análisis in place, sin carga previa, sin índices ni planificar las queries de antemano
next: full scan regexp
Column data is of uniform type; therefore, there are some opportunities for storage size optimizations available in column-oriented data that are not available in row-oriented data.
also less I/O
Además Dremel proporciona una estructura en árbol para lanzar las queries
batch y tiempo real tanto en la entrada de datos (ficheros o stream) como en la salida (interactivo o batch)
pagas por lo que usas
batch y tiempo real tanto en la entrada de datos (ficheros o stream) como en la salida (interactivo o batch)
pagas por lo que usas
la carga puede ser de fichero plano (tsv para evitar problemas de comillas) o con json si necesitas estructura. Importar desde consola web, REST o command line
se pueden importar ficheros comprimidos
también se puede importar información para tiempo real en modo stream
NEXT: CONCEPTOS DE BIGQUERY
web console
api rest
command line
Notice the validate button to avoid expenses
next: full scan regexp
total
313,797,035
global database of events, language
and tone
quarter billion rows
30 years
updated daily
Column data is of uniform type; therefore, there are some opportunities for storage size optimizations available in column-oriented data that are not available in row-oriented data.
also less I/O
Además Dremel proporciona una estructura en árbol para lanzar las queries
Column data is of uniform type; therefore, there are some opportunities for storage size optimizations available in column-oriented data that are not available in row-oriented data.
also less I/O
Además Dremel proporciona una estructura en árbol para lanzar las queries