This document discusses Google BigQuery, a fast, easy-to-use, and cost-effective tool for analyzing large datasets. It provides SQL-like queries against nested, columnar data stored in Google's infrastructure. Developers can access BigQuery through a web UI, a REST API, or command-line tools. BigQuery handles the infrastructure maintenance and offers on-demand or reserved pricing models.
In this webinar you'll learn about the best practices for Google BigQuery—and how Matillion ETL makes loading your data faster and easier. Find out from our experts how to leverage one of the largest, fastest, and most capable cloud data warehouses to improve your business and save money.
In this webinar:
- Discover how to work fast and efficiently with Google BigQuery
- Find out the best ways to monitor and control costs
- Learn to leverage Matillion ETL and optimize Google BigQuery
- Get tips and tricks for better performance
Quick Intro to Google Cloud Technologies, by Chris Schalk
This is the "Lightning Presentation" given at DreamForce 2011 on Google's Cloud Technologies. It covers App Engine, Google Storage, and BigQuery. #df11
Introduction to Google BigQuery. Slides used at the first GDG Cloud meetup in Brussels, about big data on Google Cloud Platform. (http://www.meetup.com/GDG-Cloud-Belgium/events/228206131)
Basic concepts, best practices, and pricing of BigQuery, the petabyte-scale analytics data platform from Google Cloud Platform. There are a lot of things to learn about this tool and its features, such as BI Engine and AI Platform.
Big Data with Hadoop, Spark and BigQuery (Google Cloud Next Extended 2017 Karachi), by Imam Raza
Google Next Extended (https://cloudnext.withgoogle.com/) is an annual Google event focusing on Google cloud technologies. This presentation is from a tech talk held at the Google Next Extended 2017 Karachi event.
The 'macro view' on BigQuery:
We started with an overview and some typical uses, then moved on to project hierarchy, access control, and security.
At the end, we touch on tools and demos.
Connecta Event: Big Query and Data Analysis with Google Cloud Platform, by ConnectaDigital
Advanced data analysis and "big data" have climbed the trend lists in recent years and are now one of the most prioritized areas in the development of new services and products for leading companies in the digital landscape.
The information that accumulates in these systems as customer interactions are digitized has proven to be worth its weight in gold. It holds everything we need to know to make our business more efficient.
Since the summer of 2013, Connecta has had an established partnership with Google to help our customers with the transition to cloud services for, among other things, advanced data analysis. To prepare ourselves to help our customers, we have over a number of years developed both knowledge of and experience with Google's various cloud products, such as "Big Query".
Big Query is a cloud-based analytics tool and part of Google Cloud Platform. Big Query makes it possible to run fast queries against enormous datasets in just a second. Big Query and Google Cloud Platform offer ready-made solutions for setting up and maintaining the infrastructure that makes all of this possible with simple means.
At Connecta Digital Consulting's third event of the spring, we introduced our customers and partners to the concepts of data analysis and Big Query.
The event covered the following points:
- Big Data and Business Intelligence (BI)
- "The Google Big Data tools": success factors and how to get started
- Google Cloud Platform and how to carry out a successful cloud initiative
We presented case studies and shared important lessons learned from our collaboration with Google and our customers.
Google BigQuery is one of the largest, fastest, and most capable cloud data warehouses on the market. In this webinar, we review BigQuery best practices and show you how Matillion ETL can help you get the most out of the platform to gain a competitive edge.
In this webinar:
- Discover how to work quickly and efficiently with Google BigQuery
- Find out the best ways to monitor and control costs
- Hear tips and tricks for loading and transforming massive amounts of data in BigQuery with Matillion ETL
- Get expert advice on improving your performance in BigQuery for quicker data analysis
- Learn how to optimize BigQuery for your marketing analytics needs
Big Query - Utilizing Google Data Warehouse for Media Analytics, by hafeeznazri
This topic covers an intermediate understanding of Google BigQuery and how Media Prima Digital is utilizing BigQuery as a data warehouse in production.
Google Cloud Data Platform - Why Google for Data Analysis?, by Andreas Raible
An introduction to our Data Platform, covering capture, processing, analysis, and exploration.
The Google Cloud Platform products are based on our internal systems which are powering Google AdWords, Search, YouTube and our leading research in the field of real-time data analysis.
You can get access ($300 for 60 days) to our free trial through google.com/cloud
Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i..., by Matillion
the7Stars, the leading UK Digital Marketing agency, has global clients ranging from Nintendo to Suzuki to Iceland. With growing data volumes, the7Stars faced the challenge of centralizing all their customers’ marketing data for quick and easy analysis.
In this joint webinar, you will hear how the7Stars are using Google BigQuery as their data warehouse, collating data from many different sources and allowing them to grow their business and attract new customers. the7Stars is also using Matillion ETL to combine the data from different sources and load it all into BigQuery, enabling agile and responsive market analysis that gives their clients a competitive edge while saving time and money.
In this webinar, learn:
- the7Stars’ data journey for maximizing value
- Google BigQuery, BigQuery Data Transfer Service and best practices for marketing analytics
- How to collect data from different sources and streamline transformations and queries in Google BigQuery with Matillion ETL
- Benefits being realized by the7stars, such as saving time and money and growing their customer base
Watch the full webinar: https://youtu.be/8VEHf_wAXao
Polyglot Persistence and Database Deployment by Sandeep Khuperkar, CTO and Dir... (via Ashnikbiz)
This presentation covers what Polyglot Persistence is, how you should choose the right database technology for a scalable architecture, and an introduction to the emerging world of Polyglot Persistence using the open-source database ecosystem.
Polyglot Persistence is not something that can be used as an out-of-the-box product; instead it needs to be designed for each individual enterprise and its unique data architecture.
Building event-driven Serverless Apps with Azure Functions and Azure Cosmos DB, by Microsoft Tech Community
In this session you will learn how to build planet-scale serverless apps using Azure Cosmos DB and Azure Functions. Users expect modern apps to offer event-driven, near real-time experiences. Now you can subscribe to changes in Azure Cosmos DB collections and trigger logic in real time while being globally distributed, and without deploying or managing any servers.
John Hammink's talk at Great Wide Open 2016. We discuss: 1) the need for data analytics infrastructure that can scale exponentially; 2) what such an infrastructure must contain; and finally 3) the need for an infrastructure to be able to handle un- and semi-structured data.
Complex realtime event analytics using BigQuery @Crunch Warmup, by Márton Kodok
Complex event analytics solutions require massive architecture and know-how to build a fast real-time computing system. Google BigQuery solves this problem by enabling super-fast, SQL-like queries against append-only tables, using the processing power of Google's infrastructure. In this presentation we will see how BigQuery solves our ultimate goal: store everything, accessible by SQL, immediately, at petabyte scale. We will discuss some common use cases: funnels, user retention, affiliate metrics.
An in-depth look at Google BigQuery Architecture by Felipe Hoffa of Google (via Data Con LA)
Abstract: Come learn about Google BigQuery and its underlying architecture. Felipe will go over the evolution of BigQuery, explain some of the underlying principles of BigQuery and Dremel, cover some of the latest use cases, and demo a use case of Google BigQuery.
Bio:
Felipe Hoffa moved from Chile to San Francisco to join Google as a Software Engineer. Since 2013 he's been a Developer Advocate on big data - to inspire developers around the world to leverage the Google Cloud Platform tools to analyze and understand their data in ways they could never before. You can find him in several YouTube videos, blog posts, and conferences around the world.
Follow Felipe at https://twitter.com/felipehoffa.
"Implementing an Event Sourcing strategy on Azure", Olena Borzenko/Eldert Gro...Fwdays
In recent years the Event Sourcing pattern has become increasingly popular. By storing a history of events it enables us to decouple the storage of data from the implementation of the logic around it. And we can rebuild the state of our data to any point in time, giving us a wide range of opportunities around auditing and compensation.
In this demo-heavy session you will learn how we can use Azure Event Hubs to process and store these events to build our own event store based on Cosmos DB. Moreover, we will also dive into options around connecting to other Azure services and even Kafka applications to easily implement this popular pattern in our own solutions.
Data Con LA 2018 - Big Data as a Service: Running Elasticsearch on Pure by Br... (via Data Con LA)
Big Data as a Service: Running Elasticsearch on Pure by Brian Gold, Founding Member, FlashBlade, PureStorage
As organizations look to scale their use of modern analytics, the traditional deployment model of these tools has become a drag on productivity. Existing big-data architectures typically run on fixed sets of server instances with tightly coupled storage. While originally designed for scalability, these rigid environments cause server sprawl and increase time-to-deployment.
Google Analytics and BigQuery, by Javier Ramirez, from datawaki
Google Analytics is great, but having access to your raw data and being able to query it any way you want is much more powerful. Learn how you can integrate Analytics and BigQuery to unleash all your data potential. Talk delivered at Conversion Thursday London
Google BigQuery for Everyday Developer, by Márton Kodok
IV. IT&C Innovation Conference - October 2016 - Sovata, Romania
A. Every scientist who needs big data analytics to save millions of lives should have that power
Legacy systems don’t provide the power.
B. The simple fact is that you are brilliant but your brilliant ideas require complex analytics.
Traditional solutions are not applicable.
The Plan: have oversight over developments as they happen.
Goal: Store everything accessible by SQL immediately.
What is BigQuery?
Analytics-as-a-Service - Data Warehouse in the Cloud
Fully-Managed by Google (US or EU zone)
Scales into Petabytes
Ridiculously fast
Decent pricing (queries $5/TB, storage: $20/TB) *October 2016 pricing
100,000 rows/sec Streaming API (see the streaming sketch after this list)
Open Interfaces (Web UI, BQ command line tool, REST, ODBC)
Familiar DB Structure (table, views, record, nested, JSON)
Convenience of SQL + Javascript UDF (User Defined Functions)
Integrates with Google Sheets + Google Cloud Storage + Pub/Sub connectors
Client libraries available in YFL (your favorite languages)
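The streaming figure above refers to BigQuery's streaming-insert endpoint (tabledata.insertAll). A minimal sketch using today's google-cloud-bigquery Python client rather than the 2016-era libraries the deck assumed; the table_id is a placeholder for an existing table of yours:

```python
# pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client()  # uses Application Default Credentials

# Hypothetical table; replace with your own project.dataset.table.
table_id = "my-project.my_dataset.events"

rows = [
    {"user_id": "u1", "event": "click", "ts": "2016-10-01T12:00:00"},
    {"user_id": "u2", "event": "view",  "ts": "2016-10-01T12:00:01"},
]

# insert_rows_json wraps the streaming insertAll endpoint.
errors = client.insert_rows_json(table_id, rows)
if errors:
    print("Streaming insert errors:", errors)
else:
    print("Rows streamed successfully")
```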
Our benefits
no provisioning/deploy
no running out of resources
no more focus on large scale execution plan
no need to re-implement tricky concepts
(time windows / join streams)
pay only for the columns referenced in your queries
run raw ad-hoc queries (either by analysts/sales or Devs)
no more throwing away, expiring, or aggregating old data.
Vadim Solovey is the CTO of DoiT International and has helped implement Google BigQuery as a cloud data warehouse for many medium- and large-sized data and analytics initiatives. BigQuery's serverless architecture has redefined what it means to be fully managed for hundreds of Israeli startups.
Recently, Google announced updates to BigQuery that dramatically advance cloud data analytics for large-scale businesses: BigQuery now supports Standard SQL, implementing the SQL:2011 standard, and offers new ODBC drivers that make it possible to use BigQuery with a number of tools ranging from Microsoft Excel to traditional business intelligence systems such as MicroStrategy and Qlik.
Agenda:
• Partitioned tables
• The ability to update, delete rows and columns using SQL
• Integration with IAM for fine-grained security policies
• Monitoring w/ StackDriver to track performance and usage
• Query sharing via links, to foster knowledge within orgs
• Cost optimisation strategies
AWS Athena vs. Google BigQuery for interactive SQL Queries, by DoiT International
During re:Invent 2016, AWS released Amazon Athena, an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
We took a look at AWS Athena and compared it to Google BigQuery, another player in serverless interactive data analysis.
Would you like to know which one is the right tool for you? Join us for this meetup to learn about AWS Athena and for a test drive of querying exactly the same dataset using AWS Athena and Google BigQuery, to see where each one shines (or totally blows it).
A brief study on Storage Area Network (SAN), SAN architecture & its importance. It focuses on the techniques and the technologies that have evolved around SAN & its Security.
BigData Meets the Federal Data Center: an overview of NoSQL solutions to data challenges (e.g., Hadoop, HBase, MongoDB, Cassandra, Redis, etc.). Also includes a vignette on the Google Prediction API.
Building a Marketing Data Warehouse from Scratch - SMX Advanced 202..., by Christopher Gutknecht
This deck covers the journey of starting with BigQuery, adding more data sources, and building a process around your data warehouse. It covers the three phases (greenfield, dashboards, and operational analytics) and the necessary data components.
The code for uploading your product feed can be found here:
https://gist.github.com/ChrisGutknecht/fde93092e21039299ab76715596eac01
If you have any questions, reach out to me on Linkedin!
This 2-3 minute presentation is meant to give university hackathoners a brief, high-level overview of Google Cloud and its developer APIs, with the purpose of inspiring students to consider these products for their hacks. A longer, more descriptive tech talk comes later.
Entrepreneurship Tips With HTML5 & App Engine, Startup Weekend (June 2012), by Ido Green
My talk at Startup Weekend 2012 during Google I/O. It covers startup life tips, modern web apps, and how to leverage Google cloud (specifically App Engine).
Cloud computing overview & Technical intro to Google Cloud, by wesley chun
This is a 60-min tech talk designed for developers to give a comprehensive, vendor-agnostic overview of cloud computing. This is followed by an introduction to products in Google Cloud, focusing on the serverless and machine learning products. The talk ends with several inspirational examples of what can be built with Google Cloud.
These slides were made for the 2013 DevFest talks. They cover the main blocks of Google Cloud Platform: App Engine, Compute Engine, storage options, and more.
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop, by huguk
At Google Cloud Platform, we're combining the Apache Spark and Hadoop ecosystem with our software and hardware innovations. We want to make these awesome tools easier, faster, and more cost-effective, from 3 to 30,000 cores. This presentation will showcase how Google Cloud Platform is innovating with the goal of bringing the Hadoop ecosystem to everyone.
Bio: "I love data because it surrounds us - everything is data. I also love open source software, because it shows what is possible when people come together to solve common problems with technology. While they are awesome on their own, I am passionate about combining the power of open source software with the potential unlimited uses of data. That's why I joined Google. I am a product manager for Google Cloud Platform and manage Cloud Dataproc and Apache Beam (incubating). I've previously spent time hanging out at Disney and Amazon. Beyond Google, love data, amateur radio, Disneyland, photography, running and Legos."
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I found myself wondering, as an "infrastructure container Kubernetes guy", how does this fancy AI technology get managed from an infrastructure operations view? Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and make it work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
Neuro-symbolic is not enough, we need neuro-*semantic*, by Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo..., by James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova..., by Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas struggle to keep up with the competition. However, fostering a culture of innovation takes real work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell us all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details of how best to design a sturdy architecture within ODC.
Transcript: Selling digital books in 2024: Insights from industry leaders - T..., by BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Search and Society: Reimagining Information Access for Radical Futures, by Bhaskar Mitra
The field of information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Introduce self and Outsite. What is Big Data? Big data [1][2] is the term for a collection of data sets so large and complex that it becomes difficult to process them using on-hand database management tools or traditional data processing applications; it is the same as "web-scale" data. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
OLAP (Online Analytical Processing) is not a good option because of the volume of data, and OLTP (Online Transaction Processing) is not designed for that type of reporting.
The Hadoop ecosystem is made up of a lot of companies. Hadoop also has its origins in Google research, which I will talk about shortly. There are also visualization tools such as Tableau (out of scope for this talk).
Google BigQuery! BigQuery is a RESTful web service that enables interactive analysis of massive datasets, working in conjunction with Google Storage. It is an Infrastructure as a Service (IaaS) offering that may be used complementarily with MapReduce.
Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively. Hadoop was created by Doug Cutting and Mike Cafarella [5] in 2005. Cutting, who was working at Yahoo! at the time, [6] named it after his son's toy elephant. [7] It was originally developed to support distribution for the Nutch search engine project. [8]
The Apache Hadoop framework is composed of the following modules:
- Hadoop Common: contains libraries and utilities needed by other Hadoop modules
- Hadoop Distributed File System (HDFS): a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster
- Hadoop YARN: a resource-management platform responsible for managing compute resources in clusters and using them to schedule users' applications
- Hadoop MapReduce: a programming model for large-scale data processing
Beyond HDFS, YARN, and MapReduce, the entire Apache Hadoop "platform" is now commonly considered to consist of a number of related projects as well: Apache Pig, Apache Hive, Apache HBase, Apache Spark, and others.
Big data requires massive amounts of storage on multiple drives and a file system to overcome hardware bottlenecks when processing large data sets. Multiple CPUs are required to map/reduce the data (this includes management of the individual jobs). Running jobs can take time, so the time to map/reduce, as well as the time to compose a query, matters.
If you don’t, a kitten dies every minute.
No need to install all of the server software; everything is hosted. A lot of data science and engineering effort went into creating BigQuery, and Google uses it internally.
Google's initial technologies were GFS and MapReduce (Google released research papers on both): "The Google File System" (GFS) in 2003, by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, and "MapReduce: Simplified Data Processing on Large Clusters" in 2004, by Jeffrey Dean and Sanjay Ghemawat. GFS is a proprietary distributed file system. The main goals of a distributed file system are:
1. Speed
2. Scalability
3. Reliability
Google File System grew out of an earlier Google effort, "BigFiles", developed by Larry Page and Sergey Brin in the early days of Google, while it was still located at Stanford. It is designed to provide efficient, reliable access to data using large clusters of commodity hardware. A new version of the Google File System is codenamed Colossus.
Commodity computing, or commodity cluster computing, is the use of large numbers of already-available computing components for parallel computing, to get the greatest amount of useful computation at low cost. [1] It is computing done on commodity computers as opposed to high-cost supermicrocomputers or boutique computers, and such machines are easy to populate data centers with. Some of the general characteristics of a commodity computer are:
- Shares a base instruction set common to many different models.
- Shares an architecture (memory, I/O map, and expansion capability) that is common to many different models.
- High degree of mechanical compatibility; internal components (CPU, RAM, motherboard, peripheral cards, drives) are interchangeable with other models.
- Software is widely available off the shelf.
- Compatible with most available peripherals; works with most right out of the box.
MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster. [1] A MapReduce program is composed of a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). The "MapReduce System" (also called "infrastructure" or "framework") orchestrates by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance. The model is inspired by the map and reduce functions commonly used in functional programming, [2] although their purpose in the MapReduce framework is not the same as in their original forms. [3] The key contributions of the MapReduce framework are not the actual map and reduce functions, but the scalability and fault tolerance achieved for a variety of applications by optimizing the execution engine once. MapReduce is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, [7] and statistical machine translation.
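The student-sorting example above maps directly onto the two phases. A toy, single-process Python sketch of the same idea (no framework involved, just to make the map, shuffle, and reduce roles concrete):

```python
from collections import defaultdict

students = ["Alice", "Bob", "Alice", "Carol", "Bob", "Alice"]

# Map: emit a (key, value) pair per record -- here, (name, 1).
mapped = [(name, 1) for name in students]

# Shuffle: group values by key (a real framework does this between phases,
# across many machines).
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: summarize each group -- here, a count per name (name frequencies).
counts = {name: sum(values) for name, values in groups.items()}
print(counts)  # {'Alice': 3, 'Bob': 2, 'Carol': 1}
```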
Google released the Dremel white paper in September 2010. Dremel is a brand of power tools that primarily rely on their speed as opposed to torque; Google's goal for BigQuery was to query 1 TB of data in less than a second. Dremel has been in production since 2006 and has thousands of users within Google. It replaced MapReduce in many instances but can be complementary. Multiple instances of Dremel are deployed in the company, ranging from tens to thousands of nodes. Examples of using the system include:
- Analysis of crawled web documents.
- Tracking install data for applications on Android Market.
- Crash reporting for Google products.
- OCR results from Google Books.
- Spam analysis.
- Debugging of map tiles on Google Maps.
- Tablet migrations in managed Bigtable instances.
- Results of tests run on Google's distributed build system.
- Disk I/O statistics for hundreds of thousands of disks.
- Resource monitoring for jobs run in Google's data centers.
- Symbols and dependencies in Google's codebase.
Dremel builds on ideas from web search and parallel DBMSs. In contrast to layers such as Pig and Hive for Hadoop, it executes queries natively without translating them into MapReduce jobs.
Dremel allows data to be nested (only in JSON). Nesting makes data more concise for a single table and allows a more compact file to import into BigQuery. This makes it easily interoperable with a lot of current JavaScript technologies and NoSQL databases such as MongoDB. Data can also be imported as CSV.
** The data is read-only/append-only. ** Dremel uses a column-striped storage representation, which enables it to read less data from secondary storage and reduce CPU cost due to cheaper compression. Column stores have been adopted for analyzing relational data [1] but, to the best of my knowledge, have not been extended to nested data models. One of the ingredients for building interoperable data management components is a shared storage format. Columnar storage proved successful for flat relational data, but making it work for Google required adapting it to a nested data model. Figure 1 illustrates the main idea: all values of a nested field such as A.B.C are stored contiguously. Hence, A.B.C can be retrieved without reading A.E, A.B.D, etc. The challenge this addresses is how to preserve all structural information and be able to reconstruct records from an arbitrary subset of fields.
BigQuery offers:
- A web-based management interface
- Flat-file import (CSV/JSON)
- Libraries in most of the major programming languages
- A RESTful API
- SQL syntax for querying
BigQuery queries are written using a variation of the standard SQL SELECT statement. BigQuery supports a wide variety of functions such as COUNT, arithmetic expressions, and string functions: https://developers.google.com/bigquery/query-reference
Query syntax: SELECT, WITHIN, FROM, FLATTEN, JOIN, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT.
** Retrieving large result sets can be time consuming, so use LIMIT and/or aggregates!
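As a concrete illustration of the LIMIT-and-aggregates advice, a hedged sketch running a query through the Python client. It uses the public Shakespeare sample table that BigQuery has long shipped with, written here in today's Standard SQL rather than the deck's legacy dialect:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Aggregate + ORDER BY + LIMIT keeps the result set small, per the advice above.
sql = """
    SELECT word, SUM(word_count) AS total
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY word
    ORDER BY total DESC
    LIMIT 10
"""

for row in client.query(sql).result():
    print(row.word, row.total)
```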
Dremel has most of the standard SQL-ish aggregate functions, such as COUNT, SUM, MIN, MAX, and AVG. Dremel also has functions for extracting JSON from a field using a JSONPath syntax, plus URL and IP functions that can make quick work of any network/web logs.
BigQuery supports multiple JOIN operations in each SELECT statement.
JOIN types: BigQuery supports INNER, LEFT OUTER, and CROSS JOIN operations; the default is INNER. CROSS JOIN clauses must not contain an ON clause. CROSS JOIN operations can return a large amount of data and might result in a slow and inefficient query. When possible, use a regular JOIN instead.
EACH modifier: normal JOIN operations require that the right-side table contain less than 8 MB of compressed data. The EACH modifier is a hint that informs the query execution engine that the JOIN might reference two large tables. The EACH modifier can't be used in CROSS JOIN clauses. When possible, use JOIN without the EACH modifier for best performance; use JOIN EACH when table sizes are too large for JOIN.
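JOIN EACH belongs to BigQuery's legacy SQL dialect. A sketch of how it looked, run through the Python client with legacy SQL explicitly enabled; the bracketed table names are placeholders, not real datasets:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Legacy SQL references tables as [dataset.table]; JOIN EACH hints that
# both sides of the join may be large. Table names here are hypothetical.
legacy_sql = """
    SELECT t1.user_id, t2.country
    FROM [my_dataset.big_events] AS t1
    JOIN EACH [my_dataset.big_users] AS t2
    ON t1.user_id = t2.user_id
"""

job_config = bigquery.QueryJobConfig(use_legacy_sql=True)
for row in client.query(legacy_sql, job_config=job_config).result():
    print(row)
```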
The building blocks of BigQuery are:
- Projects
- Tables
- Datasets
- Jobs
Projects are top-level containers in Google's Cloud Platform. They store information about billing and authorized users, and they contain BigQuery data. Each project has a friendly name and a unique ID. BigQuery bills on a per-project basis, so it's usually easiest to create a single project for your company that's maintained by your billing department. For more information on how to grant access to your project, see Access Control.
Tables contain your data in BigQuery, along with a corresponding table schema that describes field names, types, and other information. BigQuery also supports views: virtual tables defined by a SQL query. BigQuery creates tables in one of the following ways:
- Loading data into a new table
- Running a query
- Copying a table
Jobs are actions you construct and BigQuery executes on your behalf to load data, export data, query data, or copy data. Since jobs can potentially take a long time to complete, they execute asynchronously and can be polled for their status. BigQuery saves a history of all jobs associated with a project, accessible via the Google Developers Console.
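Because jobs run asynchronously, clients poll for their status. A minimal polling sketch with the Python client (the trivial SELECT 1 query is just a stand-in for any long-running job):

```python
import time
from google.cloud import bigquery

client = bigquery.Client()

# Start a query job; this returns immediately without blocking on results.
job = client.query("SELECT 1 AS x")

# Poll the job state until BigQuery marks it DONE.
while job.state != "DONE":
    time.sleep(1)
    job = client.get_job(job.job_id, location=job.location)

print("state:", job.state, "errors:", job.error_result)
```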
BigQuery can be accessed or used in three ways:
- Browser tool (limited in functionality; can't update tables)
- Command-line tools
- API
BigQuery supports two data formats for import/export (and streaming):
- CSV
- JSON (newline-delimited)
Data can be compressed via tar/gzip.
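A sketch of a load job for newline-delimited JSON via the Python client, assuming a local events.json file and a placeholder table name; CSV works the same way with SourceFormat.CSV, and gzip-compressed files are accepted directly:

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.events"  # hypothetical table

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # infer the table schema from the data
)

# Upload and load the newline-delimited JSON file as an asynchronous job.
with open("events.json", "rb") as f:
    load_job = client.load_table_from_file(f, table_id, job_config=job_config)

load_job.result()  # block until the load job finishes
print("Loaded", client.get_table(table_id).num_rows, "rows")
```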
The Google BigQuery API is built on HTTP and JSON, so any standard HTTP client can send requests to it and parse the responses. It uses OAuth2 for authentication. Current libraries:
- .NET (C#)
- Go
- Google Web Toolkit
- Java
- JavaScript
- Node.js
- Objective-C
- PHP
- Python
- Ruby
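Because the API is plain HTTP plus JSON, any OAuth2-capable HTTP client works, not just the official libraries. A hedged sketch using google-auth's AuthorizedSession against the v2 REST endpoint (datasets.list), assuming Application Default Credentials with a default project configured:

```python
# pip install google-auth requests
import google.auth
from google.auth.transport.requests import AuthorizedSession

# Application Default Credentials; project may be None if none is configured.
credentials, project = google.auth.default(
    scopes=["https://www.googleapis.com/auth/bigquery"]
)
session = AuthorizedSession(credentials)

# List datasets in the project via the raw REST API.
url = f"https://bigquery.googleapis.com/bigquery/v2/projects/{project}/datasets"
resp = session.get(url)
resp.raise_for_status()
for ds in resp.json().get("datasets", []):
    print(ds["datasetReference"]["datasetId"])
```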
BigQuery has excellent command-line tools written in Python: gcloud, bq, and gsutil. gcloud allows updating and using all of the Google Cloud services from the command line. bq is a Python-based tool that accesses BigQuery from the command line. gsutil is another cloud tool that can upload and download files to and from Google Cloud Storage. These tools give you the option to script via PowerShell or other means if you do not want to use the API.
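Since bq and gsutil are ordinary executables, they also script cleanly from Python, which is one way to automate them without the API. A sketch (dataset, table, and bucket names are placeholders), equivalent to running the same two commands in a shell:

```python
import subprocess

# bq: run a query from a script (table name is hypothetical).
subprocess.run(
    ["bq", "query", "--use_legacy_sql=false",
     "SELECT COUNT(*) FROM `my_dataset.events`"],
    check=True,  # raise if the command exits non-zero
)

# gsutil: stage a file in Google Cloud Storage before loading it into BigQuery.
subprocess.run(["gsutil", "cp", "events.json", "gs://my-bucket/"], check=True)
```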
Rented massive parallelism is much more cost-effective than trying to set up the infrastructure to do it yourself. BigQuery is comparable to Amazon Elastic MapReduce (EMR) and Cloudera's Hadoop pricing. With Amazon EMR you can launch a 10-node Hadoop cluster for as little as $0.15 per hour; BigQuery, however, does not price on a node structure.
Computing big data correctly requires large clusters of commodity hardware. Maintaining a data center while trying to implement something like Hadoop can be very challenging for even the most veteran neck-beards. Cloud computing provides all of the redundancy, scalability, and other 'ilities'. BigQuery has two pricing plans:
- On-Demand
- Reserved Capacity
This is a pay-as-you-go model. Resource pricing:
- Loading data: free
- Exporting data: free
- Table reads: free
- Storage: $0.026 per GB/month
- Streaming inserts: free until July 1, 2014 (after July 1, 2014, $0.01 per 100,000 rows)
- Interactive queries: $0.005 per GB processed (1)(2)(3)
- Batch queries: $0.005 per GB processed (1)(2)(3)
How am I charged for queries? BigQuery uses a columnar data structure, which means that for a given query you are only charged for the data processed in each column, not the entire table. For instance, if a table has 26 columns and you run the query SELECT a, b, f FROM table1 WHERE d > 100 ORDER BY e, you would be charged for processing data in columns a, b, f, d, and e only. For more information on column-oriented database structures, see Column-oriented DBMS. BigQuery accesses all rows of a table when you run a query on the table, and charges according to the total data processed in the columns you select. ** For this reason, if you expect your queries to be generally focused on data from a particular time frame, it can be economical, and sometimes better performing, to shard your data into separate tables based on a timestamp. If you receive a query error, you aren't charged for that query.
(1) Charges rounded up to the nearest MB; minimum 10 MB of data processed per table referenced by a query. (2) The first 100 GB of data processed per month is at no charge. (3) Charges are based on the uncompressed data size.
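Because billing depends on the columns a query actually touches, it helps to check the bytes a query would scan before running it. A sketch using the Python client's dry-run mode (the public Shakespeare sample table again); a dry run validates the query and reports bytes scanned without executing or billing it:

```python
from google.cloud import bigquery

client = bigquery.Client()

# dry_run=True estimates cost; disabling the cache gives the uncached figure.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(
    "SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare`",
    job_config=job_config,
)
print(f"This query would process {job.total_bytes_processed:,} bytes")
```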
For customers with consistent or larger workloads, reserved capacity can save as much as 70% off on-demand pricing. To sign up for reserved capacity, contact a sales representative.