This presentation covers three real-life use cases of Apache Beam in production. Among the aspects treated: code reusability for bounded and unbounded data, and running Apache Beam pipelines that write to different cloud providers.
2. Who we are
● USA-based monetization platform for mobile game developers.
● Growing our engineering office in Barcelona with a strong tech-driven culture:
   - Visibility of the impact of your code.
   - Not afraid of implementing tech, like Apache Beam ;)
   - Fostering best practices and a true care for code quality.
● Founding year: 2011 | +300k mobile game integrations | 900M unique users | 200B ad requests | 100 TBs data scale
4. Agenda
● What is Apache Beam
● Basic Requirements
● Production use cases:
○ Ingest data into BigQuery or GCS
○ BigQuery to BigTable
○ BigQuery to S3 (Parquet)
● Dealing with Streaming
● Questions
5. What is Apache Beam
“Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines.”
Write once, run anywhere
6. Basic Requirements
- Reduce cluster provisioning overhead.
- Create generic and reusable code.
- The architecture must be agnostic of the source (streaming or batch).
- The architecture must have different configurable sinks.
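The "generic, source-agnostic code" requirement can be sketched without any Beam dependency: define the transform once and apply it unchanged to a bounded source (a list) and to an unbounded-style source (an iterator). This is a conceptual illustration in plain Java, not the deck's actual pipeline code; all class and method names here are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class SourceAgnosticTransform {

    // The transform is defined once, independent of where elements come from.
    static final Function<String, String> TO_UPPER = s -> s.toUpperCase();

    // Batch-style: process a bounded collection.
    static List<String> runBounded(List<String> input, Function<String, String> fn) {
        List<String> out = new ArrayList<>();
        for (String s : input) out.add(fn.apply(s));
        return out;
    }

    // Streaming-style: process elements one by one as they arrive.
    static List<String> runUnbounded(Iterator<String> input, Function<String, String> fn) {
        List<String> out = new ArrayList<>();
        while (input.hasNext()) out.add(fn.apply(input.next()));
        return out;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("ads", "games");
        System.out.println(runBounded(data, TO_UPPER));              // [ADS, GAMES]
        System.out.println(runUnbounded(data.iterator(), TO_UPPER)); // [ADS, GAMES]
    }
}
```

In Beam the same idea appears at a higher level: a `PTransform` written once runs unchanged over a bounded or unbounded `PCollection`.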
11. import com.google.api.services.bigquery.model.TableRow;

public class TestTransformation {
    // Example custom transformation: upper-case the src_name column.
    public TableRow customTransformation(TableRow tr) {
        Object srcName = tr.get("src_name");
        if (srcName != null) { // guard against rows missing the field
            tr.set("src_name", srcName.toString().toUpperCase());
        }
        return tr;
    }
}

class Types {
    static final String TEST = "test";
}
15. How custom transformations work

@ProcessElement
public void processElement(ProcessContext context) {
    try {
        String element = context.element();
        if (element != null) {
            // Deserialize the raw JSON payload.
            JsonModel jsonModel = Utils.parseRawJson(element, JsonModel.class);
            if (jsonModel != null) {
                // Map the payload to a TableRow using the howToParse spec,
                // then apply the user-supplied custom transformation.
                TableRow tr = customTransformation(JsonParseToBigQuery.getInstance()
                        .getJsonParse(jsonModel.getDetails(), howToParse));
                if (tr != null) {
                    context.output(tr);
                }
            }
        }
    } catch (Exception e) {
        // Log and drop elements that fail to parse instead of failing the bundle.
        LOG.error("Error on parsing", e);
    }
}
16. Main Batch Jobs Parameters

Parameter           Default Value
inputPath           -
isParquet           true
outputTableSpec     -
howToParse          -
disposition         APPEND
isPartitionedTable  true
writeIntoBq         true
writeIntoGCS        false
outputDirectory     -
numShards           20
writeAsParquet      true
17.
Parameter         Input
inputPath         gs://bucket_x/*.parquet
batchJobInstance  test
outputDirectory   gs://bucket_y/test
outputTableSpec   project:dataset.table
writeIntoGCS      true
parquetSchema     original_app_id:STRING,app_id:LONG
howToParse        {"original_app_id":"original_app_id","app.id":"app_id","src_name":"src_name","id":"id"}
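The howToParse value above looks like a map from source field paths to output column names. The deck does not show the parsing code itself, so the following is only a hypothetical sketch of how such a mapping could drive field renaming, using plain Java maps; the `remap` helper is invented for the illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HowToParseSketch {

    // Copy only mapped fields from the source record, under their output names.
    static Map<String, Object> remap(Map<String, Object> source, Map<String, String> howToParse) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : howToParse.entrySet()) {
            if (source.containsKey(e.getKey())) {
                out.put(e.getValue(), source.get(e.getKey()));
            }
        }
        return out;
    }
}
```

With the mapping from the table, a source field `app.id` would land in the output column `app_id`, and unmapped source fields would be dropped.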
18. Main Streaming Jobs Parameters

Parameter            Default Value
streamSource         pub_sub
inputTopic           -
subscription         -
kafkaBrokers         -
kafkaGroupId         -
filePartitionPolicy  DAILY
writeIntoBq          true
writeIntoGCS         false
outputDirectory      -
numShards            20
windowDuration       5m
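A windowDuration value like "5m" has to be turned into a duration before it can be handed to the windowing logic. The deck does not show its option parsing, so this is one plausible implementation for s/m/h suffixes, using only the Java standard library; the class and method names are assumptions.

```java
import java.time.Duration;

public class WindowDurationParser {

    // Parse values like "30s", "5m", "1h" into a Duration.
    static Duration parse(String value) {
        char unit = value.charAt(value.length() - 1);
        long amount = Long.parseLong(value.substring(0, value.length() - 1));
        switch (unit) {
            case 's': return Duration.ofSeconds(amount);
            case 'm': return Duration.ofMinutes(amount);
            case 'h': return Duration.ofHours(amount);
            default:  throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }
}
```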
19.
20. Problems / Tips
- Debugging failures was not always easy.
- If you want to create templates, remember: ValueProviders are only available at Runtime.
- Be careful with non-thread-safe classes.
- Default GCP instances are okay, but try to use custom ones.
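The thread-safety tip matters because some runners share DoFn state across threads. A classic offender is `SimpleDateFormat`; wrapping it in a `ThreadLocal` gives each worker thread its own instance. This is an illustrative pattern, not code from the deck.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class ThreadSafeFormatting {

    // Unsafe if shared across threads:
    // static final SimpleDateFormat FMT = new SimpleDateFormat("yyyy-MM-dd");

    // Safe: one formatter per thread.
    static final ThreadLocal<SimpleDateFormat> FMT = ThreadLocal.withInitial(() -> {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd");
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    });

    static String format(Date d) {
        return FMT.get().format(d);
    }
}
```

On Java 8+, the thread-safe `java.time.format.DateTimeFormatter` is an even simpler way out of the same problem.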
23. What is BigTable
“Bigtable is a compressed, high performance, proprietary data storage system built on Google File System, Chubby Lock Service, SSTable (log-structured storage like LevelDB) and a few other Google technologies.”
Key/Value storage
24. BigTable table example

                 ColumnFamily (info)
RowKey           name          email              phone
ofsehn28u492     Bill Green    bgreen@gmail.com   555-958-382
kfgiuiu5937je3                 jdoe@gmail.com     555-738-234
iojcou9wujd77    Rick Sanchez
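The sparse rows in the table above can be modeled as nested maps: rowKey → column family → qualifier → value, where a missing qualifier (e.g. no name for the second row) simply has no cell. This toy model uses plain Java maps for illustration only; the class is invented here.

```java
import java.util.HashMap;
import java.util.Map;

public class SparseRowSketch {

    // rowKey -> columnFamily -> qualifier -> value
    static final Map<String, Map<String, Map<String, String>>> table = new HashMap<>();

    static void put(String rowKey, String family, String qualifier, String value) {
        table.computeIfAbsent(rowKey, k -> new HashMap<>())
             .computeIfAbsent(family, k -> new HashMap<>())
             .put(qualifier, value);
    }

    // Returns null when the row, family, or qualifier has no cell.
    static String get(String rowKey, String family, String qualifier) {
        return table.getOrDefault(rowKey, Map.of())
                    .getOrDefault(family, Map.of())
                    .get(qualifier);
    }
}
```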
25.
Parameter             Input
sqlQuery              SELECT X, Y, Z FROM ...
rowKeyMap             -
bqToBtMap             cf:qualifier:something,cf2...
bigTableInstanceId    chartboost
externalSinkProject   project-id-x
bigTableAppProfileId  batch
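The bqToBtMap value appears to be a comma-separated list of columnFamily:qualifier:bigQueryField triples. The deck does not show how it is parsed, so the following is only one plausible reading, sketched in plain Java; the parser and its output shape are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BqToBtMapParser {

    // Maps BigQuery field name -> "family:qualifier" destination cell.
    static Map<String, String> parse(String spec) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String entry : spec.split(",")) {
            String[] parts = entry.split(":", 3); // family, qualifier, BQ field
            if (parts.length == 3) {
                out.put(parts[2], parts[0] + ":" + parts[1]);
            }
        }
        return out;
    }
}
```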
26. Problems / Tips
● For heavy load jobs always use BigTable application profiles.
30. What we need to connect to AWS

public interface Options extends BigQueryToParquetOptions {
}

----------------------------------------------------------------------------------------------------

// Build static AWS credentials from the pipeline options and hand them to
// Beam's AwsOptions so S3 writes can authenticate in the right region.
BasicAWSCredentials awsCred = new BasicAWSCredentials(options.getAWSAccessKey(), options.getAWSSecretKey());
options.as(AwsOptions.class).setAwsCredentialsProvider(new AWSStaticCredentialsProvider(awsCred));
options.as(AwsOptions.class).setAwsRegion(options.getAWSRegion());
31. How to read from BigQuery

private static PCollection<TableRow> executeSql(Pipeline p, String sql) {
    // DIRECT_READ pulls rows via the BigQuery Storage API,
    // avoiding the export/extraction quota path.
    return p.apply(BigQueryIO.readTableRows()
            .fromQuery(sql)
            .withMethod(BigQueryIO.TypedRead.Method.DIRECT_READ)
            .usingStandardSql());
}
32. Problems / Tips
- Choose the right region in order to reduce latency and cost.
- To avoid extraction quota issues, use DIRECT_READ.
- FileIO only writes.
- Be careful with complex types (arrays, nested arrays).