On 2020-12-09 Laurens Vijnck and Jonny Daenen gave a workshop at PXL.
During this session, we collectively provisioned a streaming ingestion pipeline in mere minutes. The technology stack included Pub/Sub, Dataflow, and BigQuery. Afterwards, students had the opportunity to run interactive queries on their own real-time data to answer a series of business questions. These questions were borrowed from real-life cases that we encountered at Selligent Marketing Cloud.
Google Colab (Free Jupyter Notebooks) and Google Data Studio have proven to be excellent tools to facilitate these kinds of interactive sessions.
PXL Data Engineering Workshop By Selligent
1. Data Engineering & Data Science
• Please install/update to the latest Zoom app
• Otherwise you cannot join our breakout rooms
• https://zoom.us/download
• Start at 09:05 🕘
WELCOME!
3. LAURENS VIJNCK - DATA SHEET
• Data Engineer/Data Scientist
• 1 year at Selligent
• Master thesis on Streaming Analytics
• Interests: Streaming, Distributed Computing
4. JONNY DAENEN - DATA SHEET
• Data Engineer/Scientist
• 4 years at Selligent
• PhD Computer Science
• Focus = Data, Cloud
5. TAKEAWAY
• Cloud Technology alleviates operational burden
• DevOps is a state of mind
• Road to production: Devil is in the details
6. AGENDA
• Intro
• Selligent & Data
• Use Case 1
• Real-time Data Analysis
• Big Data Tech in Google Cloud
• Data Engineering @ Selligent
• Roles & Tools
• Use Case 2
• Visual Data Exploration
• Reports in Data Studio
• Short presentation
31. STORAGE - BIGQUERY
• Columnar Storage
• No ops (serverless)
• Pay for storage
• Pay per byte queried (columns & time touched)
• Data Market
BigQuery
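The "pay per byte queried (columns & time touched)" point can be illustrated with a rough model. This is a minimal sketch, not the actual billing formula: the column sizes, the row counts, and the helper `estimated_bytes_scanned` are all hypothetical and only show why selecting fewer columns and fewer days reduces the bytes read.

```python
# Hypothetical table layout: column name -> average stored bytes per row.
COLUMN_BYTES = {
    "event_id": 16,
    "tenant": 8,
    "user_id": 8,
    "channel": 4,
    "payload": 400,
}


def estimated_bytes_scanned(columns, rows_per_day, days):
    """Rough bytes read by a columnar scan that touches only `columns`
    over `days` daily partitions (an assumed, simplified model)."""
    bytes_per_row = sum(COLUMN_BYTES[name] for name in columns)
    return bytes_per_row * rows_per_day * days


# Two small columns, one day of data.
narrow = estimated_bytes_scanned(["tenant", "channel"], rows_per_day=1_000_000, days=1)

# Every column (the SELECT * case), thirty days of data.
wide = estimated_bytes_scanned(list(COLUMN_BYTES), rows_per_day=1_000_000, days=30)

print(narrow, wide)
```

In this model the wide query reads over a thousand times more bytes than the narrow one, which is the reason to avoid selecting every column and to restrict the time range.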
32. INGEST - PUB/SUB
• Event enters system
• Event is sent to Pub/Sub
• No ops (serverless)
• Globally available
• Pay as you go
• 7 day retention
• No ordering (alpha feature)
• No server-side filtering
Pub/Sub
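Because the slide lists "no ordering", a consumer that cares about order has to restore it itself. The sketch below is plain Python and does not use the Pub/Sub client library; the `Message` class, its fields, and the assumption that the publisher attaches an event timestamp are all hypothetical. It also drops repeated message IDs, assuming that a message can be delivered more than once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    message_id: str   # unique ID per published message (assumed)
    event_time: int   # seconds since epoch, set by the publisher (assumed)
    data: str


def dedupe_and_order(messages):
    """Drop repeated message IDs, then sort by event time."""
    seen = set()
    unique = []
    for message in messages:
        if message.message_id not in seen:
            seen.add(message.message_id)
            unique.append(message)
    return sorted(unique, key=lambda m: m.event_time)


# Messages as they might arrive: out of order, with one redelivery.
received = [
    Message("b", 2, "click"),
    Message("a", 1, "open"),
    Message("b", 2, "click"),
]

ordered = dedupe_and_order(received)
print([m.message_id for m in ordered])
```

This only works on a finite batch. In a real stream the consumer also has to decide how long to wait for late messages, which is what windowing in the processing layer addresses.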
33. PROCESSING - DATAFLOW
• Aggregation of events per consumer per tenant
• Dataflow
• Managed (choose your machines)
• Serverless
• Auto-scaling
• In-flight pipeline updates
• Monitoring
• Exactly-once
• Batch and strEAMing (Apache Beam)
• SQL available
• Documentation Unclear
Dataflow
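The "aggregation of events per consumer per tenant" step can be sketched without any framework. The code below is plain Python rather than an Apache Beam pipeline; the event tuple layout, the 60-second fixed window, and the function name `aggregate` are assumptions made for illustration.

```python
from collections import Counter

WINDOW_SECONDS = 60  # assumed fixed window size


def aggregate(events):
    """Count events per (tenant, consumer, window start).

    Each event is a (tenant, consumer, unix_timestamp) tuple.
    """
    counts = Counter()
    for tenant, consumer, timestamp in events:
        # Round the timestamp down to the start of its window.
        window_start = timestamp - timestamp % WINDOW_SECONDS
        counts[(tenant, consumer, window_start)] += 1
    return dict(counts)


events = [
    ("acme", "u1", 5),
    ("acme", "u1", 59),
    ("acme", "u1", 61),    # falls into the next window
    ("globex", "u9", 10),
]

result = aggregate(events)
print(result)
```

A managed runner adds what this sketch leaves out: distributing the keys over workers, handling late data, and emitting each window once it is complete.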
37. DATA EXPLORATION
• How many teams and members are present?
• How large is the audience?
• When did every member become active?
• What is the activity timespan of every team?
• Which minute of the hour received the most clicks?
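The exploration questions above can be answered on a small in-memory sample. In the workshop they were answered with interactive queries on BigQuery; the sketch below uses plain Python instead, and the event layout (team, member, timestamp) and the sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical click events: (team, member, timestamp).
clicks = [
    ("red", "ann", datetime(2020, 12, 9, 9, 15, 0)),
    ("red", "bob", datetime(2020, 12, 9, 9, 17, 30)),
    ("blue", "cem", datetime(2020, 12, 9, 9, 17, 45)),
    ("red", "ann", datetime(2020, 12, 9, 10, 17, 5)),
]

# How many teams and members are present?
teams = {team for team, _, _ in clicks}
members = {member for _, member, _ in clicks}

# When did every member become active? (earliest event per member)
first_seen = {}
for _, member, ts in clicks:
    first_seen[member] = min(ts, first_seen.get(member, ts))

# What is the activity timespan of every team? (first and last event)
team_span = {}
for team, _, ts in clicks:
    start, end = team_span.get(team, (ts, ts))
    team_span[team] = (min(start, ts), max(end, ts))

# Which minute of the hour received the most clicks?
busiest_minute, _ = Counter(ts.minute for _, _, ts in clicks).most_common(1)[0]

print(len(teams), len(members), busiest_minute)
```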
48. DATA ENGINEER
• Fault-tolerance
• What if pipeline fails?
• Streaming means: Re-execute, Re-execute, Re-execute
• Bundle
• Out of order processing of successive windows
• Can you deal with it?
• Depends on use case
• Exactly-once?
• Use native Dataflow/Beam operators
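One common way to live with "re-execute, re-execute, re-execute" is to make writes idempotent, so that a retried bundle leaves the same result as a single run. The sketch below is an illustration of that idea only, not the approach used at Selligent; the `IdempotentSink` class, the event IDs, and the bundle layout are hypothetical.

```python
class IdempotentSink:
    """Stores one row per event ID, so writing the same event twice is harmless."""

    def __init__(self):
        self.rows = {}

    def write(self, event_id, clicks):
        # Overwrite instead of append: a retry replaces the row with itself.
        self.rows[event_id] = clicks


def process_bundle(bundle, sink):
    """Write every (event_id, clicks) pair of a bundle to the sink."""
    for event_id, clicks in bundle:
        sink.write(event_id, clicks)


bundle = [("e1", 1), ("e2", 3)]

sink = IdempotentSink()
process_bundle(bundle, sink)
process_bundle(bundle, sink)  # simulated re-execution after a failure

total = sum(sink.rows.values())

# For contrast: an append-only sink double counts on retry.
naive_rows = []
for _ in range(2):
    naive_rows.extend(clicks for _, clicks in bundle)
naive_total = sum(naive_rows)

print(total, naive_total)
```

Whether the idempotent variant is needed depends on the use case, as the slide notes: approximate counters may tolerate duplicates, billing data may not.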
49. THE INTEGRATION WIZARD
• "Legacy application"
• Changes needed
• Release cycle of 6 weeks
• Alignment with other teams
51. THE BUTCHER
• Unit tests
• Dataflow test framework
• Integration test
• Between services
• External components
• Mocking?
• Performance test
• Does it scale?
• Multi-tenancy
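A unit test for a pipeline step can be as small as the sketch below. It uses the standard-library `unittest` module on a plain function instead of the Dataflow test framework mentioned on the slide; the function `clicks_per_tenant` and its input layout are hypothetical.

```python
import unittest


def clicks_per_tenant(events):
    """Count events per tenant. Each event is a (tenant, user) tuple."""
    totals = {}
    for tenant, _user in events:
        totals[tenant] = totals.get(tenant, 0) + 1
    return totals


class ClicksPerTenantTest(unittest.TestCase):
    def test_counts_are_isolated_per_tenant(self):
        events = [("a", "u1"), ("a", "u2"), ("b", "u3")]
        self.assertEqual(clicks_per_tenant(events), {"a": 2, "b": 1})

    def test_empty_input(self):
        self.assertEqual(clicks_per_tenant([]), {})


# Run the tests programmatically so the script does not exit on completion.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClicksPerTenantTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping the business logic in plain functions like this makes the multi-tenancy check cheap to test before any integration or performance test runs.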
52. THE AUTOMATION KING
• Infrastructure as Code
• Terraform
• Testing & Deployment (CI/CD)
• CircleCI
59. THE MANAGER
• Onboarding & Clients
• How to create business value?
• How to measure success?
• Who does activations?
• Do we need initial data loads?
• Who triggers it?
• What documents need to be signed?
• What do clients expect?
60. THE DATA SCIENTIST
• Analysis
• Potential improvements
• Client feedback
• AI Notebooks
• JupyterLab
• One-click start
61. TEAM VALUES
Everything as code
• Traceable
• Reproducible
• Explicit
Cloud/Serverless
• Less management
• DevOps becomes easier
• Pay as you go
Automation
• Less ops work
• Reliable releases
• Continuous delivery
67. BUSINESS QUESTIONS
• What hour of the day are most people active?
• Which users are active before 10am?
• What is the highest number of clicks a user has made in 1 hour?
• How many clicks does an average user make in 1 week?
• How is the channel usage distributed?
• ...?
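The business questions above translate directly into small aggregations. The sketch below answers them in plain Python on a hypothetical sample; the event layout (user, channel, timestamp) and the rows are assumptions, and in the workshop the equivalent queries ran on BigQuery.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical click events: (user, channel, timestamp).
events = [
    ("ann", "email", datetime(2020, 12, 7, 9, 5)),
    ("ann", "email", datetime(2020, 12, 7, 9, 40)),
    ("ann", "sms", datetime(2020, 12, 7, 14, 10)),
    ("bob", "email", datetime(2020, 12, 7, 14, 20)),
    ("cem", "push", datetime(2020, 12, 8, 14, 55)),
    ("bob", "sms", datetime(2020, 12, 8, 11, 0)),
]

# What hour of the day are most people active? (distinct users per hour)
users_by_hour = defaultdict(set)
for user, _, ts in events:
    users_by_hour[ts.hour].add(user)
peak_hour = max(users_by_hour, key=lambda hour: len(users_by_hour[hour]))

# Which users are active before 10am?
early_users = {user for user, _, ts in events if ts.hour < 10}

# Highest number of clicks by one user within one clock hour.
clicks_per_user_hour = Counter((user, ts.date(), ts.hour) for user, _, ts in events)
max_clicks_in_hour = max(clicks_per_user_hour.values())

# Average clicks per user per ISO week.
clicks_per_user_week = Counter((user, tuple(ts.isocalendar())[:2]) for user, _, ts in events)
avg_clicks_per_week = sum(clicks_per_user_week.values()) / len(clicks_per_user_week)

# How is the channel usage distributed? (share of clicks per channel)
channel_counts = Counter(channel for _, channel, _ in events)
channel_share = {channel: n / len(events) for channel, n in channel_counts.items()}

print(peak_hour, early_users, max_clicks_in_hour, avg_clicks_per_week, channel_share)
```

Note that "one hour" is interpreted here as a clock hour; a sliding 60-minute window would need a different computation.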
68. VISUALIZATIONS
• Heatmap of user activity in a day
• Heatmap of activity per channel
• Year over year comparison of activity per month
• For a given user, the activity timeline
• For a given user, the average number of clicks per week