Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
If you want to stay up to date, subscribe to our newsletter here: https://bit.ly/3tiw1I8
Presentation from the performance given by Mariusz during the Data Science Summit ML Edition.
Author: Mariusz Strzelecki
Linkedin: https://www.linkedin.com/in/mariusz-strzelecki/
___
Company:
Getindata is a company founded in 2014 by ex-Spotify data engineers. From day one our focus has been on Big Data projects. We bring together a group of best and most experienced experts in Poland, working with cloud and open-source Big Data technologies to help companies build scalable data architectures and implement advanced analytics over large data sets.
Our experts have vast production experience in implementing Big Data projects for Polish as well as foreign companies including i.a. Spotify, Play, Truecaller, Kcell, Acast, Allegro, ING, Agora, Synerise, StepStone, iZettle and many others from the pharmaceutical, media, finance and FMCG industries.
https://getindata.com
Leveraging IoT as part of your digital transformation - John Archer
Review of approaches for Edge computing architecture with emphasis on improved security for container workloads collecting telemetry from Industrial IoT environments
Elephants in the cloud or how to become cloud ready - Krzysztof Adamski
How to approach moving your big data environment into the public cloud. Lessons learned from other companies, with examples based on the Google Cloud offering.
Single View of Well, Production and Assets - John Archer
Deliver a complete view of G&G, Well Header, Volumes, transactional data
Reduce Data Movement
Reduce Load on Data sources with intelligent caching
Aggregated single view of complex and legacy data sources
Delivering Agile Data Science on Openshift - Red Hat Summit 2019 - John Archer
Audrey Reznik, Data Scientist at ExxonMobil, and John Archer, Red Hat Solution Architect, present how to use OpenShift to enable data science teams, create value for them, and improve their agility and collaboration in larger organizations.
The Pandemic Changes Everything, the Need for Speed and Resiliency - Alluxio, Inc.
This document discusses how the COVID-19 pandemic has accelerated the need for cloud computing and digital transformation. Some key points:
- By 2021, over 90% of organizations will rely on a mix of on-premises, private clouds, public clouds, and legacy systems to meet infrastructure needs.
- By 2023, an emerging cloud ecosystem for extending resource control and analytics will underlie all IT and business automation initiatives anywhere.
- Resilient business models and superior customer experience will be critical as organizations shift more operations and services to the cloud.
Kubernetes and real-time analytics - how to connect these two worlds with Apa... - GetInData
Did you like it? Check out our blog to stay up to date: https://getindata.com/blog
More and more services are running on Kubernetes, which means we can migrate our current data pipelines to this new environment. For Flink, there are multiple ways to run real-time data streaming: use the Lyft or GCP operator, take the official deployment and customize it, choose the Ververica Platform, or build something of your own. The presentation shows how to choose the right solution for technical requirements and business needs, so you can run Flink on Kubernetes at scale without issues.
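To make the "official deployment" option more concrete, here is a minimal sketch of a deployment spec assuming the Apache Flink Kubernetes Operator's FlinkDeployment resource; the name, image version, jar path and resource sizes are illustrative assumptions, not values from the talk.

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: streaming-job            # illustrative name
spec:
  image: flink:1.16              # illustrative Flink image
  flinkVersion: v1_16
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/usrlib/job.jar   # illustrative jar path
    parallelism: 2
    upgradeMode: stateless
```

Applied with `kubectl apply -f`, a spec like this lets the operator manage the JobManager and TaskManagers, which is the main trade-off versus hand-rolled deployments or a managed platform.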
Author: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
Managing Big Data projects in a constantly changing environment - Rafał Zalew... - GetInData
Watch our full performance given by our team during the Big Data Technology Warsaw Summit: https://www.youtube.com/watch?v=CBrq7z8ikaM
The nature of Big Data projects is nowadays one of a kind - they are not like the data warehousing initiatives of the old days, nor like cloud-native application projects, at least not yet. A variety of technologies, complicated architectures and a rapidly changing landscape are just a few of the challenges an IT department faces in such projects. Add the number of stakeholders from different departments involved, and the fact that a Big Data project is sometimes more like R&D with an unpredictable outcome, and you get a mix where the objectives can easily be lost. It is no surprise that up to 85% of Big Data projects were pure failures (Gartner 2016).
In this talk we will share our experience in planning and executing Big Data initiatives in organisations, with some use cases and good practices in mind.
Speakers:
Rafał Małanij
Rafał Zalewski
Linkedin: https://www.linkedin.com/in/rafalzalewski/
End-to-End Big Data AI with Analytics Zoo - Jason Dai
The document discusses Analytics Zoo, an open-source software platform for building end-to-end big data AI applications. It provides distributed deep learning frameworks like TensorFlow and PyTorch on Apache Spark. Analytics Zoo allows seamless scaling of AI models from laptop to distributed big data and includes features like automated machine learning, time series forecasting, and serving models in production. It aims to simplify development of end-to-end big data AI solutions.
CWIN17 London - Becoming cloud native part 2 - Guy Martin, Docker - Capgemini
This document discusses how organizations can become cloud native by embracing the full opportunity from cloud. It identifies six key steps: 1) delivering business visible and impactful benefits, 2) technical solutions that deliver the business case, 3) empowering a dedicated cloud services team, 4) creating a cloud service vending machine, 5) establishing a blueprint for integrating cloud into existing IT, and 6) implementing automated application and infrastructure pipelines. It then discusses how Docker can help organizations modernize traditional applications and build a secure software supply chain through containerization.
Greenplum for Kubernetes - Greenplum Summit 2019 - VMware Tanzu
The document discusses Greenplum for Kubernetes, which allows Greenplum databases to be deployed on Kubernetes. It can be deployed on public clouds, private clouds, or bare metal. Greenplum is packaged as containers for portability and managed by Kubernetes for high availability and elasticity. Benefits include speed of deployment, savings from using existing Kubernetes skills and hardware, security, stability, and scalability. Use cases include agile analytics, workbenches with curated tool stacks, and automatic data platforms with day-2 operations automation.
The document summarizes key topics from the Cloud Native Summit conference, including:
- Distributed tracing and Zipkin, which allows visibility into request paths and troubleshooting of latency issues. Zipkin is an open source distributed tracing system.
- Production ready Kubernetes clusters on Catalyst Cloud, which provides security, high availability, and scalability for containerized applications.
- Building serverless applications at scale using services like AWS Lambda, and addressing concurrency bottlenecks when autoscaling.
- Istio service mesh, which provides control of traffic policies, authentication, and observability across distributed services through its control plane and sidecar proxy architecture.
- GitOps for infrastructure as code deployments on Open
Open Source Edge Computing Platforms - Overview - Krishna-Kumar
IEEE 11th International Conference - COMSNETS 2019 - Last Miles Talk - Jan 2019. This talk is for beginner or intermediate levels only. Kubernetes and related edge platforms are discussed.
Monitoring environment based on satellite data with Python and PySpark - Albe... - GetInData
Did you like it? Check out our blog to stay up to date: https://getindata.com/blog
Satellite data is not well known, although it provides a lot of useful information for different sectors such as agriculture, industry or the military. During the workshop, participants will learn how to get satellite imagery, how to process it using Python and PySpark, and what the most common use cases are. There will also be a knowledge-sharing session about issues with satellite data and how these difficulties can be overcome.
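As a taste of the kind of per-pixel processing such a workshop typically covers, here is a minimal, hypothetical sketch of computing NDVI (a common vegetation index) from the red and near-infrared bands of a satellite image; the function names and the tiny pixel grids are illustrative assumptions, not material from the talk, and a real pipeline would distribute this over PySpark.

```python
# Hypothetical sketch: NDVI = (NIR - Red) / (NIR + Red), computed per pixel.
# Bands are plain 2D lists here; real imagery would come from a raster file.

def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index for one pixel."""
    denom = nir + red
    if denom == 0:
        return 0.0  # avoid division by zero on empty pixels
    return (nir - red) / denom

def ndvi_grid(nir_band, red_band):
    """Apply NDVI pixel-wise over two equally sized 2D bands."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]

if __name__ == "__main__":
    nir = [[0.8, 0.6], [0.5, 0.9]]   # illustrative near-infrared reflectances
    red = [[0.2, 0.3], [0.5, 0.1]]   # illustrative red reflectances
    print(ndvi_grid(nir, red))       # high values indicate dense vegetation
```

The same pixel-wise function can be mapped over partitions of a large scene with PySpark, which is the scaling step the workshop description alludes to.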
Author: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
Natalie Godec - AirFlow and GCP: tomorrow's health service data platform - Matteo Mazzeri
At Babylon Health, we are on a mission to put accessible and affordable healthcare in the hands of every person on Earth. You might imagine that such an endeavour would generate incredible amounts of data! And, since AI is in the core of our product, leveraging data from our microservices and clients is crucial to our success. So we set off to build a data platform of the future, based both in AWS and GCP, leveraging our existing infrastructure and CICD and building the missing parts.
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah - Databricks
Insnap, a hyper-personalized ML-based platform acquired by The Honest Company, has been used to build a real-time data platform based on Apache Spark, Cassandra and Redshift. Users’ behavioral and transactional data have been used to build data models and ML models, and to drive use cases for marketing, growth, finance and operations.
Learn how The Honest Company has used Spark as a workhorse for 1) collecting, transforming (ETL) and storing data from various sources including MySQL, Mongo, JDE, Google Analytics, Facebook, Localytics and REST APIs; 2) building data models, aggregating data and generating reports on revenue, order fulfillment tracking, data pipeline monitoring and subscriptions; 3) using ML to build models for user acquisition, LTV and recommendation use cases. Spark replaced the monolithic codebase with flexible, scalable and robust pipelines. Databricks helped The Honest Company focus on data instead of maintaining infrastructure. While Honest users got delightful recommendations that improved their experience, data users at Honest came to understand users much better by segmenting with behavioral information and advanced ML models, leading to increased revenue and retention.
AnsibleFest 2020 - Automate cybersecurity solutions in a cloud native scenario - Roberto Carratala
Roberto Carratalá and Diego Escobar will present on automating cybersecurity solutions in a cloud native scenario using Red Hat Ansible Tower. The presentation will cover 5 labs demonstrating how to provision Tower, deploy an Azure environment, automatically configure Checkpoint security management and gateways, deploy applications with cybersecurity rules, and deploy NAT and firewall access rules. Red Hat experts Adrienne, Leonardo, Asier, and German will assist during the presentation. Access details and passwords to the lab environments are provided.
FutureGrid Computing Testbed as a Service - Geoffrey Fox
Describes FutureGrid and its role as a Computing Testbed as a Service. FutureGrid is user-customizable, accessed interactively, and supports Grid, Cloud and HPC software with and without VMs. Lessons learnt and example use cases are described.
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G... - GetInData
Did you like it? Check out our E-book: Apache NiFi - A Complete Guide
https://ebook.getindata.com/apache-nifi-complete-guide
Apache NiFi is one of the most popular services for running ETL pipelines, even though it's not the youngest technology. The talk covers all the details of migrating pipelines from the old Hadoop platform to Kubernetes, managing everything as code, monitoring all the corner cases of NiFi, and making it a robust solution that is user-friendly even for non-programmers.
Author: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
Lambda Data Grid: An Agile Optical Platform for Grid Computing and Data-inten... - Tal Lavian Ph.D.
Lambda Data Grid: An Agile Optical Platform for Grid Computing and Data-intensive Applications
Focus on the BIRN Mouse application.
Great vision: LambdaGrid is one step towards this concept
LambdaGrid: a novel service architecture
Lambda as a scheduled service
Lambda as a prime resource, like storage and computation
Change our current systems' assumptions
Potentially opens new horizons
Webinar: Deep Learning Pipelines Beyond the Learning - Mesosphere Inc.
Mesosphere technical lead Joerg Schad looks at the complete deep learning pipeline. In these slides, Joerg addresses commonly asked questions, such as:
1. How can we easily deploy distributed deep learning frameworks on any public or private infrastructure?
2. How can we manage different deep learning frameworks on a single cluster, especially considering heterogeneous resources such as GPUs?
3. What is the best UI for a data scientist to work with the cluster?
4. How can we store & serve models at scale?
5. How can we update models that are currently in use without causing downtime for the service using them?
6. How can we monitor the entire pipeline and track performance of the deployed models?
Edge optimized architecture for fabric defect detection in real-time - Shuquan Huang
In the textile industry, fabric defect detection traditionally relies on human inspection, which is inaccurate, inconsistent, inefficient and expensive. Automatic systems have been developed that detect defects by identifying faults in the fabric surface using image and video processing techniques. However, the existing solutions fall short in defect data sharing, backhaul interconnect, maintenance, etc. By evolving to an edge-optimized architecture, we can help the textile industry improve fabric quality, reduce operating costs and increase production efficiency. In this session, I'll share:
What edge computing is and why it's important to intelligent manufacturing
What the characteristics, strengths and weaknesses of traditional fabric defect detection methods are
Why textile industry can benefit from edge computing infrastructure
How to design and implement an edge-enabled application for fabric defect detection in real-time
Insights, synergy and future research directions
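To illustrate the kind of traditional image-processing baseline the talk contrasts with edge-enabled detection, here is a minimal, hypothetical sketch that flags pixels whose intensity deviates strongly from the fabric's overall statistics; the grid, threshold factor and function name are illustrative assumptions, not the presented system.

```python
# Hypothetical sketch: statistical outlier detection on a grayscale patch.
# A defect candidate is any pixel more than k standard deviations from
# the patch mean.
from statistics import mean, pstdev

def flag_defects(pixels, k=2.0):
    """Return (row, col) positions whose intensity deviates
    more than k standard deviations from the patch mean."""
    flat = [v for row in pixels for v in row]
    mu = mean(flat)
    sigma = pstdev(flat)
    if sigma == 0:
        return []  # perfectly uniform patch, nothing to flag
    return [
        (i, j)
        for i, row in enumerate(pixels)
        for j, v in enumerate(row)
        if abs(v - mu) > k * sigma
    ]

if __name__ == "__main__":
    fabric = [
        [10, 11, 10, 10],
        [10, 10, 95, 10],   # one bright spot: a candidate defect
        [11, 10, 10, 10],
    ]
    print(flag_defects(fabric))
```

A rule this simple runs comfortably on edge hardware, but it is exactly the kind of brittle, per-frame heuristic whose limitations (no defect data sharing, no model updates) motivate the edge-optimized architecture the session describes.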
GOAI: GPU-Accelerated Data Science DataSciCon 2017 - Joshua Patterson
The GPU Open Analytics Initiative (GOAI) is accelerating data science like never before. CPUs are not improving at the same rate as networking and storage; by leveraging GPUs, data scientists can analyze more data than ever with less hardware. Learn more about how GPUs are accelerating data science (not just deep learning), and how to get started.
Project Onboarding gives attendees a chance to meet some of the project team and get to know the project. Attendees will learn about the project itself, the code structure and overall architecture, and places where contribution is needed. Attendees will also get to know some of the core contributors and other established community members.
This document discusses accelerating cyber threat detection with GPUs. It begins by noting that current detection methods are too slow, taking an average of 98 days for financial services and up to 7 months for retailers. It then discusses how attacks are becoming more sophisticated and provides examples. The document outlines principles for cybersecurity, including improving indication of compromise through combining machine learning, graph analysis, and other methods. It discusses building an anomaly detection platform using deep learning and GPUs for improved performance. It also covers using GPU databases and visualization to further accelerate analytics and hunting of threats.
SnappyData is a new open source project started by Pivotal GemFire founders to provide a unified platform for OLTP, OLAP and streaming analytics using Spark. It aims to simplify big data architectures by supporting mixed workloads in a single clustered database, allowing for real-time operational analytics on live data without copying to other systems. This provides faster insights than current approaches that require periodic data copying between different databases and analytics systems.
Real-Time Analytics with Confluent and MemSQL - SingleStore
This document discusses enabling real-time analytics for IoT applications. It describes how industries like auto, transportation, energy, warehousing and logistics, and healthcare need real-time analytics to handle streaming data from IoT sensors. It also discusses how Confluent's Kafka stream processing platform can be used to build applications that ingest IoT data at high speeds, transform the data, and power real-time analytics and user interfaces. MemSQL's in-memory database is presented as a fast and scalable storage option to support real-time analytics on the large volumes of IoT data.
Managing Big Data projects in a constantly changing environment - Rafał Zalew...GetInData
Watch our full performance given by our team during the Big Data Technology Warsaw Summit: https://www.youtube.com/watch?v=CBrq7z8ikaM
The nature of Big Data projects are nowadays one of its kind - they are not like the data warehousing initiatives in the old days, nor like cloud native applications projects, at least not yet. Variety of technologies, complicated architectures and rapidly changing landscape are just a few challenges that the IT Department is facing in such projects. When you add the number of stakeholders from different departments involved and that Big Data project is sometimes more like an R&D with unpredictable outcome, this makes a mix where the objectives can be easily lost. It is not a surprise that up to 85% of Big Data projects were pure failures (Gartner 2016).
In this talk we will share our experience in planning and executing Big Data initiatives in the organisations, with some use cases and good practices in mind
Watch our webinar here: https://www.youtube.com/watch?v=CBrq7z8ikaM
Speakers:
Rafał Małanij
Rafał Zalewski
Linkedin: https://www.linkedin.com/in/rafalzalewski/
___
Company:
Getindata is a company founded in 2014 by ex-Spotify data engineers. From day one our focus has been on Big Data projects. We bring together a group of best and most experienced experts in Poland, working with cloud and open-source Big Data technologies to help companies build scalable data architectures and implement advanced analytics over large data sets.
Our experts have vast production experience in implementing Big Data projects for Polish as well as foreign companies including i.a. Spotify, Play, Truecaller, Kcell, Acast, Allegro, ING, Agora, Synerise, StepStone, iZettle and many others from the pharmaceutical, media, finance and FMCG industries.
https://getindata.com
End-to-End Big Data AI with Analytics ZooJason Dai
The document discusses Analytics Zoo, an open-source software platform for building end-to-end big data AI applications. It provides distributed deep learning frameworks like TensorFlow and PyTorch on Apache Spark. Analytics Zoo allows seamless scaling of AI models from laptop to distributed big data and includes features like automated machine learning, time series forecasting, and serving models in production. It aims to simplify development of end-to-end big data AI solutions.
CWIN17 london becoming cloud native part 2 - guy martin dockerCapgemini
This document discusses how organizations can become cloud native by embracing the full opportunity from cloud. It identifies six key steps: 1) delivering business visible and impactful benefits, 2) technical solutions that deliver the business case, 3) empowering a dedicated cloud services team, 4) creating a cloud service vending machine, 5) establishing a blueprint for integrating cloud into existing IT, and 6) implementing automated application and infrastructure pipelines. It then discusses how Docker can help organizations modernize traditional applications and build a secure software supply chain through containerization.
Greenplum for Kubernetes - Greenplum Summit 2019VMware Tanzu
The document discusses Greenplum for Kubernetes, which allows Greenplum databases to be deployed on Kubernetes. It can be deployed on public clouds, private clouds, or bare metal. Greenplum is packaged as containers for portability and managed by Kubernetes for high availability and elasticity. Benefits include speed of deployment, savings from using existing Kubernetes skills and hardware, security, stability, and scalability. Use cases include agile analytics, workbenches with curated tool stacks, and automatic data platforms with day-2 operations automation.
The document summarizes key topics from the Cloud Native Summit conference, including:
- Distributed tracing and Zipkin, which allows visibility into request paths and troubleshooting of latency issues. Zipkin is an open source distributed tracing system.
- Production ready Kubernetes clusters on Catalyst Cloud, which provides security, high availability, and scalability for containerized applications.
- Building serverless applications at scale using services like AWS Lambda, and addressing concurrency bottlenecks when autoscaling.
- Istio service mesh, which provides control of traffic policies, authentication, and observability across distributed services through its control plane and sidecar proxy architecture.
- GitOps for infrastructure as code deployments on Open
Open Source Edge Computing Platforms - OverviewKrishna-Kumar
IEEE 11th International Conference - COMSNETS 2019 - Last MilesTalk - Jan 2019. This talk is for Beginner or intermediate levels only. Kubernetes and related edge platforms are discussed.
Monitoring environment based on satellite data with Python and PySpark - Albe...GetInData
Did you like it? Check out our blog to stay up to date: https://getindata.com/blog
Satellite data is not well-known although it provides a lot useful information for different sectors like agriculture, industry or military.During the workshops participants will learn how to get the satellite imageries, process them by using Python and PySpark and what are the most common use cases. There will be a knowledge sharing session about issues with satellite data and how these difficulties can be overcome.
Author: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
_____
Getindata is a company founded in 2014 by ex-Spotify data engineers. From day one our focus has been on Big Data projects. We bring together a group of best and most experienced experts in Poland, working with cloud and open-source Big Data technologies to help companies build scalable data architectures and implement advanced analytics over large data sets.
Our experts have vast production experience in implementing Big Data projects for Polish as well as foreign companies including i.a. Spotify, Play, Truecaller, Kcell, Acast, Allegro, ING, Agora, Synerise, StepStone, iZettle and many others from the pharmaceutical, media, finance and FMCG industries.
https://getindata.com
Natalie Godec - AirFlow and GCP: tomorrow's health service data platformmatteo mazzeri
At Babylon Health, we are on a mission to put accessible and affordable healthcare in the hands of every person on Earth. You might imagine that such an endeavour would generate incredible amounts of data! And, since AI is in the core of our product, leveraging data from our microservices and clients is crucial to our success. So we set off to build a data platform of the future, based both in AWS and GCP, leveraging our existing infrastructure and CICD and building the missing parts.
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq AbdullahDatabricks
Insnap, a hyper-personalized ML-based platform acquired by The Honest Company, has been used to build a real-time data platform based on Apache Spark, Cassandra and Redshift. Users’ behavioral and transactional data have been used to build data models and ML models, and to drive use cases for marketing, growth, finance and operations.
Learn how Honest Company has used Spark as a workhorse for 1) collecting, ETL and storing data from various sources including mysql, mongo, jde, Google analytics, Facebook, Localytics and REST API; 2) building data models and aggregating and generating reports of revenue, order fulfillment tracking, data pipeline monitoring and subscriptions; 3) Using ML to build model for user acquisitions, LTV and recommendations use cases. Spark replaced the monolithic codebase with flexible, scalable and robust pipelines. Databricks helped The Honest Company to focus on data instead of maintaining infrastructure. While Honest users got delightful recommendations to improve experience, data users at Honest understood users much better in terms of segmenting with behavioral information and advanced ML models, leading to increased revenue and retention.
AnsibleFest 2020 - Automate cybersecurity solutions in a cloud native scenarioRoberto Carratala
Roberto Carratalá and Diego Escobar will present on automating cybersecurity solutions in a cloud native scenario using Red Hat Ansible Tower. The presentation will cover 5 labs demonstrating how to provision Tower, deploy an Azure environment, automatically configure Checkpoint security management and gateways, deploy applications with cybersecurity rules, and deploy NAT and firewall access rules. Red Hat experts Adrienne, Leonardo, Asier, and German will assist during the presentation. Access details and passwords to the lab environments are provided.
FutureGrid Computing Testbed as a ServiceGeoffrey Fox
Describes FutureGrid and its role as a Computing Testbed as a Service. FutureGrid is user-customizable, accessed interactively, and supports Grid, Cloud and HPC software with and without VMs. Lessons learned and example use cases are described.
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...GetInData
Did you like it? Check out our E-book: Apache NiFi - A Complete Guide
https://ebook.getindata.com/apache-nifi-complete-guide
Apache NiFi is one of the most popular services for running ETL pipelines, even though it is not the youngest technology. The talk describes all the details of migrating pipelines from an old Hadoop platform to Kubernetes, managing everything as code, monitoring all of NiFi's corner cases, and making it a robust solution that is user-friendly even for non-programmers.
Author: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
___
Getindata is a company founded in 2014 by ex-Spotify data engineers. From day one our focus has been on Big Data projects. We bring together a group of best and most experienced experts in Poland, working with cloud and open-source Big Data technologies to help companies build scalable data architectures and implement advanced analytics over large data sets.
Our experts have vast production experience in implementing Big Data projects for Polish as well as foreign companies including i.a. Spotify, Play, Truecaller, Kcell, Acast, Allegro, ING, Agora, Synerise, StepStone, iZettle and many others from the pharmaceutical, media, finance and FMCG industries.
https://getindata.com
Lambda Data Grid: An Agile Optical Platform for Grid Computing and Data-inten...Tal Lavian Ph.D.
Lambda Data Grid
An Agile Optical Platform for Grid Computing
and Data-intensive Applications
Focus on the BIRN Mouse application.
Great vision – LambdaGrid is one step towards this concept.
LambdaGrid – a novel service architecture:
Lambda as a scheduled service
Lambda as a prime resource – like storage and computation
Changes our current systems' assumptions
Potentially opens new horizons
Webinar: Deep Learning Pipelines Beyond the LearningMesosphere Inc.
Mesosphere technical lead Joerg Schad looks at the complete deep learning pipeline. In these slides, Joerg addresses commonly asked questions, such as:
1. How can we easily deploy distributed deep learning frameworks on any public or private infrastructure?
2. How can we manage different deep learning frameworks on a single cluster, especially considering heterogeneous resources such as GPUs?
3. What is the best UI for a data scientist to work with the cluster?
4. How can we store & serve models at scale?
5. How can we update models that are currently in use without causing downtime for the service using them?
6. How can we monitor the entire pipeline and track performance of the deployed models?
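Question 5 above — updating a model in use without downtime — often comes down to an atomic reference swap behind the serving endpoint. Here is a minimal Python sketch of that idea; the `ModelRegistry` name and the plain-callable "models" are illustrative assumptions, not part of any framework discussed in the webinar:

```python
import threading

class ModelRegistry:
    """Serves predictions from a 'live' model and lets a new
    version be swapped in atomically, so in-flight requests
    never see a half-loaded model."""

    def __init__(self, model):
        self._lock = threading.Lock()
        self._model = model

    def predict(self, x):
        # Take a reference under the lock; prediction itself runs
        # outside it, so a swap never blocks serving traffic.
        with self._lock:
            model = self._model
        return model(x)

    def swap(self, new_model):
        # Load and validate the new version fully *before* this call;
        # the swap itself is a single guarded reference assignment.
        with self._lock:
            self._model = new_model

# hypothetical model versions: plain callables standing in for real models
registry = ModelRegistry(lambda x: x * 2)   # v1 goes live
print(registry.predict(21))                 # 42
registry.swap(lambda x: x * 3)              # v2 replaces v1 with no downtime
print(registry.predict(21))                 # 63
```

Real serving systems add versioned rollbacks and health checks on top, but the zero-downtime core is this same swap.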
Edge optimized architecture for fabric defect detection in real-timeShuquan Huang
In the textile industry, fabric defect detection traditionally relies on human inspection, which is inaccurate, inconsistent, inefficient and expensive. Automatic systems have been developed that detect defects by identifying faults in the fabric surface using image and video processing techniques. However, existing solutions have shortcomings in defect data sharing, backhaul interconnect, maintenance, etc. By evolving to an edge-optimized architecture, we can help the textile industry improve fabric quality, reduce operating costs and increase production efficiency. In this session, I’ll share:
What edge computing is and why it’s important to intelligent manufacturing
The characteristics, strengths and weaknesses of traditional fabric defect detection methods
Why textile industry can benefit from edge computing infrastructure
How to design and implement an edge-enabled application for fabric defect detection in real-time
Insights, synergy and future research directions
GOAI: GPU-Accelerated Data Science DataSciCon 2017Joshua Patterson
The GPU Open Analytics Initiative (GOAI) is accelerating data science like never before. CPUs are not improving at the same rate as networking and storage, and by leveraging GPUs data scientists can analyze more data than ever with less hardware. Learn more about how GPUs are accelerating data science (not just deep learning), and how to get started.
Project Onboarding gives attendees a chance to meet some of the project team and get to know the project. Attendees will learn about the project itself, the code structure/ overall architecture, etc, and places where contribution is needed. Attendees will also get to know some of the core contributors and other established community members.
This document discusses accelerating cyber threat detection with GPUs. It begins by noting that current detection methods are too slow, taking an average of 98 days for financial services and up to 7 months for retailers. It then discusses how attacks are becoming more sophisticated and provides examples. The document outlines principles for cybersecurity, including improving indication of compromise through combining machine learning, graph analysis, and other methods. It discusses building an anomaly detection platform using deep learning and GPUs for improved performance. It also covers using GPU databases and visualization to further accelerate analytics and hunting of threats.
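The anomaly-detection platform described above uses deep learning and GPUs; the scoring idea itself can be illustrated, far below that scale, with a toy statistical detector. The function and threshold here are illustrative assumptions, not the platform's actual method:

```python
import math

def zscore_anomalies(values, threshold=3.0):
    """Flag indices whose z-score (distance from the mean in
    standard deviations) exceeds the threshold. A toy stand-in
    for the statistical core of anomaly scoring."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var)
    if std == 0:
        return []  # all points identical: nothing is anomalous
    return [i for i, v in enumerate(values)
            if abs(v - mean) / std > threshold]

# 99 normal readings around 10, one spike injected at index 50
readings = [10.0] * 100
readings[50] = 500.0
print(zscore_anomalies(readings))  # [50]
```

Production systems replace the fixed global mean/stddev with learned models over streaming windows, but the "score deviation, alert on outliers" shape is the same.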
SnappyData is a new open source project started by Pivotal GemFire founders to provide a unified platform for OLTP, OLAP and streaming analytics using Spark. It aims to simplify big data architectures by supporting mixed workloads in a single clustered database, allowing for real-time operational analytics on live data without copying to other systems. This provides faster insights than current approaches that require periodic data copying between different databases and analytics systems.
Real-Time Analytics with Confluent and MemSQLSingleStore
This document discusses enabling real-time analytics for IoT applications. It describes how industries like auto, transportation, energy, warehousing and logistics, and healthcare need real-time analytics to handle streaming data from IoT sensors. It also discusses how Confluent's Kafka stream processing platform can be used to build applications that ingest IoT data at high speeds, transform the data, and power real-time analytics and user interfaces. MemSQL's in-memory database is presented as a fast and scalable storage option to support real-time analytics on the large volumes of IoT data.
This document discusses moving machine learning models from prototype to production. It outlines some common problems with the current workflow where moving to production often requires redevelopment from scratch. Some proposed solutions include using notebooks as APIs and developing analytics that are accessed via an API. It also discusses different data science platforms and architectures for building end-to-end machine learning systems, focusing on flexibility, security, testing and scalability for production environments. The document recommends a custom backend integrated with Spark via APIs as the best approach for the current project.
Stardust is a mature, industry-proven business process management suite that is now available as an open source project under the Eclipse Public License. It includes capabilities for workflow, system integration, and document management. Stardust has seen production deployments with over 10,000 users, 1,000,000 processes per day, and 300,000 documents per day. The codebase consists of over 3 million lines of code and 200 third-party libraries. The Stardust community is actively enhancing the knowledge base and collaborating with other projects.
1. Coding and workflow automation are essential to scaling processes in the cloud. Low-coding strategies allow developers to automate workflows using Python and other languages.
2. Combining knowledge of MicroStrategy and Python is rare but important for automating development and operations tasks. The document proposes bringing on young developers with Python skills and coaching them on both technologies.
3. Automating common tasks like regression testing of reports against changing data models could be a starting point for such a combined team to build and test automation solutions.
This document discusses DevOps and MLOps practices for machine learning models. It outlines that while ML development shares some similarities with traditional software development, such as using version control and CI/CD pipelines, there are also key differences related to data, tools, and people. Specifically, ML requires additional focus on exploratory data analysis, feature engineering, and specialized infrastructure for training and deploying models. The document provides an overview of how one company structures their ML team and processes.
A brief update on Microsoft’s recent history in Open Source with specific emphasis on Azure Databricks, a fast, easy and collaborative Apache Spark-based analytics service. You will learn how to integrate MongoDB Atlas with Azure Databricks using the MongoDB Connector for Spark.
From leading IoT Protocols to Python Dashboarding_finalLukas Ott
First I'd like to give an overview of common IoT protocols:
- CoAP (Constrained Application Protocol – close to HTTP/REST)
- MQTT (Message Queue Telemetry Transport – pub/sub with a broker and well-defined Quality of Service; the newest additions are Eclipse Amlen (formerly the core of the IBM Watson IoT platform) and Eclipse Sparkplug, which standardizes topics and payloads for interoperability)
- DDS (Data Distribution Service – pub/sub without a broker; used in drones and robotics)
- LwM2M (Lightweight M2M – runs on top of CoAP or MQTT; standard sets of payloads for sensors)
- Zenoh (https://zenoh.io/ – a pub/sub protocol that combines the advantages of DDS and MQTT)
Note that this list is not complete and does not cover industrial and building automation.
Then I show Zenoh, the leading-edge IoT protocol, and how to save Zenoh payloads to Apache IoTDB. After that I dive into Panel and the awesome capabilities of Apache ECharts.
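To make the pub/sub model above concrete, here is a small sketch of MQTT's topic-filter matching rules ('+' matches exactly one topic level, '#' matches all remaining levels and must come last). The topic names are made up for illustration:

```python
def topic_matches(pattern, topic):
    """MQTT-style topic filter matching:
    '+' matches exactly one level, '#' matches all remaining levels."""
    p_levels = pattern.split("/")
    t_levels = topic.split("/")
    for i, p in enumerate(p_levels):
        if p == "#":                 # multi-level wildcard: match the rest
            return True
        if i >= len(t_levels):       # pattern longer than topic
            return False
        if p != "+" and p != t_levels[i]:
            return False             # literal level mismatch
    return len(p_levels) == len(t_levels)

print(topic_matches("factory/+/temp", "factory/loom3/temp"))      # True
print(topic_matches("factory/#", "factory/loom3/temp/raw"))       # True
print(topic_matches("factory/+/temp", "factory/loom3/humidity"))  # False
```

A broker runs this check for every subscription filter when a message arrives, which is why careful topic design (as Sparkplug standardizes) matters for interoperability.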
Processing 19 billion messages in real time and NOT dying in the processJampp
Here is an introduction to the Jampp architecture for data processing. We walk through our journey of migrating to systems that allow us to process more data in real time.
dbt Python models - GoDataFest by Guillermo SanchezGoDataDriven
Guillermo Sanchez presented on the pros and cons of using Python models in dbt. While Python models allow for more advanced analytics and leveraging the Python ecosystem, they also introduce more complexity in setup and divergent APIs across platforms. Additionally, dbt may not be well-suited for certain use cases like ingesting external data or building full MLOps pipelines. In general, Python models are best for the right analytical use cases, but caution is needed, especially for production environments.
Hybrid Transactional/Analytics Processing with Spark and IMDGsAli Hodroj
This document discusses hybrid transactional/analytical processing (HTAP) with Apache Spark and in-memory data grids. It begins by introducing the speaker and GigaSpaces. It then discusses how modern applications require both online transaction processing and real-time operational intelligence. The document presents examples from retail and IoT and the goals of minimizing latency while maximizing data analytics locality. It provides an overview of in-memory computing options and describes how GigaSpaces uses an in-memory data grid combined with Spark to achieve HTAP. The document includes deployment diagrams and discusses data grid RDDs and pushing predicates to the data grid. It describes how this was productized as InsightEdge and provides additional innovations and reference architectures.
This document summarizes the benefits of building an in-house machine learning platform called Positron. Key points:
- Positron allows for quick and consistent model deployments, simplified model management, experiment tracking, and efficient workflows.
- It features a multi-model pipeline for seamless model creation and validation. Models can be deployed with minimal configuration.
- The platform uses MLeap for model serialization/deserialization, which provides portability and fast performance without dependencies on specific frameworks.
- It aims to provide low latency and high throughput predictions, while allowing for customization and integration with existing infrastructure. External and internal models can be easily deployed.
OEP allows harvesting of real time business insights from edge devices in the Internet of Things. It combines data from multiple sources to identify complex events and enable faster decision making and actions. This reduces latency and improves responsiveness. OEP Embedded is optimized for edge devices like sensors and gateways. It features a continuous query language, event processing network, and supports modular development. Use cases include smart grids, industrial automation, building security, and vehicle telematics.
The large O’Reilly survey on serverless adoption indicated that the majority of enterprises have not yet adopted serverless. They have cited the following concerns as main factors: security, the steep learning curve, vendor lock-in, integration/debugging and observability of serverless applications.
In this talk, I will share my views on these concerns and present how Waylay IO has addressed these challenges. Waylay IO’s mission is to finally unlock all promised benefits of serverless computation, with an intuitive and developer-friendly low-code platform.
(1) The document discusses industrializing Apache Spark jobs by applying best practices from software engineering including setting up a development environment, testing, continuous integration/delivery (CI/CD), monitoring, and data governance.
(2) It provides recommendations for developing Spark jobs like unit testing functions, integration testing on sample data, validating job efficiency, and benchmarking for performance testing.
(3) Applying practices like CI/CD, logging to tools like ELK/Splunk, monitoring resources and optimizing data storage formats can help improve the reliability, performance and maintainability of Spark jobs.
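One way to make the unit-testing recommendation in (2) concrete is to keep the transformation logic in a pure function that tests can run on tiny in-memory samples, with no cluster required; in the real job the same function would back a Spark map/flatMap or DataFrame operation. `clean_orders` and its schema are hypothetical examples, not from the talk:

```python
def clean_orders(rows):
    """Pure transformation: drop rows with a missing amount and
    normalize the amount to integer cents. Keeping this free of
    any Spark dependency lets unit tests exercise it directly."""
    for row in rows:
        amount = row.get("amount")
        if amount is None:
            continue  # bad record: filtered out, not failed on
        yield {"order_id": row["order_id"],
               "amount_cents": int(round(amount * 100))}

# unit test on a tiny in-memory sample, no cluster needed
sample = [
    {"order_id": 1, "amount": 9.99},
    {"order_id": 2, "amount": None},
]
print(list(clean_orders(sample)))  # [{'order_id': 1, 'amount_cents': 999}]
```

Integration tests then run the wired-up Spark job on sample data, as the document recommends, while the business logic stays cheap to test in isolation.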
How can .NET contribute to data science? What is .NET Interactive? What do notebooks have to do with it? And Apache Spark? And the Python world? And Azure? In this session we try to put these ideas in order.
David Thoumas, OpenDataSoft CTO, about data API strategy (rich API vs. multiple end-points) for broadcasting data & making business
At APIdays 2012, the 1st European event dedicated to API world
SnappyData is a new open source project started by Pivotal GemFire founders to build a unified cluster capable of OLTP, OLAP, and streaming analytics using Spark. SnappyData fuses an elastic, highly available in-memory store for OLTP with Spark's memory manager and query engine to provide a single system for mixed workloads with fast ingestion, high concurrency and the ability to work with live, mutable data.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise boosts blood flow, releases endorphins, and promotes changes in the brain which help enhance one's emotional well-being and mental clarity.
ZCloud Consensus on Hardware for Distributed SystemsGokhan Boranalp
3rd Workshop on Dependability,
May 8, Monday 2017, İYTE,
https://goo.gl/fSVnZy
http://dcs.iyte.edu.tr/ws/ppt/10/presentation.pdf
In distributed applications where the number of members in the cluster increases, separating consensus-related operations at the hardware level is essential for the following reasons:
1. At the operating system level, messages broadcast on the protocol stack cause latency.
2. The number of completed transactions (throughput) must increase in the communication between distributed system components and on the network unit.
3. For devices with limited storage and CPU computing facilities that run embedded operating systems, such as IoT devices, the processing burden of "consensus" operations must also be reduced.
4. A common consensus communication model is needed for different applications that need to work together in Byzantine fault-tolerant (BFT) distributed systems.
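As a small illustration of the arithmetic behind point 4: a BFT system of n = 3f + 1 replicas tolerates f faulty members and needs 2f + 1 matching votes before a value is safe to commit. A toy sketch (function names are mine, not from the workshop):

```python
def bft_quorum(n):
    """Smallest vote count that guarantees agreement in a BFT
    system of n replicas tolerating f = (n - 1) // 3 faults:
    the classic 2f + 1 quorum."""
    f = (n - 1) // 3
    return 2 * f + 1

def committed(votes, n):
    """A value is committed once matching votes reach the quorum."""
    return votes >= bft_quorum(n)

print(bft_quorum(4))    # 3  (4 replicas tolerate f = 1 faulty member)
print(committed(2, 4))  # False: not yet safe
print(committed(3, 4))  # True: quorum reached
```

Every message exchanged to gather those votes crosses the protocol stack, which is exactly the latency and throughput cost that points 1 and 2 argue should move to hardware.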
Erlang 101 provides an overview of the Erlang programming language. It discusses Erlang's history and current status, key features like concurrency, distribution, and fault tolerance. These features make Erlang well-suited for building large, distributed, highly-available systems. Examples are given of companies using Erlang like Amazon, Facebook, Twitter, and of applications and frameworks built with Erlang like Apache CouchDB, RabbitMQ, and Riak. Resources for learning more about Erlang are also provided.
This document is a 101 guide to Git that introduces key concepts like snapshots instead of differences, configuration files, common commands, hooks, objects, and resources for further learning. It discusses SCM and VCS systems at a high level, explains basic Git functionality and configuration, and provides additional references to dive deeper into topics like hooks, plumbing, and the client-server model if there is extra time.
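The "snapshots instead of differences" point rests on Git's content-addressed object store: a file's identity is a hash of its contents, not a diff. A short sketch of how a blob id is computed, mirroring the documented behavior of `git hash-object`:

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Git stores file contents as 'blob' objects whose id is the
    SHA-1 of a small header ('blob <size>\\0') plus the raw bytes.
    Identical contents always hash to the identical object id."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# Same result as: echo "hello" | git hash-object --stdin
print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

Because ids derive from content, a commit is just a tree of such hashes — a full snapshot that shares unchanged objects with every earlier snapshot for free.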
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...alexjohnson7307
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframePrecisely
Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market.
Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. On a commodity server with a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar, with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Trusted Execution Environment for Decentralized Process MiningLucaBarbaro3
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Zops intelligent platform
1. ZOPS Intelligent Platform
Gökhan Boranalp, http://www.zetaops.io
https://tr.linkedin.com/in/gokhanboranalp @zetaops
gokhan@zetaops.io
June 2016
* This project is in the incubation stage and we are currently working on it. The features mentioned in this presentation shouldn't be taken as ready for production use.
3. Problem definition
● Business applications should "merge" with data science tools.
● Sooner or later, IoT platforms should be "controlled" by the business layer of the system.
● Good old "expert systems" are becoming smart systems. Because of this, every system "will" be adapted to AI in the near future.
● Data-related apps should be distributed and cloud-operated, period!
5. ZOPS Intelligent Platform
● ZOPS is a Python-based, open source intelligent microservice platform to manage "BPMN 2 business processes (workflows)", "business data" and "IoT-generated data" with data analysis and AI capabilities.
● ZOPS can easily connect to an existing IoT platform to collect data.
● ZOPS also operates with or without the IoT part in vertical sectors.
● ZOPS is a horizontally distributed platform, designed for cloud (OpenStack, GCE, Amazon etc.) environments.
● No need for additional services (Amazon SQS, Elastic Cloud etc.).
● No cloud vendor lock-in with ZOPS.
6. Why another platform?
● Data science should be done with live data.
● Machine learning results should feed back into system behaviour.
● Neural nets should be a natural part of the system.
● Resource utilization is a must in cloud environments.
● Traditional RDBMS for business plus NoSQL power to collect IoT data is an obligation.
● Release often, release early with strongly, dynamically typed Python.
● The first and only platform developed in Python with these components.
➔ Designed for millions of users
10. Underlying magic/technology
● Zengine, a Python-based, advanced BPMN workflow engine.
● Pyoko, a Riak NoSQL ORM.
● Zato ESB.
● Real-time stream processing on directly connected data with Apache Spark.
● Basho Riak, Riak TS and Riak S2 for data storage.
● Tornado async web server.
● Cloud-centric system architecture.
11. Status
● Finalize Spark integration
● Redesign of the metric collector
● Mesos integration for Spark jobs
● Automate Chef cookbooks
● Implementation of SQLAlchemy
12. Business Model
The open source business model is on the rise!
● SAAS and PAAS deployment options.
● Dual licensing:
● OSS with Apache 2 license
● Enterprise with BSD license
13. Team
● 8 developers
● 1 project manager
● 4 students working half-time on the project
● 2 designers for UI design and coding
● 2 remote coders
● 4 advisers