This document provides an overview of WSO2 and its offerings for building big data solutions. WSO2 provides open source components for building complete cloud platforms and is recognized as a leader in application infrastructure by Gartner and Forrester. It discusses the challenges of big data, given the large volumes and speeds at which data is generated today, and shows how WSO2 products like BAM and CEP help customers address the full data lifecycle, from collection and storage to processing and analytics, for big data use cases. The document outlines an example big data architecture implemented using WSO2 components along with other technologies like Cassandra.
AdminCamp 2018 - ApplicationInsights for Administrators (Christoph Adler)
Showing the current state of, and the right future path for, your IBM Domino application environment is difficult to impossible when data such as usage and design metrics is missing.
Come to this session and learn how ApplicationInsights (the free IBM edition) can help you with this task. Find out how your own data is turned into easy-to-understand dashboards that show current application usage, code complexity, design similarity, and transformation roadblocks and opportunities. Based on this data you can decide which applications can safely be archived, rewritten, or modernized. This session is by and for administrators; development skills are not required.
Delivering digital transformation and business impact with IoT, machine lear... (Robert Sanders)
A world-leading manufacturer was in search of an IoT solution that could ingest, integrate, and manage data being generated from various types of connected machinery located on factory floors around the globe. The company needed to manage the devices generating the data, integrate the flow of data into existing back-end systems, run advanced analytics on that data, and then deliver services to generate real-time decision making at the edge.
In this session, learn how Clairvoyant, a leading systems integrator and Red Hat partner, was able to accelerate digital transformation for their customer using Internet of Things (IoT) and machine learning in a hybrid cloud environment. Specifically, Clairvoyant and Eurotech will discuss:
• The approach taken to optimize manufacturing processes to cut costs, minimize downtime, and increase efficiency.
• How a data processing pipeline for IoT data was built using an open, end-to-end architecture from Cloudera, Eurotech, and Red Hat.
• How analytics and machine learning inference at the IoT edge allow predictions to be made and decisions to be executed in real time.
• The flexible and hybrid cloud environment designed to provide the key foundational elements to quickly and securely roll out IoT use cases.
Automating Data Quality Processes at Reckitt (Databricks)
Reckitt is a fast-moving consumer goods company with a portfolio of famous brands and over 30k employees worldwide. At that scale, small projects can quickly grow into big datasets, and processing and cleaning all that data can become a challenge. To solve that challenge we have created a metadata-driven ETL framework for orchestrating data transformations through parametrised SQL scripts. It allows us to create various paths for our data as well as easily version control them. The approach of standardising incoming datasets and creating reusable SQL processes has proven to be a winning formula. It has helped simplify complicated landing/stage/merge processes and allowed them to be self-documenting.
But this is only half the battle; we also want to create data products: documented, quality-assured datasets that are intuitive to use. As we move to a CI/CD approach and increase the frequency of deployments, keeping documentation and data quality assessments up to date becomes increasingly challenging. To solve this problem, we have expanded our ETL framework to include SQL processes that automate data quality activities. Using the Hive metastore as a starting point, we have leveraged this framework to automate the maintenance of a data dictionary and reduce documentation, model refinement, data quality testing, and the filtering of bad data to a box-filling exercise. In this talk we discuss our approach to maintaining high-quality data products and share examples of how we automate data quality processes.
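The abstract stays at the prose level; as a hedged sketch of what driving a data dictionary off the metastore can look like, something like the following works in PySpark (not Reckitt's actual framework; the "analytics" database and "data_dictionary" output table are hypothetical names):

    # Sketch: walk the Hive metastore via the Spark catalog and refresh a
    # data dictionary table, so documentation tracks each deployment.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    rows = []
    for table in spark.catalog.listTables("analytics"):  # assumed database
        for col in spark.catalog.listColumns(table.name, "analytics"):
            rows.append((table.name, col.name, col.dataType,
                         col.nullable, col.description or ""))

    dictionary = spark.createDataFrame(
        rows, ["table", "column", "type", "nullable", "description"])
    # Overwrite on every run so the dictionary never drifts from the metastore.
    dictionary.write.mode("overwrite").saveAsTable("analytics.data_dictionary")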
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014 (Jaroslav Gergic)
The recent boom in big data processing and the democratization of the big data space have been enabled by the fact that most of the concepts that originated in the research labs of companies such as Google, Amazon, Yahoo and Facebook are now available as open source. Technologies such as Hadoop and Cassandra let businesses around the world become more data-driven and tap into their massive data feeds to mine valuable insights.
At the same time, we are still at a certain stage of the maturity curve of these new big data technologies and of the entire big data technology stack. Many of the technologies originated from a particular use case and attempts to apply them in a more generic fashion are hitting the limits of their technological foundations. In some areas, there are several competing technologies for the same set of use cases, which increases risks and costs of big data implementations.
We will show how GoodData solves the entire big data pipeline today, starting from raw data feeds all the way up to actionable business insights. All of this is provided as a hosted multi-tenant environment, letting customers solve a particular analytical use case, or many analytical use cases for thousands of their own customers, all using the same platform and tools while being abstracted away from the technological details of the big data stack.
Empowering Real Time Patient Care Through Spark Streaming (Databricks)
Takeda’s Plasma Derived Therapies (PDT) business unit has recently embarked on a project to use Spark Streaming on Databricks to empower how they deliver value to their plasma donation centers. As patients come in and interface with our clinics, we store and track all of the patient interactions in real time and deliver outputs and results based on those interactions. The problem with our existing architecture is that it is very expensive to maintain and has an unsustainable number of failure points. Spark Streaming is essential for this use case because it allows for a more robust ETL pipeline. With Spark Streaming, we are able to replace our existing ETL processes (based on Lambdas, Step Functions, triggered jobs, etc.) with a purely stream-driven architecture.
Data is brought into our S3 raw layer as a large set of CSV files through AWS DMS and Informatica IICS, as these services bring data from on-prem systems into our cloud layer. We have a stream currently running which picks these raw files up and merges them into Delta tables established in the bronze/stage layer. We are using AWS Glue as the metadata provider for all of these operations. From the stage layer, we have another set of streams using the stage Delta tables as their source, which transform and conduct stream-to-stream lookups before writing the enriched records into RDS (silver/prod layer). Once the data has been merged into RDS we have a DMS task which lifts the data back into S3 as CSV files. We have a small intermediary stream which merges these CSV files into corresponding Delta tables, from which we run our gold/analytic streams. The on-prem systems are able to speak to the silver layer and allow for the near real-time latency that our patient care centers require.
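The raw-to-bronze hop described above is only prose here; a minimal PySpark Structured Streaming sketch of that step might look like the following. This is an illustration under assumptions, not Takeda's actual pipeline: the S3 paths, schema, and patient_id merge key are invented, and the bronze Delta table is assumed to already exist.

    # Stream CSV files landed in S3 and upsert each micro-batch into Delta.
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    raw = (spark.readStream.format("csv")
           .option("header", "true")
           .schema("patient_id STRING, event_ts TIMESTAMP, payload STRING")
           .load("s3://example-raw/patients/"))

    def upsert_to_bronze(batch_df, batch_id):
        # Merge the micro-batch into the (pre-existing) bronze Delta table.
        bronze = DeltaTable.forPath(spark, "s3://example-bronze/patients/")
        (bronze.alias("t")
               .merge(batch_df.alias("s"), "t.patient_id = s.patient_id")
               .whenMatchedUpdateAll()
               .whenNotMatchedInsertAll()
               .execute())

    (raw.writeStream.foreachBatch(upsert_to_bronze)
        .option("checkpointLocation", "s3://example-checkpoints/bronze-patients/")
        .start())

The foreachBatch hook is what lets a stream perform merge/upsert semantics rather than blind appends, which is what replaces the Lambda-and-trigger ETL the abstract mentions.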
Delta Lake delivers reliability, security and performance to data lakes. Join this session to learn how customers have achieved 48x faster data processing, leading to 50% faster time to insight after implementing Delta Lake. You’ll also learn how Delta Lake provides the perfect foundation for a cost-effective, highly scalable lakehouse architecture.
Challenges of Operationalising Data Science in Production (iguazio)
The presentation topic for this meet-up was covered in two sections, without any breaks in between.
Section 1: Business Aspects (20 mins)
Speaker: Rasmi Mohapatra, Product Owner, Experian
https://www.linkedin.com/in/rasmi-m-428b3a46/
Once your data science application is in production, there are many typical operational challenges experienced today across business domains; we will cover a few of these challenges with example scenarios.
Section 2: Tech Aspects (40 mins, slides & demo, Q&A )
Speaker: Santanu Dey, Solution Architect, Iguazio
https://www.linkedin.com/in/santanu/
In this part of the talk, we will cover how these operational challenges can be overcome, e.g. automating data collection & preparation, making ML models portable & deploying them in production, monitoring and scaling, etc., with relevant demos.
Building a MLOps Platform Around MLflow to Enable Model Productionalization i... (Databricks)
Getting machine learning models to production is notoriously difficult: it involves multiple teams (data scientists, data and machine learning engineers, operations, …) who often do not talk to each other very well; the model can be trained in one environment but then productionalized in a completely different environment; and it is not just about the code, but also about the data (features) and the model itself. At DataSentics, as a machine learning and cloud engineering studio, we see this struggle firsthand, on our internal projects and on clients' projects as well.
FP&A with Spreadsheets and Spark with Oscar Castaneda-Villagran (Databricks)
Financial Planning and Analysis teams often rely on spreadsheets for building data products that provide senior management with analysis and information that is crucial in decision-making. But spreadsheets do not scale, and when it comes to expanding models, FP&A analysts quickly hit a glass ceiling. The code that FP&A analysts write is in the form of spreadsheet formulas.
In this talk I will show how Spreadsheet formulas and data can be automatically processed at scale inside a Spark cluster by the driver and worker nodes. Essentially this means running a Spreadsheet at scale inside your Spark cluster. I will show how Spreadsheets and their calculated outputs can be transformed into Data Frames for further processing with Spark.
We will also discuss next steps in FP&A data pipelines including AutoML and use of such pipelines for Data Science. The broader research topic is in the field of Model-Driven Data Product Design & Development which should be of interest to Spark Summit attendees who are looking for use cases and new opportunities to leverage existing corporate assets like Spreadsheets to automatically build working software that adds tremendous value at scale.
Data Science and Enterprise Engineering with Michael Finger and Chris Robison (Databricks)
How Data Scientists and Engineers work in tandem to achieve real-time personalization at Overstock
Personalizing online experiences for users is nothing new, but real-time personalization requires sub-second speed and close collaboration between data scientists and enterprise engineers.
Like the hands on a clock, data scientists and enterprise engineers have shifted their focus from hour-hand quickness to minute-hand speeds, with a craving to take advantage of each tick of the second hand and personalize in real time. Previously, daily activities were consumed by improving customers' experiences for tomorrow. Workflows ran overnight, when on-prem resources were not being tasked. The focus was on day-before jobs, always inching forward 24 hours behind.
Since then, we have shifted to hourly jobs and even to tasks that run every five minutes. Finally, we have been personalizing user experiences within the same day and even during the same session. But could we personalize these experiences instantly, immediately, and in real-time? What would that require? What does it look like? Michael Finger and Chris Robinson explore how data scientists and engineers are working in tandem to achieve real-time personalization at Overstock.com
Learn to Use Databricks for the Full ML Lifecycle (Databricks)
Machine learning development brings many new complexities beyond the traditional software development lifecycle. Unlike traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. In this talk, learn how to operationalize ML across the full lifecycle with Databricks Machine Learning.
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4j (Deepak Chandramouli)
Youtube | https://youtu.be/zGX0fRLdd6s?list=PLPaGQXwz_-RaoHicnGhL5SyOAp3_lUTQ2&t=1
This is a talk from PayPal at Nodes Online Summit, organized by Neo4j.
For more session details and video - please visit this link.
https://neo4j.com/online-summit/session/recommendations-unified-data-catalog-spark-neo4j
SnapLogic is a US-based, venture-funded software company that is attempting to reinvent integration platform technology by creating one unified platform that can address many different kinds of application and data integration use cases.
CI/CD Templates: Continuous Delivery of ML-Enabled Data Pipelines on Databricks (Databricks)
Data & ML projects bring many new complexities beyond the traditional software development lifecycle. Unlike software projects, after they are successfully delivered and deployed they cannot be abandoned, but must be continuously monitored to confirm that model performance still satisfies all requirements. We can always get new data with new statistical characteristics that can break our pipelines or influence model performance. All these qualities of data & ML projects lead us to the necessity of continuous testing and monitoring of our models and pipelines.
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra... (SnapLogic)
At last week's Strata + Hadoop World in San Jose, CA, SnapLogic Chief Scientist Greg Benson talked to big data experts, data scientists and other enterprise IT leaders about the data lake and how SnapLogic comes into play with Hadoop-scale data integration.
Check out this presentation to learn how SnapLogic helps customers adopt Hadoop and automate data integration workflows.
To learn more, visit: www.snaplogic.com/big-data
Oracle Analytics Cloud empowers business analysts and consumers with modern, AI-powered, self-service analytics capabilities for data preparation, visualization, enterprise reporting, augmented analysis, and natural language processing/generation. It is a single and complete platform that empowers your entire organization.
Spark Usage in Enterprise Business Operations (SAP Technology)
At Spark Summit East 2016, SAP’s Ken Tsai highlighted how SAP HANA Vora extends Apache Spark to provide OLAP modeling capabilities and real-time query federation to enterprise data. You will learn real-world use cases where instant insight from a combination of enterprise and Hadoop data make an impact on everyday business operations.
Splunk Ninjas: New Features, Pivot, and Search Dojo (Splunk)
Besides seeing the newest features in Splunk Enterprise and learning the best practices for data models and pivot, we will show you how to use a handful of search commands that will solve most search needs. Learn these well and become a ninja.
Architecting Snowflake for High Concurrency and High Performance (Samantha Berlant)
Cloud Data Warehousing juggernaut Snowflake has raced out ahead of the pack to deliver a data management platform from which a wealth of new analytics can be run. Using Snowflake as a traditional data warehouse has some obvious cost advantages over a hardware solution. But the real value of Snowflake as a data platform lies in its ability to support a high-concurrency analytics platform using Kyligence Cloud, powered by Apache Kylin.
In this presentation, Senior Solutions Architect Robert Hardaway will describe a modern data service architecture using precomputation and distributed indexes to provide interactive analytics to hundreds or even thousands of users running against very large Snowflake datasets (TBs to PBs).
Gimel is a data abstraction framework built on Apache Spark, providing unified data access via API & SQL to different technologies such as Kafka, Elasticsearch, HBase, REST APIs, files, object stores, relational databases, etc.
We spoke about this recently in the cloud track at the Scale By The Bay conference.
https://www.scale.bythebay.io/schedule
https://sched.co/e55D
Youtube - https://www.youtube.com/watch?v=cy8g2WZbEBI&ab_channel=FunctionalTV
https://youtu.be/m6_0iI4XDpU
Learning to Rank Datasets for Search with Oscar Castaneda (Databricks)
Learning to rank methods automatically learn from user interaction instead of relying on labeled data prepared manually. Learning to rank, also referred to as machine-learned ranking, is an application of reinforcement learning concerned with building ranking models for information retrieval. Learning to rank has been successfully applied in building intelligent search engines, but has yet to show up in dataset search.
Dataset search is ripe for innovation with learning to rank specifically by automating the process of index construction. Oscar will recap previous presentations on dataset search and introduce learning to rank as a way to automate relevance scoring of dataset search results. He will also give a demo of a dataset search engine that makes use of an automatically constructed index using learning to rank on Elasticsearch and Spark.
Oscar will explain the motivation and use case of learning to rank in dataset search focusing on why it is interesting to rank datasets through machine-learned relevance scoring and how to improve indexing efficiency by tapping into user interaction data from clicks. Dataset Search and Learning to Rank are IR and ML topics that should be of interest to Spark Summit attendees who are looking for use cases and new opportunities to organize and rank Datasets in Data Lakes to make them searchable and relevant to users.
In preparation for this talk it is recommended that attendees watch the previous two talks on dataset search from prior Spark Summit events, as they build up to the present talk:
[1] https://spark-summit.org/east-2017/events/building-a-dataset-search-engine-with-spark-and-elasticsearch/
[2] https://spark-summit.org/eu-2016/events/spark-cluster-with-elasticsearch-inside/
Presented at Location and Context World 2015, Palo Alto, CA, November 3-4, 2015.
Abstract: Creating useful local context requires big data platforms and marketplaces. Contextual awareness is relevant to location-based marketing, first responders, urban planners and many others. Location-aware mobile devices are revolutionizing how consumers and brands interact in the physical world. Situational awareness is a key element of efficiently handling any emergency response. In all cases, big data processing and high-velocity streaming of location-based data create the richest contextual awareness. Data from many sources, including IoT devices, sensor webs, surveillance and crowdsourcing, is combined with semantically rich urban and indoor data models. The resulting context information is delivered to and shared by mobile devices in connected and disconnected operations. Standards play a key role in establishing context platforms and marketplaces. Successful approaches will consolidate data from ubiquitous sensing technologies on a common space-time basis to enable context-aware analysis of environmental and social dynamics.
Creating the golden record that makes every click personal (Jean-Michel Franco)
This presentation shares real-world customer examples that illustrate how Master Data Management can make every customer interaction personal, including:
- Collect and reconcile customer data about identities, profiles, purchase history, preferences, and transactions
- Transform and augment this data into a 360° view of the customer with context, intentions, relationships, and interactions
- Turn data into insights with segments, scores, forecasts and recommendations
- Connect in real time to customer touch-points and turn those insights into increased conversion rates and customer loyalty
Many believe Big Data is a brand new phenomenon. It isn't; it is part of an evolution that reaches far back in history. Here are some of the key milestones in this development.
Case Study: SocialCops + Tata Trusts in Vijayawada (SocialCops)
How the Tata Trusts, Government of Andhra Pradesh, MP Kesineni Srinivas, and the Centre for People's Forestry partnered with SocialCops to drive micro-targeted development through data for 264 villages in Vijayawada.
Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2... (SocialCops)
A presentation on data-driven solutions for creating smart cities. This presentation was made at The Big 5 Construct 2016 conference as a part of the panel on "Smart Development through Leveraging Technology". The presentation talks about the current landscape of smart city solutions, how cities and organizations can create smart cities, potential ways for the construction industry to use data while implementing smart city projects, and some relevant case studies from our experiences.
What exactly is big data? The definition of big data is data that contains greater variety, arriving in increasing volumes and with more velocity. This is also known as the three Vs. Put simply, big data is larger, more complex data sets, especially from new data sources.
Big Data Analysis: Deciphering the Haystack (Srinath Perera)
A primary outcome of big data is to derive useful and actionable insights from large and challenging data collections. The goal is to run the transformations from data, to information, to knowledge, and finally to insights. This ranges from calculating simple analytics like mean, max, and median, to deriving an overall understanding of data by building models, and finally to deriving predictions from data. In some cases we can afford to wait while we collect and process the data, while in other cases we need to know the outputs right away. MapReduce has been the de facto standard for data processing, and we will start our discussion from there. However, that is only one side of the problem. There are other technologies like Apache Spark and Apache Drill gaining ground, as well as realtime processing technologies like stream processing and complex event processing. Finally, there is a lot of work on porting decision technologies like machine learning into the big data landscape. This talk discusses big data processing in general and looks at each of these technologies, comparing and contrasting them.
Big Data and Data Science: The Technologies Shaping Our Lives (Rukshan Batuwita)
Big Data and Data Science have become increasingly important areas in both industry and academia, to the extent that every company wants to hire a Data Scientist and every university wants to start dedicated degree programs and centres of excellence in Data Science. Big Data and Data Science have led to technologies that have already shaped different aspects of our lives such as learning, working, travelling, purchasing, social relationships, entertainment, physical activities, medical treatments, etc. This talk will attempt to cover the landscape of some of the important topics in these exponentially growing areas, including state-of-the-art processes, commercial and open-source platforms, data processing and analytics algorithms (especially large-scale machine learning), application areas in academia and industry, the best industry practices, business challenges, and what it takes to become a Data Scientist.
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture (DATAVERSITY)
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Accelerate Enterprise Software Engineering with Platformless (WSO2)
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform (WSO2)
At its core, the challenge of managing Human Resources data is an integration challenge: estimates range from 2-3 HR systems in use at a typical SMB, up to a few dozen systems implemented amongst enterprise HR departments, and these systems seldom integrate seamlessly between themselves. Providing a multi-tenant, cloud-native solution to integrate these hundreds of HR-related systems, normalize their disparate data models and then render that consolidated information for stakeholder decision making has been a substantial undertaking, but one significantly eased by leveraging Ballerina. In this session, we’ll cover:
The overall software architecture for VHR’s Cloud Data Platform
Critical decision points leading to adoption of Ballerina for the CDP
Ballerina’s role in multiple evolutionary steps to the current architecture
Roadmap for the CDP architecture and plans for Ballerina
WSO2’s partnership in bringing continual success for the CDP
The integration landscape is changing rapidly with the introduction of technologies like GraphQL, gRPC, stream processing, iPaaS, and platformless. However, not all existing applications and industries can keep up with these new technologies. Certain industries, like manufacturing, logistics, and finance, still rely on well-established EDI-based message formats. Some applications use XML or CSV with file-based communications, while others have strict on premises deployment requirements. This talk focuses on how Ballerina's built-in integration capabilities can bridge the gap between "old" and "new" technologies, modernizing enterprise applications without disrupting business operations.
Platformless Horizons for Digital Adaptability (WSO2)
In this keynote, Asanka Abeysinghe, CTO, WSO2, will explore the shift towards platformless technology ecosystems and their importance in driving digital adaptability and innovation. We will discuss strategies for leveraging decentralized architectures and integrating diverse technologies, with a focus on building resilient, flexible, and future-ready IT infrastructures. We will also highlight WSO2's roadmap, emphasizing our commitment to supporting this transformative journey with our evolving product suite.
Quantum computers are rapidly evolving and are promising significant advantages in domains like machine learning or optimization, to name but a few areas. In this keynote we sketch the underpinnings of quantum computing, show some of the inherent advantages, highlight some application areas, and show how quantum applications are built.
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
Building your big data solution
1. Learn with WSO2 - Building your Big Data Solution
Srinath Perera, Director of Research, WSO2 Inc.
2. About WSO2
• Providing the only complete open source componentized cloud platform
– Dedicated to removing all the stumbling blocks to enterprise agility
– Enabling you to focus on business logic and business value
• Recognized by leading analyst firms as visionaries and leaders
– Gartner cites WSO2 as visionaries in all 3 categories of application infrastructure
– Forrester places WSO2 in top 2 for API Management
• Global corporation with offices in USA, UK & Sri Lanka
– 200+ employees and growing
• Business model of selling comprehensive support & maintenance for our products
4. Consider a day in your life
• What is the best road to take?
• Would there be any bad weather?
• What is the best way to invest the money?
• Should I take that loan?
• Can I optimize my day?
• Is there a way to do this faster?
• What have others done in similar cases?
• Which product should I buy?
5. People wanted to (through the ages)
• To know (what happened?)
• To explain (why it happened?)
• To predict (what will happen?)
6. What is Big Data?
• There is a lot of data available
– E.g. Internet of Things
• We have computing power
• We have technology
• The goal is the same
– To know
– To explain
– To predict
• The challenge is the full lifecycle
8. Data Avalanche / Moore's law of data
• We are now collecting and converting large amounts of data to digital forms
• 90% of the data in the world today was created within the past two years
• The amount of data we have doubles very fast
9. In real life, most data are Big
• The Web does millions of activities per second, and so many server logs are created
• Social networks: e.g. Facebook has 800 million active users and 40 billion photos from its user base
• There are >4 billion phones, >25% of them smartphones; there are billions of RFID tags
• Observational and sensor data
– Weather radars, balloons
– Environmental sensors
– Telescopes
– Complex physics simulations
10. Why is Big Data hard?
• How to store? Assuming 1TB per machine, it takes 1000 computers to store 1PB
• How to move? Assuming a 10Gb network, it takes 2 hours to copy 1TB, or 83 days to copy 1PB
• How to search? Assuming each record is 1KB and one machine can process 1000 records per second, it needs 277 CPU days to process 1TB and 785 CPU years to process 1PB
• How to process?
– How to convert algorithms to work at large size
– How to create new algorithms
http://www.susanica.com/photo/9
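Back-of-envelope arithmetic like this is easy to reproduce; the exact figures depend entirely on the effective throughput you assume (the 2 hours / 83 days above correspond roughly to ~1 Gb/s of sustained throughput rather than a full 10 Gb/s link). A small Python helper, with the assumptions noted in comments:

    # Rough capacity/transfer arithmetic; results are orders of magnitude,
    # not benchmarks, and depend on the assumed effective throughput.
    TB, PB = 10**12, 10**15  # bytes

    def copy_hours(nbytes, effective_gbps):
        """Hours to move nbytes over a link with the given effective Gb/s."""
        return nbytes * 8 / (effective_gbps * 10**9) / 3600

    machines = PB // TB  # 1000 machines at 1TB each
    print(machines, "machines to store 1PB at 1TB per machine")
    print(f"1TB at 1 Gb/s effective: {copy_hours(TB, 1):.1f} hours")
    print(f"1PB at 1 Gb/s effective: {copy_hours(PB, 1) / 24:.0f} days")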
11. Why is it hard? (Contd.)
• A system built of many computers
• That handles lots of data
• Running complex logic
• This pushes us to the frontier of Distributed Systems and Databases
• More data does not mean there is a simple model
• Some models can be as complex as the system
http://www.flickr.com/photos/mariachily/5250487136, licensed CC
13. WSO2 Offerings
• Two tools
– WSO2 BAM for store and process
– WSO2 CEP for realtime processing
• These tools cover the whole processing lifecycle for your Big Data, with the help of a few other products as needed:
– WSO2 Storage Server
– WSO2 User Experience Server
15. Sensors
• Built-in sensors in WSO2 products
• Event logs
– Click streams, emails, chat, search, tweets, transactions, ...
• Custom sensors
– Video surveillance, cash flows, traffic, surveillance, smart grid, production line, RFID (e.g. Walmart), GPS sensors, mobile phones, Internet of Things
http://www.flickr.com/photos/imuttoo/4257813689/ by Ian Muttoo, http://www.flickr.com/photos/eastcapital/4554220770/, http://www.flickr.com/photos/patdavid/4619331472/ by Pat David, copyright CC
16. Collecting Data
• Data is collected at sensors and sent to the big data system via events or flat files
• Event streams: we name the events by their content/originator
• Get data through
– Point to point
– Event bus
• E.g. Data Bridge, a Thrift-based transport we built that does about 400k events/sec
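As a purely illustrative sketch of point-to-point event collection (note: WSO2's actual Data Bridge is a Thrift-based transport, not HTTP/JSON), a minimal publisher could look like the following; the endpoint URL, stream name, and event fields are all invented for the example.

    # Toy point-to-point event publisher: one JSON event per HTTP POST.
    import json
    import time
    import urllib.request

    def publish(event, stream="org.example.clickStream",
                endpoint="http://localhost:9763/events"):
        payload = json.dumps({"stream": stream,
                              "timestamp": time.time(),
                              "payload": event}).encode("utf-8")
        req = urllib.request.Request(endpoint, data=payload,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return resp.status  # a receiver is assumed to be listening

    publish({"userId": "u42", "action": "page_view", "page": "/pricing"})

At hundreds of thousands of events per second a per-event request like this would not keep up, which is why a binary, batched transport such as Thrift is used in practice.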
17. Storing Data
• Historically we used databases
– Scale is a challenge: replication, sharding
• Scalable options
– NoSQL (Cassandra, HBase) [if data is structured]
• Column families gaining ground
– Distributed file systems (e.g. HDFS) [if data is unstructured]
• NewSQL
– In-memory computing, VoltDB
• Specialized data structures
– Graph databases, data structure servers
http://www.flickr.com/photos/keso/363133967/
18. Storing Data (Contd.)
• WSO2 Offerings (WSO2 Storage Server)
– Small structured data: keep in relational databases
– Large structured data: Cassandra
– Large unstructured data: HDFS
19. Making Sense of Data
• To know (what happened?)
– Basic analytics + visualizations (min, max, average, histograms, distributions, ...)
– Interactive drill-down
• To explain (why?)
– Data mining, classification, building models, clustering
• To forecast
– Neural networks, decision models
20. Making Sense of Data (Contd.)
• Batch processing - WSO2 BAM
– Hive scripts
– MapReduce jobs
• Realtime processing - CEP
– Event query language
• The above two are the platform; you need to program your use case.
21. To know (what happened?)
• Mainly analytics
– Min, max, average, correlation, histograms
– Might join and group data in many ways
• Implemented with MapReduce or queries
• Data is often presented with some visualizations
• Examples
– Forensics
– Assessments
– Historical data / reports / trends
http://www.flickr.com/photos/isriya/2967310333/
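As a toy illustration of this "to know" tier, the snippet below computes the basic aggregates and a crude histogram over a made-up sample of response times; the data values are invented for the example.

    # Basic aggregates and a bucketed histogram over sample latencies (ms).
    import statistics
    from collections import Counter

    latencies_ms = [12, 15, 9, 31, 22, 18, 110, 14, 27, 19]

    print("min:", min(latencies_ms))
    print("max:", max(latencies_ms))
    print("mean:", statistics.mean(latencies_ms))
    print("median:", statistics.median(latencies_ms))

    # Histogram: bucket values into 20ms-wide bins.
    histogram = Counter((v // 20) * 20 for v in latencies_ms)
    for bucket in sorted(histogram):
        print(f"{bucket:>4}-{bucket + 19:<4} {'#' * histogram[bucket]}")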
22. To Explain (Patterns)
• Correlation
– Scatter plots, statistical correlation
• Data mining (detecting patterns)
– Clustering and classification
– Finding similar items
– Finding hubs and authorities in a graph
– Finding frequent item sets
– Making recommendations
• Apache Mahout
http://www.flickr.com/photos/eriwst/2987739376/ and http://www.flickr.com/photos/focx/5035444779/
23. To Predict: Forecasts and Models
• Trying to build a model for the data
• Theoretically or empirically
– Analytical models (e.g. physics)
– Neural networks
– Reinforcement learning
– Unsupervised learning (clustering, dimensionality reduction, kernel methods)
• Examples
– Translation
– Weather forecast models
– Building profiles of users
– Traffic models
– Economic models
• Lots of domain-specific work
http://misterbijou.blogspot.com/2010_09_01_archive.html
24. Information Visualization
• Presenting information
– To end users
– To decision makers
– To scientists
• Interactive exploration
• Sending alerts
• WSO2 UES
– Jaggery based
• BAM/CEP can work with most other UI tools
http://www.flickr.com/photos/stevefaeembra/3604686097/
25. WSO2 UES
• Dashboards and Store
• Build your own UIs with Jaggery
26. MapReduce / Hadoop
• First introduced by Google, and used as the processing model for their architecture
• Implemented by open source projects like Apache Hadoop and Spark
• Users write two functions: map and reduce
• The framework handles details like distributed processing, fault tolerance, load balancing, etc.
• Widely used, and one of the catalysts of Big Data
void map(ctx, k, v){
  tokens = v.split();
  for t in tokens
    ctx.emit(t, 1);
}
void reduce(ctx, k, values[]){
  count = 0;
  for v in values
    count = count + v;
  ctx.emit(k, count);
}
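To make the flow concrete, here is a small single-process Python simulation of the word-count job sketched above; a real framework would distribute the map and reduce phases across machines and perform the shuffle over the network.

    # Local simulation of MapReduce word count: map, shuffle, reduce.
    from collections import defaultdict

    def map_phase(line):
        # Emit a (token, 1) pair for every word in the line.
        for token in line.split():
            yield token, 1

    def reduce_phase(key, values):
        # Sum all counts emitted for one token.
        return key, sum(values)

    lines = ["big data is big", "data moves fast"]

    # Shuffle: group all emitted (token, 1) pairs by token.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            groups[key].append(value)

    counts = dict(reduce_phase(k, vs) for k, vs in groups.items())
    print(counts)  # {'big': 2, 'data': 2, 'is': 1, 'moves': 1, 'fast': 1}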
28. Data on the Move
• The idea is to process data as it is received, in streaming fashion
• Used when we need
– Very fast output
– Lots of events (a few 100k to millions)
– Processing without storing (e.g. too much data)
• Two main technologies
– Stream Processing (e.g. Storm, http://storm-project.net/)
– Complex Event Processing (CEP), http://wso2.com/products/complex-event-processor/
29. Complex Event Processing (CEP)
• Sees inputs as event streams, queried with an SQL-like language
• Supports filters, windows, joins, patterns and sequences
from p=PINChangeEvents#win.time(3600) join
     t=TransactionEvents[p.custid=custid][amount>10000]#win.time(3600)
return t.custid, t.amount;
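The query reads: flag any transaction over 10,000 that occurs within an hour (the 3600-second windows) of a PIN change for the same customer. For readers unfamiliar with CEP languages, here is a rough plain-Python sketch of the same logic; the event shapes and handler names are invented for illustration.

    # Toy version of the windowed join above: correlate PIN changes with
    # large transactions from the same customer within one hour.
    WINDOW = 3600  # seconds, matching #win.time(3600)

    pin_changes = {}  # custid -> timestamp of the most recent PIN change

    def on_pin_change(custid, ts):
        pin_changes[custid] = ts

    def on_transaction(custid, amount, ts):
        changed_at = pin_changes.get(custid)
        if changed_at is not None and ts - changed_at <= WINDOW and amount > 10000:
            print(f"ALERT: customer {custid} moved {amount} after a PIN change")

    on_pin_change("c1", ts=100)
    on_transaction("c1", amount=25000, ts=900)   # fires: inside the window
    on_transaction("c1", amount=25000, ts=9000)  # silent: window expired

A CEP engine does this matching declaratively and handles window expiry, out-of-order events, and throughput concerns that this sketch ignores.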
31. Case Study 1: Tracing a Business Process
• A business process is built using many services
• Track and trace each step, and analyze to understand how to optimize
• E.g. a sales pipeline
32. Some Queries
• Conversion rate?
• How many deals are in the pipeline each month?
• Average size of the deals?
• Average time a deal takes?
• Can we spot large deals early?
• Which is better: going for a few large deals or many small ones?
• Were there any delays from our side?
33. Hive: Average Size of the Deal
• Hive uses an SQL-like syntax
• Easy to understand and learn
hive> LOAD DATA ..
hive> SELECT month, avg(value) FROM LEAD_ACTIVITY
      WHERE action = 'closedWon' GROUP BY month;
35. How many deals in the Pipeline? (Contd.)
void map(ctx, k, v){
  Deal deal = parse(v);
  int month = getMonth(deal.time);
  ctx.emit(month, 1);
}
void reduce(ctx, k, values[]){
  count = 0;
  for v in values
    count = count + v;
  ctx.emit(k, count);
}
36. Case Study 2: DEBS Challenge
• Event processing challenge
• A real football game, with sensors in player shoes + the ball
• Events at 15kHz
• Event format
– Sensor ID, TS, x, y, z, v, a
• Queries
– Running stats
– Ball possession
– Heat map of activity
– Shots at goal
37. Example: Detect Ball Possession
• Possession is the time from when a player hits the ball until someone else hits it or it goes out of the ground
from Ball#window.length(1) as b join
     Players#window.length(1) as p
  unidirectional
  on debs:getDistance(b.x, b.y, b.z, p.x, p.y, p.z) < 1000
     and b.a > 55
select ...
insert into hitStream
from old = hitStream,
     b = hitStream[old.pid != pid],
     n = hitStream[b.pid == pid]*,
     (e1 = hitStream[b.pid != pid]
      or e2 = ballLeavingHitStream)
select ...
insert into BallPossessionStream
http://www.flickr.com/photos/glennharper/146164820/
38. Conclusions
• What is Big Data?
• Big Data architecture
– Collecting data
– Storing data
– Processing data
• WSO2 offerings
• Case studies
40. Engage with WSO2
• Helping you get the most out of your deployments
• From project evaluation and inception to development
and going into production, WSO2 is your partner in
ensuring 100% project success