Building enterprise OLAP on Hadoop for the financial services industry, following a use case from CPIC (a Fortune 500 insurance company) on replacing a legacy IBM Cognos OLAP deployment with the Kyligence platform.
Apache Kylin and Use Cases - 2018 Big Data Spain (Luke Han)
Apache Kylin is rapidly being adopted around the world as the leading open source OLAP engine for Big Data. In this talk, Luke Han, creator and PMC chair of Apache Kylin, introduces the motivation behind building the project and its technical highlights, and also explores how various industries use Apache Kylin and the resulting business impact.
The Apache Way - Building Open Source Community in China - Luke Han (Luke Han)
My presentation at ApacheCon 2016 NA, covering our practices for building an open source community (Apache Kylin) in China: the challenges, the cultural differences, the language barrier, and so on.
It also gives an overview of open source in China and the changes happening there now.
It is a good reference for people interested in extending their community into China, engaging more Chinese and other Asian developers to grow their open source community and adoption.
Pivotal is a trusted partner for IT innovation and transformation. From the technology, to the people, to the way people interact with technology, Pivotal is transforming how the world builds software.
At Strata NYC 2015, Pivotal announced it will supercharge the Hadoop ecosystem by contributing the HAWQ advanced SQL-on-Hadoop analytics and MADlib machine learning technologies to The Apache Software Foundation.
Analytics at the Speed of Thought: Actian Express Overview (Actian Corporation)
Deliver faster insight – reduce query response times to seconds
Analyze more data faster – explore billions of rows of data in seconds
More concurrent users – enable more concurrent BI users to explore more data
Deep learning is not just hype: it outperforms state-of-the-art ML algorithms, one by one. In this talk we show how deep learning can be used to detect anomalies on IoT sensor data streams at high speed, using DeepLearning4J on top of different Big Data engines such as Apache Spark and Apache Flink. Key to this talk is the absence of any large training corpus, since we use unsupervised machine learning, a domain that current DL research treats step-motherly. As the demo shows, LSTM networks can learn very complex system behavior; in this case the data comes from a physical model simulating bearing vibration. One drawback of deep learning is that it normally requires a very large labeled training data set. This is particularly interesting because we show how unsupervised machine learning can be used in conjunction with deep learning: no labeled data set is necessary. We are able to detect anomalies and predict breaking bearings with 10-fold confidence. All examples and all code will be made publicly available and open sourced; only open source components are used.
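As a rough illustration of the approach this talk describes (the speakers use DeepLearning4J on Spark/Flink; the sketch below uses Python with Keras instead, and the window size, threshold, and synthetic vibration signal are assumptions), an LSTM autoencoder can flag anomalies without labels by scoring reconstruction error:

```python
# Minimal sketch of unsupervised anomaly detection with an LSTM autoencoder.
# Illustrative only: the talk uses DeepLearning4J; window size, threshold,
# and the synthetic vibration signal below are assumptions.
import numpy as np
from tensorflow import keras

WINDOW = 50  # samples per sliding window (assumed)

# Synthetic "bearing vibration" signal: a sine wave plus noise.
t = np.linspace(0, 100, 5000)
signal = np.sin(2 * np.pi * t) + 0.1 * np.random.randn(t.size)

# Slice the stream into overlapping windows: shape (n, WINDOW, 1).
windows = np.stack([signal[i:i + WINDOW] for i in range(signal.size - WINDOW)])
windows = windows[..., np.newaxis]

# LSTM autoencoder: learn to reconstruct normal windows; high
# reconstruction error later signals an anomaly (no labels needed).
model = keras.Sequential([
    keras.layers.LSTM(32, input_shape=(WINDOW, 1)),
    keras.layers.RepeatVector(WINDOW),
    keras.layers.LSTM(32, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(1)),
])
model.compile(optimizer="adam", loss="mse")
model.fit(windows, windows, epochs=5, batch_size=64, verbose=0)

# Flag windows whose reconstruction error exceeds an assumed threshold.
errors = np.mean((model.predict(windows, verbose=0) - windows) ** 2, axis=(1, 2))
threshold = errors.mean() + 3 * errors.std()
print("anomalous windows:", np.where(errors > threshold)[0])
```

The same idea transfers to DL4J: train on windows of normal sensor data, then alert when reconstruction error spikes.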
Best Practices for Supercharging Cloud Analytics on Amazon Redshift (SnapLogic)
In this webinar, we discuss how the secret sauce of your business analytics strategy remains rooted in your approach, your methodologies, and the amount of data incorporated into this critical exercise. We also address best practices to supercharge your cloud analytics initiatives, along with tips and tricks for designing the right information architecture, data models, and other tactical optimizations.
To learn more, visit: http://www.snaplogic.com/redshift-trial
InfoTrack: Creating a single source of truth with the Elastic Stack (Elasticsearch)
Ashim Joshi, Head of Innovation at InfoTrack, discusses how the Elasticsearch Service helped tackle a variety of use cases at InfoTrack, such as building a data lake and architecting a data mart layer.
See the video: https://www.elastic.co/elasticon/tour/2019/sydney/infotrack-creating-a-single-source-of-truth-with-the-elastic-stack
Pivotal Big Data Suite: A Technical Overview (VMware Tanzu)
How and why companies like Uber, Netflix, and Airbnb are so successful, what you need to do in order to become successful in the same way, and how Pivotal can help you with that.
Speaker: Les Klein, EMEA CTO Data, Pivotal
Northwestern Mutual Journey – Transform BI Space to Cloud (Databricks)
The volume of available data is growing by the second (to an estimated 175 zettabytes by 2025), and it is becoming increasingly granular. With that change, every organization is moving toward building a data-driven culture. We at Northwestern Mutual share a similar story of driving toward data-driven decisions to improve both efficiency and effectiveness. Legacy system analysis revealed bottlenecks, excesses, duplications, and more. Based on the ever-growing need to analyze more data, our BI team decided to move to a more modern, scalable, cost-effective data platform. As a financial company, data security is as important as data ingestion: in addition to fast ingestion and compute, we needed a solution that supports column-level encryption and role-based access for different teams to our data lake.
In this talk we describe our journey moving hundreds of ELT jobs from our MSBI stack to Databricks and building a data lake (using the Lakehouse architecture), and how we reduced our daily data load time from 7 hours to 2 hours while gaining the capacity to ingest more data. We share our experience, challenges, learnings, architecture, and the design patterns used in this huge migration effort, as well as the tools and frameworks our engineers built to ease the learning curve for our non-Apache Spark engineers during the migration. You will leave this session with a better understanding of what migrating to Apache Spark/Databricks would mean for you and your organization.
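Since the talk calls out column-level encryption as a hard requirement, here is a minimal sketch of one common way to do it in PySpark, using a Fernet-encrypting UDF. This is an illustration under stated assumptions, not Northwestern Mutual's actual implementation, and key handling is deliberately simplified:

```python
# Sketch of column-level encryption in PySpark: encrypt only the sensitive
# column with a UDF. Illustrative only; key management is simplified.
from cryptography.fernet import Fernet
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("column-encryption-demo").getOrCreate()

key = Fernet.generate_key()  # in practice, fetch from a secrets manager

@udf(returnType=StringType())
def encrypt(value):
    # Encrypt non-null values; leave nulls untouched.
    return Fernet(key).encrypt(value.encode()).decode() if value else None

df = spark.createDataFrame([("alice", "123-45-6789")], ["name", "ssn"])
# Only the sensitive column is encrypted; role-based views can then
# expose either the encrypted or decrypted form to different teams.
df.withColumn("ssn", encrypt("ssn")).show(truncate=False)
```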
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl... (SnapLogic)
In this webinar, learn how SnapLogic and Amazon Web Services helped Earth Networks create a responsive, self-service cloud for data integration, preparation and analytics.
We also discuss how Earth Networks gained faster data insights using SnapLogic’s Amazon Redshift data integration and other connectors to quickly integrate, transfer and analyze data from multiple applications.
To learn more, visit: www.snaplogic.com/redshift
Cloud Experience: Data-driven Applications Made Simple and Fast (Databricks)
Implementing a complex real-time data workflow is very challenging. This session describes the architecture of a data platform that provides a single, secure, high-performance system that can be deployed in a hybrid cloud architecture. We present how to support simultaneous, consistent, high-performance access through multiple open source and cloud-compatible standards: streaming, table, TSDB, object, and file APIs. A serverless technology is also used in the architecture to support dynamic and flexible implementations. The presenter also outlines how the platform was integrated with the Spark ecosystem, including AI and ML tools, to simplify the development process.
Life occurs in real time, and not surprisingly, more solutions are being built using streaming technologies. Event-based architectures are becoming the norm, and customers expect immediate access to their data. This new world offers many exciting opportunities, but also some new challenges. What do you do when your streaming data is not complete? What if it relies on another data source? Does the dependent data exist yet, and does it come from a third party? How do we assemble a complete picture when data arrives from multiple places at the same time? This is the new norm in the world of distributed services. Join us as we dive deep into the technical details of these scenarios and more. Expect to learn about stream-stream joins, enriching stream data using local or remote data, and ways to anticipate and correct errors within the stream. Leave with a better understanding of managing data dependencies within a Spark Structured Streaming application.
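As a taste of the stream-stream join pattern the session covers, here is a minimal Spark Structured Streaming sketch in Python; the Kafka broker address, topic names, schema, and the 15-minute join window are placeholders, not the presenters' code:

```python
# Sketch of a watermarked stream-stream join in Spark Structured Streaming.
# Broker, topics, schema, and time bounds below are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, expr, from_json
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("stream-stream-join").getOrCreate()

schema = (StructType()
          .add("order_id", StringType())
          .add("event_time", TimestampType()))

def read_topic(topic):
    # Each stream arrives as Kafka bytes; parse the JSON payload.
    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder
           .option("subscribe", topic)
           .load())
    return (raw.select(from_json(col("value").cast("string"), schema).alias("v"))
               .select("v.*"))

# Watermarks bound how long each side waits for late or missing data,
# so the engine can eventually drop join state.
orders = read_topic("orders").withWatermark("event_time", "10 minutes").alias("o")
payments = (read_topic("payments")
            .withColumnRenamed("event_time", "pay_time")
            .withWatermark("pay_time", "10 minutes")
            .alias("p"))

# Inner join constrained to a time range: a payment matches an order
# only if it arrives within 15 minutes of the order event.
joined = orders.join(payments, expr(
    "o.order_id = p.order_id AND "
    "p.pay_time BETWEEN o.event_time AND o.event_time + interval 15 minutes"))

joined.writeStream.format("console").start().awaitTermination()
```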
Virgin Hyperloop One is the leader in realizing a Hyperloop mass transportation system (VHOMTS), which will bring cities and people closer together than ever before while reducing pollution, greenhouse gas emissions, transit times, and more. To build a safe and user-friendly Hyperloop, we need to answer key technical and business questions, including: 'What is the maximum safe speed the hyperloop can go?' and 'How many pods (the vehicles that carry people) do we need to fulfill a given demand?'
How to Operationalise Real-Time Hadoop in the Cloud (Attunity)
Hadoop and the Cloud are two of the most disruptive technologies to have emerged from the last decade, but how can you adapt to the increasing rate of change whilst providing the enterprise with the right data, quickly?
Watch this webinar with Attunity, Cloudera and Microsoft and learn:
-How to ingest the most valuable enterprise data into Hadoop
-About real life use cases of Cloudera on Azure
-How to combine the power of Hadoop and the scalable flexibility of Azure
Enable your business with more data in less time. Visit www.attunity.com for more information.
Getting Into the Business Intelligence Game: Migrating OBIA to the Cloud (Datavail)
This presentation discusses best-practice architecture for migrating Oracle BI Applications to the cloud. It focuses on the Oracle cloud platform and database services, with a nod to infrastructure services, laying out the idea of the hybrid cloud and variations of the new-age cloud BI/DW architecture so your analytics environment can succeed, operating at the same reliability or better while benefiting from what the cloud offers best.
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ... (Deepak Chandramouli)
PayPal Data Lake Journey | 2017-Oct | San Diego | Teradata Edge of Next
Gimel [http://www.gimel.io] is a Big Data Processing Library, open sourced by PayPal.
https://www.youtube.com/watch?v=52PdNno_9cU&t=3s
Gimel empowers analysts, scientists, and data engineers alike to access a variety of Big Data and traditional data stores with just SQL or a single line of code (the Unified Data API).
This is possible via a catalog of technical properties abstracted from users, along with a rich collection of data store connectors available in the Gimel library.
A catalog provider can be Hive, user-supplied (at runtime), or UDC.
In addition, PayPal recently open sourced UDC (Unified Data Catalog), which can host and serve the technical metadata of data stores and objects. Visit http://www.unifieddatacatalog.io to experience it first hand.
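To make the "single line of code" idea concrete, here is a purely hypothetical Python sketch of the unified-data-API pattern: a catalog resolves a logical dataset name to a connector, so callers never touch store-specific details. None of these class or function names are Gimel's real API (Gimel itself is Scala/SQL; see gimel.io):

```python
# Hypothetical sketch of the unified-data-API pattern described above: a
# catalog maps logical dataset names to store-specific connectors, so one
# read() call works regardless of the backing store. All names invented.
class HiveConnector:
    def read(self, props):
        return f"SELECT * FROM {props['table']}"   # stand-in for a real read

class KafkaConnector:
    def read(self, props):
        return f"consume topic {props['topic']}"   # stand-in for a real read

CATALOG = {  # technical properties abstracted away from users
    "sales.orders": {"connector": HiveConnector(), "table": "dw.orders"},
    "events.clicks": {"connector": KafkaConnector(), "topic": "clicks"},
}

def read(dataset_name):
    """Single entry point: resolve the dataset in the catalog and delegate."""
    props = CATALOG[dataset_name]
    return props["connector"].read(props)

print(read("sales.orders"))   # Hive-backed
print(read("events.clicks"))  # Kafka-backed, same call
```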
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code... (Codemotion)
Modern race cars produce a lot of data, all in real time. In this presentation I show how data can be generated and used by various applications in the car, at the track, or at team headquarters. The demonstration shows how to move data using messaging systems like Apache Kafka, process it using Apache Spark, and use various storage techniques: distributed file systems and NoSQL databases. This presentation is a great opportunity to see how to build a near-real-time big data application. The code from this talk will be made available as open source.
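For a feel of the ingestion side of such a pipeline, here is a small Python sketch that pushes simulated car telemetry into Kafka with kafka-python; the broker address, topic name, and message shape are assumptions, not the presenter's code:

```python
# Sketch of telemetry ingestion: push simulated car sensor readings into
# Kafka for downstream Spark processing. Topic and message shape assumed.
import json
import random
import time
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",               # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode())

for _ in range(100):
    reading = {
        "car": 44,
        "speed_kph": 280 + random.uniform(-15, 15),   # simulated sensor
        "tyre_temp_c": 95 + random.uniform(-5, 5),
        "ts": time.time(),
    }
    producer.send("car-telemetry", reading)           # near real time
producer.flush()
```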
Offload, Transform, and Present - The New World of Data Integration (Gluent)
This session explores how one organization built an integrated analytics platform by implementing Gluent to offload its Oracle enterprise data warehouse (EDW) data to Hadoop, and to transparently present native Hadoop data back to its EDW. As a result of its efforts, the company is now able to support operational reporting, OLAP, data discovery, predictive analytics, and machine learning from a single scalable platform that combines the benefits of an enterprise data warehouse with those of a data lake. This session includes a brief overview of the platform and use cases to demonstrate how the company has utilized the solution to provide business value.
Cloud-native Semantic Layer on Data Lake (Databricks)
With larger volumes of data, and more real-time data, stored in the data lake, managing that data and serving analytics and applications becomes more complex. Faced with different service interfaces, inconsistent data definitions, and performance that varies across scenarios, business users begin to lose confidence in the quality and efficiency of getting insight from data.
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat... (Tyler Wishnoff)
See how extreme query speeds and ultra-high concurrency on MicroStrategy, and any other business intelligence (BI) tool, on Big Data is possible through the Kyligence platform. Learn more here: https://kyligence.io/
Take the Bias out of Big Data Insights With Augmented Analytics (Tyler Wishnoff)
Is bias impacting your Big Data insights? Learn how augmented analytics and the latest advancements in OLAP technology are making analytics (including on cloud) from business intelligence, data science, and machine learning more accurate and impactful. Learn more at https://kyligence.io
Integrating and fully utilizing data is a critical prerequisite for ensuring the success of data-driven operations and decision making. This is especially true as more and more corporations begin transforming legacy data warehouses and transitioning to the Cloud. See how Augmented OLAP technology is leading the way in streamlining Big Data analytics on the Cloud with this presentation by Kyligence CEO Luke Han at Big Things Conference 2019. Learn more here: https://kyligence.io
Accelerating Big Data Analytics with Apache Kylin (Tyler Wishnoff)
Learn about the latest advancements in Apache Kylin and how its OLAP technology is making analytics faster and insights more actionable.
Learn more about Apache Kylin: https://kyligence.io/apache-kylin-overview/
Learn more about Apache Kylin's enterprise version Kyligence: https://kyligence.io/
Architecting Snowflake for High Concurrency and High Performance (SamanthaBerlant)
Cloud Data Warehousing juggernaut Snowflake has raced out ahead of the pack to deliver a data management platform from which a wealth of new analytics can be run. Using Snowflake as a traditional data warehouse has some obvious cost advantages over a hardware solution. But the real value of Snowflake as a data platform lies in its ability to support a high-concurrency analytics platform using Kyligence Cloud, powered by Apache Kylin.
In this presentation, Senior Solutions Architect Robert Hardaway will describe a modern data service architecture using precomputation and distributed indexes to provide interactive analytics to hundreds or even thousands of users running against very large Snowflake datasets (TBs to PBs).
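The core trick behind serving thousands of concurrent users is precomputation: scan the big table once, then answer every dashboard query from a tiny aggregate. Below is a toy Python sketch of that idea, with SQLite standing in for Snowflake purely so the example runs anywhere; Kyligence builds and routes to such aggregates (indexes/cubes) automatically:

```python
# Toy sketch of precomputation for high-concurrency serving: aggregate
# once, then answer many concurrent queries from the small result instead
# of rescanning the raw fact table. SQLite stands in for Snowflake.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, day TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("EMEA", "2021-01-01", 120.0), ("EMEA", "2021-01-01", 80.0),
    ("APAC", "2021-01-01", 200.0), ("APAC", "2021-01-02", 50.0),
])

# One expensive pass builds the aggregate (the "cube"/index)...
db.execute("""CREATE TABLE sales_by_region_day AS
              SELECT region, day, SUM(amount) AS total, COUNT(*) AS n
              FROM sales GROUP BY region, day""")

# ...then every concurrent query is a cheap lookup, not a scan.
for row in db.execute(
        "SELECT region, total FROM sales_by_region_day WHERE day = '2021-01-01'"):
    print(row)
```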
Building Modern Data Platform with Microsoft Azure (Dmitry Anoshin)
This presentation will cover Cloud history and Microsoft Azure Data Analytics capabilities. Moreover, it has a real-world example of DW modernization. Finally, we will check the alternative solution on Azure using Snowflake and Matillion ETL.
AWS Partner Webcast - Reporting and Analytics in the Cloud (Amazon Web Services)
Do you need to make sense of increasing volumes of data coming from a variety of sources like web logs, sensor data, social media monitoring and mobile apps? Jaspersoft BI reporting, analytics and dashboard tools integrate with the Amazon Redshift data warehouse service, so you can visually analyze data right from your web browser.
The payment model is pay-by-the-hour, with no up-front hardware or software costs, starting at less than $1/hour. Jaspersoft also integrates with other AWS data sources such as Amazon RDS and Amazon EMR. You'll also hear from a Jaspersoft/Amazon Redshift customer, Kony, who will share their insights and best practices based on their experience.
What you'll learn:
• How Redshift is architected and how to leverage it
• How to use Jaspersoft reporting, analytics and dashboarding tools for Amazon Redshift, and other AWS data sources
• A customer’s perspective, from an active customer that’s done the learning for you.
Who should view:
• Solution Architects, Development Leads, Developers and other Technical IT Leaders.
ICP for Data - Enterprise platform for AI, ML and Data Science (Karan Sachdeva)
IBM Cloud Private for Data is a unified platform for AI, ML, and data science workloads: an integrated analytics platform based on containers and microservices. It works with Kubernetes and Docker, and even with Red Hat OpenShift, and delivers a variety of business use cases across industries: financial services, telco, retail, manufacturing, and more.
The Impact of SMACT on the Data Management Stack (SnapLogic)
This presentation introduces the concept of the "Integrator's Dilemma" and reviews some of the challenges faced by traditional data and application integration technologies when it comes to keeping up with the new enterprise data, application and API connectivity and management requirements. We review the landscape and share examples of the steps more and more IT organizations are taking to improve business alignment through faster access to trusted data.
To learn more, visit http://www.snaplogic.com/ipaas
See how the world’s leading open source solution for query acceleration on massive datasets is revolutionizing analytics for enterprises across every industry, and how you can get started using it in your organization.
https://www.brighttalk.com/webcast/18317/413952
Apache Kylin 101 - Get Sub-Second Analytics on Massive Datasets (Tyler Wishnoff)
Learn how the world’s leading open source solution for query acceleration on massive datasets is revolutionizing analytics for enterprises across every industry, and how you can get started with it yourself.
This presentation will provide you with everything you need to understand the basics of Apache Kylin, as well as clear steps for deploying it in your organization. Learn more here: https://kyligence.io/apache-kylin-overview/
With an explosion of data, today’s emerging needs are not being met by existing technologies, which require rich skill sets and expertise. Companies that want to lead changes in highly competitive markets must optimize their storage, speed, and spending. The key is for them to augment their data management and analytics platforms with artificial intelligence and machine learning for analysts, engineers, and other users.
A general introduction to Apache Kylin, covering the background, business needs and technical challenges, theory and architecture, features, and some technical detail, followed by performance and benchmarks and, finally, the ecosystem and roadmap.
For more detail, please visit http://kylin.io or follow @ApacheKylin.
Kylin is an open source distributed analytics engine from eBay Inc. that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets.
Kylin Open Source Web Site: http://kylin.io
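For a quick feel of Kylin's SQL interface, here is a minimal Python sketch that submits a query through Kylin's REST API. The host, project, and table below come from Kylin's bundled sample (ADMIN/KYLIN is the demo login), so adjust for a real deployment; Kylin also ships JDBC and ODBC drivers for BI tools:

```python
# Minimal sketch of querying Kylin over its REST API with standard SQL.
# Host, project, and credentials are the demo defaults; change as needed.
import requests

KYLIN = "http://localhost:7070/kylin/api"   # placeholder host
auth = ("ADMIN", "KYLIN")                   # Kylin's default demo login

resp = requests.post(
    f"{KYLIN}/query",
    auth=auth,
    json={
        "sql": "SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
        "project": "learn_kylin",           # Kylin's sample project
        "limit": 10,
    },
)
resp.raise_for_status()
for row in resp.json().get("results", []):
    print(row)
```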
Building Enterprise OLAP on Hadoop for FSI
1. Building Enterprise OLAP on Hadoop for Financial Services Industry
Luke Han
luke@kyligence.io | @lukehq
Co-founder & CEO of Kyligence
Creator & VP of Apache Kylin
Microsoft Regional Director & MVP
2. About Kyligence
• Formed by creators of Apache Kylin in 2016
• Offers Enterprise and Cloud versions of Apache Kylin
• Funding from Redpoint, Cisco, CBC and Shunwei
• Member of Microsoft Accelerator Shanghai 2017
• Dual HQ in Silicon Valley & Shanghai, China
Kyligence booth: #855
3. Transition to Big Data…
How about your traditional data warehouse?
How about your existing OLAP/BI application?
4. Data Warehouse/OLAP in Financial Services Industry
o The industry most reliant on DW/OLAP applications
o Thousands of applications built on top of the EDW
o Experienced analysts with decades of expertise …in data…but not in technologies
6. But you are asked to…
o Migrate existing OLAP/BI apps to Big Data, or build new ones on it
o Deliver better performance…just because you have Big Data now
o Train yourself in MR/Spark/ML…and AI
8. Apache Kylin: Bring OLAP back to Big Data
o MOLAP on Hadoop
o Simplified data modeling
o Optimized for aggregation queries
o ANSI SQL
o Native on Hadoop
o On-Prem & In the Cloud
[Architecture diagram: Presentation/Visualization layer on top of an OLAP/Data Mart layer, fed by a Data Lake (Hive, Impala, Spark SQL, Drill over MapReduce, Spark) and Data Sources]
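The reason a MOLAP engine is "optimized for aggregation queries" is that it precomputes aggregates ("cuboids") for combinations of dimensions ahead of time. Here is a toy Python sketch of the concept only, not of Kylin's actual engine, which builds cuboids at scale with MapReduce/Spark:

```python
# Toy illustration of the MOLAP idea: precompute aggregates ("cuboids")
# for every subset of dimensions so aggregation queries become lookups.
from itertools import combinations
from collections import defaultdict

DIMENSIONS = ("region", "product")
rows = [
    {"region": "EMEA", "product": "A", "sales": 100},
    {"region": "EMEA", "product": "B", "sales": 40},
    {"region": "APAC", "product": "A", "sales": 70},
]

# Build one aggregate table per dimension subset (2^N cuboids).
cuboids = {}
for r in range(len(DIMENSIONS) + 1):
    for dims in combinations(DIMENSIONS, r):
        agg = defaultdict(float)
        for row in rows:
            agg[tuple(row[d] for d in dims)] += row["sales"]
        cuboids[dims] = dict(agg)

# An aggregation query is now a constant-time lookup, not a table scan.
print(cuboids[("region",)][("EMEA",)])   # SELECT SUM(sales) ... GROUP BY region
print(cuboids[()][()])                   # grand total
```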
9. Kylin vs Hive: Star-Schema Benchmark
[Chart: query response time (seconds) vs. data volume (scale factor); lower is better]
Scale factor 2: Apache Kylin (KAP) 0.17 s, Apache Hive 142.42 s
Scale factor 10: Apache Kylin (KAP) 0.17 s, Apache Hive 161.66 s
Scale factor 20: Apache Kylin (KAP) 0.18 s, Apache Hive 189.17 s
* Based on 4 nodes, 16-core CPU, 96 GB memory per node
10. Global Users
500+ use cases in production globally
FSI: ABC, CCB, CMB, CPIC, Citic Bank, China UnionPay, HUATAI Securities, GUOTAI Securities, Lufax
Telecom: China Mobile, China Telecom, China Unicom, AT&T
Internet: eBay, Yahoo! Japan, Baidu, Meituan, NetEase, Expedia, JD.com, VIP.com, 360, Toutiao
Manufacturing: SAIC, HUAWEI, Lenovo, OPPO, XIAOMI, VIVO
Others: MachineZone, Glispa, Inovex, Adobe, iFLYTEK
Data collected from public information and the Kylin community
15. TPC-DS
[Chart: KAP response times across all 99 TPC-DS queries]
• Hive: 33 queries unsupported or timed out
• KAP: all 99 queries supported
• Query routing between SQL on Hadoop and Apache Kylin
23. CPIC: China Pacific Insurance (Group) Co., LTD
• Global Fortune 500 insurance company
• Top 2 insurance company in China
• $40+ billion revenue
• 8+ million customers
• 97,000+ employees
24. Challenges
• Legacy IBM Cognos + DB2 solution can’t support Big Data scenarios
• Long waiting time (minutes ~ hours for reporting)
• Low concurrency (100,000+ employees!)
• High cost
25. Journey of Kyligence Analytics Platform
• 2016.12 ~ 2017.01: KAP POC, performance testing (query latency, concurrency)
• 2017.01 ~ 2017.03: KAP POC, compatibility (Cognos connection, Cognos syntax)
• 2017.03 ~ 2017.05: Development (fixed reports, flexible reports)
• 2017.05 ~ 2017.06: Go live (all dataset aggregation and testing; fixed reports released)
Highlights: no changes on the Hadoop side; no additional engineers required; most of the work done by analysts
26. KAP + Cognos: Deployment
[Deployment diagram: Cognos dynamic reports connect via JDBC and fixed reports via ODBC to the KAP Query Server, spanning the Reporting & Dashboard, OLAP & Data Mart, and Big Data Platform layers]
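From the BI tool's point of view, the ODBC leg of this diagram is just a SQL connection. Here is a minimal Python sketch of what any client (Cognos included) effectively does through it; the DSN, login, and table names are placeholders for a configured Kylin/KAP ODBC data source:

```python
# Sketch of the ODBC route in the diagram above: open a connection to the
# KAP/Kylin query server and run plain SQL. "KylinDSN" is a placeholder
# DSN for the Kylin ODBC driver; table and columns are illustrative.
import pyodbc

conn = pyodbc.connect("DSN=KylinDSN;UID=ADMIN;PWD=KYLIN")  # placeholder DSN
cursor = conn.cursor()
cursor.execute(
    "SELECT branch, SUM(premium) FROM policy_facts GROUP BY branch")
for branch, total in cursor.fetchall():
    print(branch, total)
conn.close()
```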
27. Benefits after Adopting Kyligence
• One-stop BI platform generates complicated reports
• Over 90% of queries return within 3 seconds (including high-dimensional queries)
• Seamless integration with IBM Cognos, no change at the front end
• 2 KAP cubes replaced 2,000+ IBM Cognos cubes
• Cost reduced significantly by adopting open source technology
28. Customer Quote
“Kyligence enables us to find valuable insights faster, from every insurance policy, within seconds. Kyligence’s platform allows us to achieve more with less. Our lean management system has improved significantly.”
-- Minchen Wu, Deputy GM of IT, CPIC
29. Fusion Big Data Platform
• Open: connects to Teradata/Greenplum and IBM Cognos/Saiku…
• Flexible: self-service for end users
• Efficient: speeds up the PC and mobile analytics experience
China Construction Bank (CCB): 2nd largest bank in the world
“Apache Kylin is the last piece of the puzzle for serving data asset management between the legacy DW and the new Big Data platform.”
-- Zhi Zhu, Vice Senior Manager of Tech Dept, CCB
30. Enterprise OLAP on Hadoop
Speed Up Mission Critical Analytics
Booth #855
luke@kyligence.io
http://kyligence.io