Caserta Concepts and Databricks partner up to bring you this insightful webinar on how a business can choose from all of the emerging big data technologies to figure out which one best fits their needs.
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
The “Big Data era” has ushered in an avalanche of new technologies and approaches for delivering information and insights to business users. What is the role of the cloud in your analytical environment? How can you make your migration as seamless as possible? This closing keynote, delivered by Joe Caserta, a prominent consultant who has helped many global enterprises adopt Big Data, provided the audience with the inside scoop needed to supplement data warehousing environments with data intelligence—the amalgamation of Big Data and business intelligence.
This presentation was given as the closing keynote at DBTA's annual Data Summit in NYC.
Using Machine Learning & Spark to Power Data-Driven MarketingCaserta
Joe Caserta provides a statistically-driven model to understanding the customer path to purchase, which combines online, offline and third-party data sources. He shows how customer data is fed to machine learning, which assigns weighted credit to customer interactions in order to give insight to what marketing activities truly matter. This presentation is from Caserta's February 2018 Big Data Warehousing Meetup co-hosted with Databricks.
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
Caserta Concepts Founder and President, Joe Caserta, gave this presentation at Strata + Hadoop World 2016 in New York, NY. His session covers path-to-purchase analytics using a data lake and spark.
For more information, visit http://casertaconcepts.com/
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
The role of the Chief Data Officer (CDO) has become integral to the evolution needed to turn a wisdom-driven company into an analytics-driven company. With Data Governance at the core of your responsibility, moving the innovation meter is a global challenge among CDOs. Specifically the CDO must:
• Provide a single point of accountability for data initiatives and issues
• Innovate ways to use existing data and evangelize a data vision for the organization
• Support & enforce data governance policies via outreach, training & tools
• Work with IT to develop/maintain an enterprise data repository
• Set standards for analytical reporting and generate data insights through data science
In this session, Joe Caserta addresses real-word CDO challenges, shares techniques to overcome them, manage corporate disruption and achieve success.
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Caserta
Joe Caserta explores the world of analytics, tech, and AI to paint a picture of where business is headed. This presentation is from the CDAO Exchange in Miami 2018.
There is an overwhelming list of expectations – and challenges – in this new, emerging and evolving role. In this presentation, given at the 2016 CDO Summit, Joe Caserta focuses on:
- Defining the CDO title
- Outlining the skills that enhance chances for success
- Listing all the many things the company thinks you are responsible for
- Providing an overview of the core technologies you need to be familiar with and will serve to ultimately support your success
- Presenting a concise list of the most pressing challenges
- Sharing insights and arguments for how best to meet the challenges and succeed in your new role
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
Over the past eight or nine years, applying DevOps practices to various areas of technology within business has grown in popularity and produced demonstrable results. These principles are particularly fruitful when applied to a data analytics environment. Bob Eilbacher explains how to implement a strong DevOps practice for data analysis, starting with the necessary cultural changes that must be made at the executive level and ending with an overview of potential DevOps toolchains. Bob also outlines why DevOps and disruption management go hand in hand.
Topics include:
- The benefits of a DevOps approach, with an emphasis on improving quality and efficiency of data analytics
- Why the push for a DevOps practice needs to come from the C-suite and how it can be integrated into all levels of business
- An overview of the best tools for developers, data analysts, and everyone in between, based on the business’s existing data ecosystem
- The challenges that come with transforming into an analytics-driven company and how to overcome them
- Practical use cases from Caserta clients
This presentation was originally given by Bob at the 2017 Strata Data Conference in New York City.
The 20th annual Enterprise Data World (EDW) Conference took place in San Diego last month April 17-21. It is recognized as the most comprehensive educational conference on data management in the world.
Joe Caserta was a featured presenter. His session “Evolving from the Data Warehouse to Big Data Analytics - the Emerging Role of the Data Lake," highlighted the challenges and steps to needed to becoming a data-driven organization.
Joe also participated in in two panel discussions during the show:
• "Data Lake or Data Warehouse?"
• "Big Data Investments Have Been Made, But What's Next
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
The “Big Data era” has ushered in an avalanche of new technologies and approaches for delivering information and insights to business users. What is the role of the cloud in your analytical environment? How can you make your migration as seamless as possible? This closing keynote, delivered by Joe Caserta, a prominent consultant who has helped many global enterprises adopt Big Data, provided the audience with the inside scoop needed to supplement data warehousing environments with data intelligence—the amalgamation of Big Data and business intelligence.
This presentation was given as the closing keynote at DBTA's annual Data Summit in NYC.
Using Machine Learning & Spark to Power Data-Driven MarketingCaserta
Joe Caserta provides a statistically-driven model to understanding the customer path to purchase, which combines online, offline and third-party data sources. He shows how customer data is fed to machine learning, which assigns weighted credit to customer interactions in order to give insight to what marketing activities truly matter. This presentation is from Caserta's February 2018 Big Data Warehousing Meetup co-hosted with Databricks.
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
Caserta Concepts Founder and President, Joe Caserta, gave this presentation at Strata + Hadoop World 2016 in New York, NY. His session covers path-to-purchase analytics using a data lake and spark.
For more information, visit http://casertaconcepts.com/
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
The role of the Chief Data Officer (CDO) has become integral to the evolution needed to turn a wisdom-driven company into an analytics-driven company. With Data Governance at the core of your responsibility, moving the innovation meter is a global challenge among CDOs. Specifically the CDO must:
• Provide a single point of accountability for data initiatives and issues
• Innovate ways to use existing data and evangelize a data vision for the organization
• Support & enforce data governance policies via outreach, training & tools
• Work with IT to develop/maintain an enterprise data repository
• Set standards for analytical reporting and generate data insights through data science
In this session, Joe Caserta addresses real-word CDO challenges, shares techniques to overcome them, manage corporate disruption and achieve success.
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Caserta
Joe Caserta explores the world of analytics, tech, and AI to paint a picture of where business is headed. This presentation is from the CDAO Exchange in Miami 2018.
There is an overwhelming list of expectations – and challenges – in this new, emerging and evolving role. In this presentation, given at the 2016 CDO Summit, Joe Caserta focuses on:
- Defining the CDO title
- Outlining the skills that enhance chances for success
- Listing all the many things the company thinks you are responsible for
- Providing an overview of the core technologies you need to be familiar with and will serve to ultimately support your success
- Presenting a concise list of the most pressing challenges
- Sharing insights and arguments for how best to meet the challenges and succeed in your new role
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
Over the past eight or nine years, applying DevOps practices to various areas of technology within business has grown in popularity and produced demonstrable results. These principles are particularly fruitful when applied to a data analytics environment. Bob Eilbacher explains how to implement a strong DevOps practice for data analysis, starting with the necessary cultural changes that must be made at the executive level and ending with an overview of potential DevOps toolchains. Bob also outlines why DevOps and disruption management go hand in hand.
Topics include:
- The benefits of a DevOps approach, with an emphasis on improving quality and efficiency of data analytics
- Why the push for a DevOps practice needs to come from the C-suite and how it can be integrated into all levels of business
- An overview of the best tools for developers, data analysts, and everyone in between, based on the business’s existing data ecosystem
- The challenges that come with transforming into an analytics-driven company and how to overcome them
- Practical use cases from Caserta clients
This presentation was originally given by Bob at the 2017 Strata Data Conference in New York City.
The 20th annual Enterprise Data World (EDW) Conference took place in San Diego last month April 17-21. It is recognized as the most comprehensive educational conference on data management in the world.
Joe Caserta was a featured presenter. His session “Evolving from the Data Warehouse to Big Data Analytics - the Emerging Role of the Data Lake," highlighted the challenges and steps to needed to becoming a data-driven organization.
Joe also participated in in two panel discussions during the show:
• "Data Lake or Data Warehouse?"
• "Big Data Investments Have Been Made, But What's Next
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
Joe Caserta was a featured speaker, along with MIT Sloan School faculty and other industry thought-leaders. His session 'You're the New CDO, Now What?' discussed how new CDOs can accomplish their strategic objectives and overcome tactical challenges in this emerging executive leadership role.
In its tenth year, the MIT CDOIQ Symposium 2016 continues to explore the developing role of the Chief Data Officer.
For more information, visit http://casertaconcepts.com/
General Data Protection Regulation - BDW Meetup, October 11th, 2017Caserta
Caserta Presentation:
General Data Protection Regulation (GDPR) is a business and technical challenge for companies worldwide - and the deadlines are coming fast! American institutions that do business in the EU or have customers from the EU will have their data practices affected. With this in mind, Caserta – joined by Waterline Data, Salt Recruiting, and Squire Patton Boggs – hosted a BDW Meetup on the GDPR, which is perhaps the most controversial data legislation that has been passed to date.
Joe Caserta, Founding President, Caserta, spoke on the basics of the GDPR, how it will impact data privacy around the world, and some techniques geared towards compliance.
A modern, flexible approach to Hadoop implementation incorporating innovation...DataWorks Summit
A modern, flexible approach to Hadoop implementation incorporating innovations from HP Haven
Jeff Veis
Vice President
HP Software Big Data
Gilles Noisette
Master Solution Architect
HP EMEA Big Data CoE
Joe Caserta's 2016 Data Summit Workshop "Introduction to Data Science with Hadoop" on May 9, expanded on his Intro to Data Science Workshop held at last year's Summit. Again, Joe presented to a standing-room only audience with a focus on the data lake, governance and the role of the data scientist.
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
During this Big Data Warehousing Meetup, Caserta Concepts and Databricks addressed the number one operational and analytic goal of nearly every organization today – to have complete view of every customer. Customer Data Integration (CDI) must be implemented to cleanse and match customer identities within and across various data systems. CDI has been a long-standing data engineering challenge, not just one of logic and complexity but also of performance and scalability.
The speakers brought together best practice techniques with Apache Spark to achieve complete CDI.
Speakers:
Joe Caserta, President, Caserta Concepts
Kevin Rasmussen, Big Data Engineer, Caserta Concepts
Vida Ha, Lead Solutions Engineer, Databricks
The sessions covered a series of problems that are adequately solved with Apache Spark, as well as those that are require additional technologies to implement correctly. Topics included:
· Building an end-to-end CDI pipeline in Apache Spark
· What works, what doesn’t, and how do we use Spark we evolve
· Innovation with Spark including methods for customer matching from statistical patterns, geolocation, and behavior
· Using Pyspark and Python’s rich module ecosystem for data cleansing and standardization matching
· Using GraphX for matching and scalable clustering
· Analyzing large data files with Spark
· Using Spark for ETL on large datasets
· Applying Machine Learning & Data Science to large datasets
· Connecting BI/Visualization tools to Apache Spark to analyze large datasets internally
The speakers also touched on data governance, on-boarding new data rapidly, how to balance rapid agility and time to market with critical decision support and customer interaction. They also shared examples of problems that Apache Spark is not optimized for.
For more information on the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/
Moving Past Infrastructure Limitations Presented by MediaMath
This presentation was given at a Big Data Warehousing Meetup with Caserta Concepts, MediaMath and Qubole. You can learn more about the event here: http://www.meetup.com/Big-Data-Warehousing/events/228372516/
Event description:
At Caserta Concepts, we are firm believers in big data thriving on the cloud. The instant-on, nearly unlimited storage and computing capabilities of AWS has made it the defacto solution for a full spectrum of organizations needing to process large amounts of data.
What's more, an ecosystem of value-added platforms has emerged to further ease and democratize the implementation of cloud based solutions. Qubole has developed a great platform for easily deploying and managing ephemeral and long-lived Hadoop and Spark clusters on AWS.
Moving Past Infrastructure Limitations: Data Warehousing at MediaMath
Over the past year and a half, MediaMath has undertaken a “data liberation” effort in an attempt to leave their bigbox, monolithic data warehouse behind. In this talk, Rory Sawyer, Software Engineer at MediaMath, will describe how this effort transformed MediaMath’s legacy architecture and legacy mindset, which imposed harsh inefficiencies on data sharing and utilization. The current mindset removes these inefficiencies and allows them to say “yes” to more projects and ideas.
Rory will also demo how MediaMath uses Amazon Web Services and Qubole so that infrastructure is no longer a limiting factor on what and how users query. This combination allows them to scale their resources up and down as needed while bridging different data sources and execution engines. Using and extending MediaMath’s data warehousing is no longer a privileged activity but an ability that every employee and client has.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
DataOps: Nine steps to transform your data science impact Strata London May 18Harvinder Atwal
According to Forrester Research, only 22% of companies are currently seeing a significant return from data science expenditures. Most data science implementations are high-cost IT projects, local applications that are not built to scale for production workflows, or laptop decision support projects that never impact customers. Despite this high failure rate, we keep hearing the same mantra and solutions over and over again. Everybody talks about how to create models, but not many people talk about getting them into production where they can impact customers.
Harvinder Atwal offers an entertaining and practical introduction to DataOps, a new and independent approach to delivering data science value at scale, used at companies like Facebook, Uber, LinkedIn, Twitter, and eBay. The key to adding value through DataOps is to adapt and borrow principles from Agile, Lean, and DevOps. However, DataOps is not just about shipping working machine learning models; it starts with better alignment of data science with the rest of the organization and its goals. Harvinder shares experience-based solutions for increasing your velocity of value creation, including Agile prioritization and collaboration, new operational processes for an end-to-end data lifecycle, developer principles for data scientists, cloud solution architectures to reduce data friction, self-service tools giving data scientists freedom from bottlenecks, and more. The DataOps methodology will enable you to eliminate daily barriers, putting your data scientists in control of delivering ever-faster cutting-edge innovation for your organization and customers.
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented What Data Do You Have and Where is it?
For more information on the services offered by Caserta Concepts, visit out website at http://casertaconcepts.com/.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Maximize the Value of Your Data: Neo4j Graph Data PlatformNeo4j
In this 60-minute conversation with IDC, we will highlight the momentum and reasons why a graph data platform is a breakthrough solution for businesses in need of a flexible data model.
Please join Mohit Sagar, Group Managing Director of CIO Network, as he hosts the conversation with Dr. Christopher Lee Marshall, Associate VP at IDC, and Nik Vora, Vice President of APAC at Neo4. During this very exciting discussion, you'll discover the insights and knowledge unlocked with the graph data platform.
Joe Caserta, President at Caserta Concepts, presented "Setting Up the Data Lake" at a DAMA Philadelphia Chapter Meeting.
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalHarvinder Atwal
Title
DataOps, the secret weapon for delivering AI, data science, and business intelligence value at speed.
Synopsis
● According to recent research, just 7.3% of organisations say the state of their data and analytics is excellent, and only 22% of companies are currently seeing a significant return from data science expenditure.
● Poor returns on data & analytics investment are often the result of applying 20th-century thinking to 21st-century challenges and opportunities.
● Modern data science and analytics require secure, efficient processes to turn raw data from multiple sources and in numerous formats into useful inputs to a data product.
● Developing, orchestrating and iterating modern data pipelines is an extremely complex process requiring multiple technologies and skills.
● Other domains have to successfully overcome the challenge of delivering high-quality products at speed in complex environments. DataOps applies proven agile principles, lean thinking and DevOps practices to the development of data products.
● A DataOps approach aligns data producers, analytical data consumers, processes and technology with the rest of the organisation and its goals.
Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...DATAVERSITY
Once developers have a knowledge management model - covered in our August webinar - they still have to deal with real world implementation constraints. Big data is a fact of life for most modern AI/cognitive computing apps, which usually means ingesting, sampling, or analyzing large data sets from disparate sources, ranging from IOT sensors to social media streams to news feeds and weather forecasts. Frequently, historical data in legacy systems will also be required to generate new insights.
This webinar will present a framework to help participants evaluate streaming data management tools, IOT technology stacks, and graph databases as support tools for their modern AI/cognitive computing projects. They will also learn about emerging open source projects and ecosystems that can help kick start their projects today.
Using Machine Learning to Understand and Predict Marketing ROIDATAVERSITY
Marketing is all about attracting, retaining and building profitable relationships with your customers, but how do you know which customers to target, which campaigns to run, and which marketing programs to invest in, to get most return for your dollar?
Join Alteryx and Keyrus as we demonstrate how to combine all relevant marketing, sales and customer data, and perform sophisticated analytics to deepen customer insight and calculate ROI of marketing programs.
You’ll walk away knowing how to:
Segment and profile your customers – take that raw data and translate it into real value
Build a marketing attribution model within Alteryx, creating a personal answer engine for your company.
Leverage R or Python code in an Alteryx workflow so data scientists can collaborate with non-coding stake holders in a code-friendly and code-free environment.
Join Alteryx and Keyrus and get the actionable insights you need to drive marketing ROI analytics, and answer million-dollar questions without spending millions of dollars on standardized solutions.
Smart Data Webinar: Knowledge as a ServiceDATAVERSITY
Building a successful ModernAI application often requires large volumes of data for training ML models or data that has been organized into knowledge using taxonomies or ontologies to support specific vertical markets (healthcare, insurance, pharma, etc.) or horizontal functions (HR, legal, supply chain, etc). While tools do exist to help developers ingest and organize the required data into meaningful knowledge stores, using pre-built data or knowledge packages can make application development faster, more reliable, and less expensive than starting from scratch.
In this webinar we will look at trends and examples of specific proprietary and open source data sets that offer prebuilt knowledge, representations, or models to serve these markets.
IBM Governed data lake is a value-driven big data platform journey. The journey starts by ingesting wide variety of data, governing it, applying data science and machine learning on it to produce actionable insights.
Against the backdrop of Big Data, the Chief Data Officer, by any name, is emerging as the central player in the business of data, including cybersecurity. The MITCDOIQ Symposium explored the developing landscape, from local organizational issues to global challenges, through case studies from industry, academic, government and healthcare leaders.
Joe Caserta, president at Caserta Concepts, presented "Big Data's Impact on the Enterprise" at the MITCDOIQ Symposium.
Presentation Abstract: Organizations are challenged with managing an unprecedented volume of structured and unstructured data coming into the enterprise from a variety of verified and unverified sources. With that is the urgency to rapidly maximize value while also maintaining high data quality.
Today we start with some history and the components of data governance and information quality necessary for successful solutions. I then bring it all to life with 2 client success stories, one in healthcare and the other in banking and financial services. These case histories illustrate how accurate, complete, consistent and reliable data results in a competitive advantage and enhanced end-user and customer satisfaction.
To learn more, visit www.casertaconcepts.com
The Data Lake - Balancing Data Governance and Innovation Caserta
Joe Caserta gave the presentation "The Data Lake - Balancing Data Governance and Innovation" at DAMA NY's one day mini-conference on May 19th. Speakers covered emerging trends in Data Governance, especially around Big Data.
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Caserta
Joe Caserta went over the details inside the big data ecosystem and the Caserta Concepts Data Pyramid, which includes Data Ingestion, Data Lake/Data Science Workbench and the Big Data Warehouse. He then dove into the foundation of dimensional data modeling, which is as important as ever in the top tier of the Data Pyramid. Topics covered:
- The 3 grains of Fact Tables
- Modeling the different types of Slowly Changing Dimensions
- Advanced Modeling techniques like Ragged Hierarchies, Bridge Tables, etc.
- ETL Architecture.
He also talked about ModelStorming, a technique used to quickly convert business requirements into an Event Matrix and Dimensional Data Model.
This was a jam-packed abbreviated version of 4 days of rigorous training of these techniques being taught in September by Joe Caserta (Co-Author, with Ralph Kimball, The Data Warehouse ETL Toolkit) and Lawrence Corr (Author, Agile Data Warehouse Design).
For more information, visit http://casertaconcepts.com/.
Joe Caserta was a featured speaker, along with MIT Sloan School faculty and other industry thought-leaders. His session 'You're the New CDO, Now What?' discussed how new CDOs can accomplish their strategic objectives and overcome tactical challenges in this emerging executive leadership role.
In its tenth year, the MIT CDOIQ Symposium 2016 continues to explore the developing role of the Chief Data Officer.
For more information, visit http://casertaconcepts.com/
General Data Protection Regulation - BDW Meetup, October 11th, 2017Caserta
Caserta Presentation:
General Data Protection Regulation (GDPR) is a business and technical challenge for companies worldwide - and the deadlines are coming fast! American institutions that do business in the EU or have customers from the EU will have their data practices affected. With this in mind, Caserta – joined by Waterline Data, Salt Recruiting, and Squire Patton Boggs – hosted a BDW Meetup on the GDPR, which is perhaps the most controversial data legislation that has been passed to date.
Joe Caserta, Founding President, Caserta, spoke on the basics of the GDPR, how it will impact data privacy around the world, and some techniques geared towards compliance.
A modern, flexible approach to Hadoop implementation incorporating innovation...DataWorks Summit
A modern, flexible approach to Hadoop implementation incorporating innovations from HP Haven
Jeff Veis
Vice President
HP Software Big Data
Gilles Noisette
Master Solution Architect
HP EMEA Big Data CoE
Joe Caserta's 2016 Data Summit Workshop "Introduction to Data Science with Hadoop" on May 9, expanded on his Intro to Data Science Workshop held at last year's Summit. Again, Joe presented to a standing-room only audience with a focus on the data lake, governance and the role of the data scientist.
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
During this Big Data Warehousing Meetup, Caserta Concepts and Databricks addressed the number one operational and analytic goal of nearly every organization today – to have complete view of every customer. Customer Data Integration (CDI) must be implemented to cleanse and match customer identities within and across various data systems. CDI has been a long-standing data engineering challenge, not just one of logic and complexity but also of performance and scalability.
The speakers brought together best practice techniques with Apache Spark to achieve complete CDI.
Speakers:
Joe Caserta, President, Caserta Concepts
Kevin Rasmussen, Big Data Engineer, Caserta Concepts
Vida Ha, Lead Solutions Engineer, Databricks
The sessions covered a series of problems that are adequately solved with Apache Spark, as well as those that are require additional technologies to implement correctly. Topics included:
· Building an end-to-end CDI pipeline in Apache Spark
· What works, what doesn’t, and how do we use Spark we evolve
· Innovation with Spark including methods for customer matching from statistical patterns, geolocation, and behavior
· Using Pyspark and Python’s rich module ecosystem for data cleansing and standardization matching
· Using GraphX for matching and scalable clustering
· Analyzing large data files with Spark
· Using Spark for ETL on large datasets
· Applying Machine Learning & Data Science to large datasets
· Connecting BI/Visualization tools to Apache Spark to analyze large datasets internally
The speakers also touched on data governance, on-boarding new data rapidly, how to balance rapid agility and time to market with critical decision support and customer interaction. They also shared examples of problems that Apache Spark is not optimized for.
For more information on the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/
Moving Past Infrastructure Limitations Presented by MediaMath
This presentation was given at a Big Data Warehousing Meetup with Caserta Concepts, MediaMath and Qubole. You can learn more about the event here: http://www.meetup.com/Big-Data-Warehousing/events/228372516/
Event description:
At Caserta Concepts, we are firm believers in big data thriving on the cloud. The instant-on, nearly unlimited storage and computing capabilities of AWS has made it the defacto solution for a full spectrum of organizations needing to process large amounts of data.
What's more, an ecosystem of value-added platforms has emerged to further ease and democratize the implementation of cloud based solutions. Qubole has developed a great platform for easily deploying and managing ephemeral and long-lived Hadoop and Spark clusters on AWS.
Moving Past Infrastructure Limitations: Data Warehousing at MediaMath
Over the past year and a half, MediaMath has undertaken a “data liberation” effort in an attempt to leave their bigbox, monolithic data warehouse behind. In this talk, Rory Sawyer, Software Engineer at MediaMath, will describe how this effort transformed MediaMath’s legacy architecture and legacy mindset, which imposed harsh inefficiencies on data sharing and utilization. The current mindset removes these inefficiencies and allows them to say “yes” to more projects and ideas.
Rory will also demo how MediaMath uses Amazon Web Services and Qubole so that infrastructure is no longer a limiting factor on what and how users query. This combination allows them to scale their resources up and down as needed while bridging different data sources and execution engines. Using and extending MediaMath’s data warehousing is no longer a privileged activity but an ability that every employee and client has.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
DataOps: Nine steps to transform your data science impact Strata London May 18Harvinder Atwal
According to Forrester Research, only 22% of companies are currently seeing a significant return from data science expenditures. Most data science implementations are high-cost IT projects, local applications that are not built to scale for production workflows, or laptop decision support projects that never impact customers. Despite this high failure rate, we keep hearing the same mantra and solutions over and over again. Everybody talks about how to create models, but not many people talk about getting them into production where they can impact customers.
Harvinder Atwal offers an entertaining and practical introduction to DataOps, a new and independent approach to delivering data science value at scale, used at companies like Facebook, Uber, LinkedIn, Twitter, and eBay. The key to adding value through DataOps is to adapt and borrow principles from Agile, Lean, and DevOps. However, DataOps is not just about shipping working machine learning models; it starts with better alignment of data science with the rest of the organization and its goals. Harvinder shares experience-based solutions for increasing your velocity of value creation, including Agile prioritization and collaboration, new operational processes for an end-to-end data lifecycle, developer principles for data scientists, cloud solution architectures to reduce data friction, self-service tools giving data scientists freedom from bottlenecks, and more. The DataOps methodology will enable you to eliminate daily barriers, putting your data scientists in control of delivering ever-faster cutting-edge innovation for your organization and customers.
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented What Data Do You Have and Where is it?
For more information on the services offered by Caserta Concepts, visit out website at http://casertaconcepts.com/.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Maximize the Value of Your Data: Neo4j Graph Data PlatformNeo4j
In this 60-minute conversation with IDC, we will highlight the momentum and reasons why a graph data platform is a breakthrough solution for businesses in need of a flexible data model.
Please join Mohit Sagar, Group Managing Director of CIO Network, as he hosts the conversation with Dr. Christopher Lee Marshall, Associate VP at IDC, and Nik Vora, Vice President of APAC at Neo4. During this very exciting discussion, you'll discover the insights and knowledge unlocked with the graph data platform.
Joe Caserta, President at Caserta Concepts, presented "Setting Up the Data Lake" at a DAMA Philadelphia Chapter Meeting.
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalHarvinder Atwal
Title
DataOps, the secret weapon for delivering AI, data science, and business intelligence value at speed.
Synopsis
● According to recent research, just 7.3% of organisations say the state of their data and analytics is excellent, and only 22% of companies are currently seeing a significant return from data science expenditure.
● Poor returns on data & analytics investment are often the result of applying 20th-century thinking to 21st-century challenges and opportunities.
● Modern data science and analytics require secure, efficient processes to turn raw data from multiple sources and in numerous formats into useful inputs to a data product.
● Developing, orchestrating and iterating modern data pipelines is an extremely complex process requiring multiple technologies and skills.
● Other domains have to successfully overcome the challenge of delivering high-quality products at speed in complex environments. DataOps applies proven agile principles, lean thinking and DevOps practices to the development of data products.
● A DataOps approach aligns data producers, analytical data consumers, processes and technology with the rest of the organisation and its goals.
Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...DATAVERSITY
Once developers have a knowledge management model - covered in our August webinar - they still have to deal with real world implementation constraints. Big data is a fact of life for most modern AI/cognitive computing apps, which usually means ingesting, sampling, or analyzing large data sets from disparate sources, ranging from IOT sensors to social media streams to news feeds and weather forecasts. Frequently, historical data in legacy systems will also be required to generate new insights.
This webinar will present a framework to help participants evaluate streaming data management tools, IOT technology stacks, and graph databases as support tools for their modern AI/cognitive computing projects. They will also learn about emerging open source projects and ecosystems that can help kick start their projects today.
Using Machine Learning to Understand and Predict Marketing ROIDATAVERSITY
Marketing is all about attracting, retaining and building profitable relationships with your customers, but how do you know which customers to target, which campaigns to run, and which marketing programs to invest in, to get most return for your dollar?
Join Alteryx and Keyrus as we demonstrate how to combine all relevant marketing, sales and customer data, and perform sophisticated analytics to deepen customer insight and calculate ROI of marketing programs.
You’ll walk away knowing how to:
Segment and profile your customers – take that raw data and translate it into real value
Build a marketing attribution model within Alteryx, creating a personal answer engine for your company.
Leverage R or Python code in an Alteryx workflow so data scientists can collaborate with non-coding stake holders in a code-friendly and code-free environment.
Join Alteryx and Keyrus and get the actionable insights you need to drive marketing ROI analytics, and answer million-dollar questions without spending millions of dollars on standardized solutions.
Smart Data Webinar: Knowledge as a ServiceDATAVERSITY
Building a successful ModernAI application often requires large volumes of data for training ML models or data that has been organized into knowledge using taxonomies or ontologies to support specific vertical markets (healthcare, insurance, pharma, etc.) or horizontal functions (HR, legal, supply chain, etc). While tools do exist to help developers ingest and organize the required data into meaningful knowledge stores, using pre-built data or knowledge packages can make application development faster, more reliable, and less expensive than starting from scratch.
In this webinar we will look at trends and examples of specific proprietary and open source data sets that offer prebuilt knowledge, representations, or models to serve these markets.
IBM Governed data lake is a value-driven big data platform journey. The journey starts by ingesting wide variety of data, governing it, applying data science and machine learning on it to produce actionable insights.
Against the backdrop of Big Data, the Chief Data Officer, by any name, is emerging as the central player in the business of data, including cybersecurity. The MITCDOIQ Symposium explored the developing landscape, from local organizational issues to global challenges, through case studies from industry, academic, government and healthcare leaders.
Joe Caserta, president at Caserta Concepts, presented "Big Data's Impact on the Enterprise" at the MITCDOIQ Symposium.
Presentation Abstract: Organizations are challenged with managing an unprecedented volume of structured and unstructured data coming into the enterprise from a variety of verified and unverified sources. With that is the urgency to rapidly maximize value while also maintaining high data quality.
Today we start with some history and the components of data governance and information quality necessary for successful solutions. I then bring it all to life with 2 client success stories, one in healthcare and the other in banking and financial services. These case histories illustrate how accurate, complete, consistent and reliable data results in a competitive advantage and enhanced end-user and customer satisfaction.
To learn more, visit www.casertaconcepts.com
The Data Lake - Balancing Data Governance and Innovation Caserta
Joe Caserta gave the presentation "The Data Lake - Balancing Data Governance and Innovation" at DAMA NY's one day mini-conference on May 19th. Speakers covered emerging trends in Data Governance, especially around Big Data.
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Caserta
Joe Caserta went over the details inside the big data ecosystem and the Caserta Concepts Data Pyramid, which includes Data Ingestion, Data Lake/Data Science Workbench and the Big Data Warehouse. He then dove into the foundation of dimensional data modeling, which is as important as ever in the top tier of the Data Pyramid. Topics covered:
- The 3 grains of Fact Tables
- Modeling the different types of Slowly Changing Dimensions
- Advanced Modeling techniques like Ragged Hierarchies, Bridge Tables, etc.
- ETL Architecture.
He also talked about ModelStorming, a technique used to quickly convert business requirements into an Event Matrix and Dimensional Data Model.
This was a jam-packed abbreviated version of 4 days of rigorous training of these techniques being taught in September by Joe Caserta (Co-Author, with Ralph Kimball, The Data Warehouse ETL Toolkit) and Lawrence Corr (Author, Agile Data Warehouse Design).
For more information, visit http://casertaconcepts.com/.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendes learned how a global leader in the test, measurement and control systems market reduced their big data implementations from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
During this Big Data Warehousing Meetup, Caserta Concepts and Databricks addressed the number one operational and analytic goal of nearly every organization today – to have complete view of every customer. Customer Data Integration (CDI) must be implemented to cleanse and match customer identities within and across various data systems. CDI has been a long-standing data engineering challenge, not just one of logic and complexity but also of performance and scalability.
The speakers brought together best practice techniques with Apache Spark to achieve complete CDI.
Speakers:
Joe Caserta, President, Caserta Concepts
Kevin Rasmussen, Big Data Engineer, Caserta Concepts
Vida Ha, Lead Solutions Engineer, Databricks
The sessions covered a series of problems that are adequately solved with Apache Spark, as well as those that are require additional technologies to implement correctly. Topics included:
· Building an end-to-end CDI pipeline in Apache Spark
· What works, what doesn’t, and how do we use Spark we evolve
· Innovation with Spark including methods for customer matching from statistical patterns, geolocation, and behavior
· Using Pyspark and Python’s rich module ecosystem for data cleansing and standardization matching
· Using GraphX for matching and scalable clustering
· Analyzing large data files with Spark
· Using Spark for ETL on large datasets
· Applying Machine Learning & Data Science to large datasets
· Connecting BI/Visualization tools to Apache Spark to analyze large datasets internally
The speakers also touched on data governance, on-boarding new data rapidly, how to balance rapid agility and time to market with critical decision support and customer interaction. They also shared examples of problems that Apache Spark is not optimized for.
For more information on the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/
Consumers will increasingly expect retailers to offer highly customized buying recommendations at the right time through the right device.
Being able to follow these through with seamless and secure e-commerce transactions.
The potential of Data blending in every area from automotive telemetry to medical science to national security is enormous.
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. Dependent on the size and quantity of such events, this can quickly be in the range of Big Data. How can we efficiently collect and transmit these events? How can we make sure that we can always report over historical events? How can these new events be integrated into traditional infrastructure and application landscape?
Starting with a product and technology neutral reference architecture, we will then present different solutions using Open Source frameworks and the Oracle Stack both for on premises as well as the cloud.
“Thank Bunny” is a business tool that empowers retailers to execute their customer engagement activities effortlessly and auto-magically. Simply said, it helps a business to retain customers and increase sales with proven game-mechanics.
The Cloud topic is everywhere, not only for big software companies, but also for our customers and of course for all service providers.
How to move from the traditional IT to a full Cloud environment and how to manage the transition phase?
We show you the Trivadis Cloud transition approach, standardized and proven, which leads you into a safe and optimized usage of cloud services in your daily business.
It’s all about Data - a Trivadis core competence for decades - no matter which deployment model we choose.
In this presentation we shed light on various Cloud strategies and concrete technologically aspects.
Organizations hugely rely on their customer sentiments to make business related decisions. Net Promoter Score is a reliable metric to understand customer satisfaction.This infographic highlights some points to remember to make the most out of the process.
Discusses the algebraic properties of types, different kinds of functions and the information that is preserved or lost, and Category Theory concepts that underpin and unify them.
Understanding, Planning and Achieving
Data Quality in Your Organization
by Joe Caserta, President of Caserta Concepts
For more information, visit www.casertaconcepts.com or contact us at info@casertaconcepts.com
Architecting for change: LinkedIn's new data ecosystemYael Garten
2016 StrataHadoop NYC conference talk.
http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52182
Abstract:
Last year, LinkedIn embarked on an ambitious mission to completely revamp the mobile experience for its members. This would mean a completely new mobile application, reimagined user experiences, and new interaction concepts. As the team evaluated the impact of this big rewrite on the data analytics ecosystem, they observed a few problems.
Over the past few years, LinkedIn has become extremely good at incrementally changing the site one mini-feature at a time, often in conjunction with hundreds of other incremental changes. LinkedIn’s experimentation platform ensures that it is always monitoring a wide gamut of impacted metrics with every change before rolling fully forward. However, when it comes to rolling out a big change like this, different challenges crop up. You have to rollout the entire application all at once; the new experience means that you have no baseline on new metrics; and existing metrics may see double digit changes just because of the new experience or because the metric’s logic is no longer accurate—the challenge is in figuring out which is which.
Shirshanka Das and Yael Garten describe how LinkedIn redesigned its data analytics ecosystem in the face of a significant product rewrite, covering the infrastructure changes that enable LinkedIn to roll out future product innovations with minimal downstream impact. Shirshanka and Yael explore the motivations and the building blocks for this reimagined data analytics ecosystem, the technical details of LinkedIn’s new client-side tracking infrastructure, its unified reporting platform, and its data virtualization layer on top of Hadoop and share lessons learned from data producers and consumers that are participating in this governance model. Along the way, they offer some anecdotal evidence during the rollout that validated some of their decisions and are also shaping the future roadmap of these efforts.
Architecting for Big Data: Trends, Tips, and Deployment OptionsCaserta
Joe Caserta, President at Caserta Concepts addressed the challenges of Business Intelligence in the Big Data world at the Third Annual Great Lakes BI Summit in Detroit, MI on Thursday, March 26. His talk "Architecting for Big Data: Trends, Tips and Deployment Options," focused on how to supplement your data warehousing and business intelligence environments with big data technologies.
For more information on this presentation or the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/.
Joe Caserta presents his vision of the future of Big Data in the Enterprise.
At the recent Harrisburg University Analytics Summit II, Joe Caserta gave this engaging presentation to Summit attendees including fellow academics, strategists, data scientists and analysts.
Building the Artificially Intelligent EnterpriseDatabricks
This session looks at where we are today with data and analytics and what is needed to transition to the Artificially Intelligent Enterprise.
How do you mobilise developers to exploit what data scientists and business analysts have built? How do you align it all with business strategy to maximise business outcomes? How do you combine BI, predictive and prescriptive analytics, automation and reinforcement learning to get maximum value across the enterprise? What is the blueprint for building the artificially intelligent enterprise?
•Data and analytics – Where are we?
•Why is the journey only half-way done?
•2021 and beyond – The new era of AI usage and not just build
•The requirement – event-driven, on-demand and automated analytics
•Operationalising what you build – DataOps, MLOps and RPA
•Mobilising the masses to integrate AI into processes – what needs to be done?
•Business strategy alignment – the guiding light to AI utilisation for high reward
•Agility step change – the shift to no-code integration of AI by citizen developers
•Recording decisions, and analysing business impact
•Reinforcement-learning – transitioning to continuous reward
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented Incorporating the Data Lake into Your Analytics Architecture.
For more information on the services offered by Caserta Concepts, visit out website at http://casertaconcepts.com/.
In this presentation at DAMA New York, Joe started by asking a key question: why are we doing this? Why analyze and share all these massive amounts of data? Basically, it comes down to the belief that in any organization, in any situation, if we can get the data and make it correct and timely, insights from it will become instantly actionable for companies to function more nimbly and successfully. Enabling the use of data can be a world-changing, world-improving activity and this session presents the steps necessary to get you there. Joe explained the concept of the "data lake" and also emphasizes the role of a strong data governance strategy that incorporates seven components needed for a successful program.
For more information on this presentation or Caserta Concepts, visit our website at http://casertaconcepts.com/.
Introduction to Data Science (Data Summit, 2017)Caserta
At DBTA's 2017 Data Summit in New York, NY, Caserta Founder & President, Joe Caserta, and Senior Architect, Bill Walrond, gave a pre-conference workshop presenting the ins and outs of data science. Data scientist has been dubbed the "sexiest" job of the 21st century, but it requires an understanding of many different elements of data analysis. This presentation dives into the fundamentals of data exploration, mining, and preparation, applying the principles of statistical modeling and data visualization in real-world applications.
Workshop with Joe Caserta, President of Caserta Concepts, at Data Summit 2015 in NYC.
Data science, the ability to sift through massive amounts of data to discover hidden patterns and predict future trends and actions, may be considered the "sexiest" job of the 21st century, but it requires an understanding of many elements of data analytics. This workshop introduced basic concepts, such as SQL and NoSQL, MapReduce, Hadoop, data mining, machine learning, and data visualization.
For notes and exercises from this workshop, click here: https://github.com/Caserta-Concepts/ds-workshop.
For more information, visit our website at www.casertaconcepts.com
Slides from a recent Big Data Warehousing Meetup titled, Big Data Analytics with Microsoft.
See Power Pivot/ Power Query/ Power View/ Power Maps and Azure Machine Learning be used to analyze Big Data.
One challenge of dealing with Big Data project is to acquire both structured and instructed information in order to find the right correlation. During the event, we explained all the steps to build your model and enhance your existing data through Microsoft's Power BI.
We had an in-depth discussion about the innovations built into the latest stack of Microsoft Business Intelligence, and practical tips from Technology Specialist’s from Microsoft.
The session also featured demos to help you see the technology as an end-to-end solution.
For more information, visit www.casertaconcepts.com
Why an AI-Powered Data Catalog Tool is Critical to Business SuccessInformatica
Imagine a fast, more efficient business thriving on trusted data-driven decisions. An intelligent data catalog can help your organization discover, organize, and inventory all data assets across the org and democratize data with the right balance of governance and flexibility. Informatica's data catalog tools are powered by AI and can automate tedious data management tasks and offer immediate recommendations based on derived business intelligence. We offer data catalog workshops globally. Visit Informatica.com to attend one near you.
DataLakes kan skalere i takt med skyen, nedbryde integrationsbarrierer og data gemt i siloer og bane vejen for nye forretningsmuligheder. Det er alt sammen med til at give et bedre beslutningsgrundlag for ledelse og medarbejdere. Kom og hør hvordan.
David Bojsen, Arkitekt, Microsoft
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
Many data scientists are well grounded in creating accomplishment in the enterprise, but many come from outside – from academia, from PhD programs and research. They have the necessary technical skills, but it doesn’t count until their product gets to production and in use. The speaker recently helped a struggling data scientist understand his organization and how to create success in it. That turned into this presentation, because many new data scientists struggle with the complexities of an enterprise.
Bridging the Gap: Analyzing Data in and Below the CloudInside Analysis
The Briefing Room with Dean Abbott and Tableau Software
Live Webcast July 23, 2013
http://www.insideanalysis.com
Today’s desire for analytics extends well beyond the traditional domain of Business Intelligence. That’s partly because business users are realizing the value of mixing and matching all kinds of data, from all kinds of sources. One emerging market driver is Cloud-based data, and the desire companies have to analyze this data cohesively with their on-premise data sets.
Register for this episode of The Briefing Room to learn from Analyst Dean Abbott, who will explain how the ability to access data in the cloud can play a critical role for generating business value from analytics. He’ll be briefed by Ellie Fields of Tableau Software who will tout Tableau’s latest release, which includes native connectors to cloud-based applications like Salesforce.com, Amazon Redshift, Google Analytics and BigQuery. She’ll also demonstrate how Tableau can combine cloud data with other data sources, including spreadsheets, databases, cubes and even Big Data.
Accelerate Self-Service Analytics with Data Virtualization and VisualizationDenodo
Watch full webinar here: https://bit.ly/3fpitC3
Enterprise organizations are shifting to self-service analytics as business users need real-time access to holistic and consistent views of data regardless of its location, source or type for arriving at critical decisions.
Data Virtualization and Data Visualization work together through a universal semantic layer. Learn how they enable self-service data discovery and improve performance of your reports and dashboards.
In this session, you will learn:
- Challenges faced by business users
- How data virtualization enables self-service analytics
- Use case and lessons from customer success
- Overview of the highlight features in Tableau
Graph databases provide the ability to quickly discover and integrate key relationships between enterprise data sets. Business use cases such as recommendation engines, social networks, enterprise knowledge graphs, and more provide valuable ways to leverage graph databases in your organization. This webinar will provide an overview of graph database technologies, and how they can be used for practical applications to drive business value.
A webinar on how Neo4j customers like Nasa, AirBnB, eBay, government agencies, investigative journalists and others are building Knowledge Graphs to inform today and tomorrow’s solutions.
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
As Europe's leading economic powerhouse and the fourth-largest hashtag#economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like hashtag#Russia and hashtag#China, hashtag#Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in hashtag#cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to hashtag#AdvancedPersistentThreats (hashtag#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
2. @joe_Caserta
Joe Caserta Timeline
Launched Big Data practice
Co-author, with Ralph Kimball, The Data
Warehouse ETL Toolkit (Wiley)
Data Analysis, Data Warehousing and Business
Intelligence since 1996
Began consulting database programing and data
modeling 25+ years hands-on experience building database
solutions
Founded Caserta Concepts in NYC
Web log analytics solution published in Intelligent
Enterprise magazine
Launched Data Science, Data Interaction and Cloud
practices
Laser focus on extending Data Analytics with Big Data
solutions
1986
2004
1996
2009
2001
2013
2012
2014
Dedicated to Data Governance Techniques on Big
Data (Innovation)
Awarded Top 20 Big Data Companies 2016
Top 20 Most Powerful
Big Data consulting firms
Launched Big Data Warehousing (BDW) Meetup NYC:
2,000+ Members
2016 Awarded Fastest Growing Big Data Companies
2016
Established best practices for big data ecosystem
implementations
3. @joe_Caserta
About Caserta Concepts
– Consulting Data Innovation and Modern Data Engineering
– Award-winning company
– Internationally recognized work force
– Strategy, Architecture, Implementation, Governance
– Innovation Partner
– Strategic Consulting
– Advanced Architecture
– Build & Deploy
• Leader in Enterprise Data Solutions
– Big Data Analytics
– Data Warehousing
– Business Intelligence
Data Science
Cloud Computing
Data Governance
7. @joe_Caserta
Use Case Objectives
• Cross-Channel Behavior Tracking / Identity Resolution
• Access to Atomic Level Transaction Data
• Blend Data Assets from Multiple Marketing Channels
• Fast Data Onboarding and Discovery
• Ensure Data Quality
• Single platform / Landscape simplification
• Understand Path-to-Purchase Customer Journey
• Improve Customer Experience & Increase Sales
8. @joe_Caserta
The Recipe: Build a Dynamic Data Platform
OLD WAY:
• Structure Data Ingest Data Analyze Data
• Fixed Capacity
• Monolith
NEW WAY:
• Ingest Data Analyze Data Structure Data
• Dynamic Capacity
• Ecosystem
RECIPE:
• Cloud
• Data Lake
• Holistic Architecture & Framework
9. @joe_Caserta
Analytics: The Whole Brain Challenge
Front
Back
Analytics Oriented
• Data Science
• Research
Process Oriented
• Data Governance
• Compliance
Operations Oriented
• Shared Services
• Data Engineering
Revenue Oriented
• Revenue Goals
• Monetizing Data
10. @joe_Caserta
Chief Data Organization (Oversight)
Vertical Business Area
[Sales/Finance/Marketing/Operations/Customer Svc]
Product Owner
SCRUM Master
Development Team
Business Subject Matter Expertise
Data Librarian/Data Stewardship
Data Science/ Statistical Skills
Data Engineering / Architecture
Presentation/ BI Report Development Skills
Data Quality Assurance
DevOps
IT Organization
(Oversight)
Enterprise Data Architect
Solution Engineers
Data Integration Practice
User Experience Practice
QA Practice
Operations Practice
Advanced Analytics
Business Analysts
Data Analysts
Data Scientists
Statisticians
Data Engineers
Planning Organization
Project Managers
Data Organization
Data Gov Coordinator
Data Librarians
Data Stewards
It Takes a Village!
12. @joe_Caserta
Global economics
Intensity of competition
Reduce costs
Move to cross-functional teams
New executive leadership
Speed of technical change
Social trends and changes
Period of time in present role
Status & perks of office/dept under threat
No apparent reasons for proposed changes
Lack of understanding of proposed changes
Fear of inability to cope with new technology
Concern over job security
Forces for Change Forces Resisting Change
Status Quo
Moving the Status Quo
http://www.change-management-coach.com/force-field-analysis.html
13. @joe_Caserta
The Data Lake Paradigm
Technology:
• Scalable distributed storage S3
• Pluggable fit-for-purpose processing EMR
• Consistent extensible framework Spark
• Dimensional Data Warehouse Redshift
Functional Capabilities:
• Remove barriers between data ingestion and analysis
• Democratize Data with Just Enough Data Governance
15. @joe_Caserta
Why Spark?
“Big Box” tools vs ROI?
– Prohibitively expensive limited by licensing $$$
– Typically limited to the scalability of a single server
We Spark!
• Development local or distributed is identical
• Beautiful high level API’s
• Full universe of Python modules
• Open source and Free
• Blazing fast!
• Databricks cloud makes it easier
Spark has become our default processing engine for a
myriad of engineering & science problems
16. @joe_Caserta
Why we Databricks
• Interactive UI
• Includes a workspace with notebooks, dashboards, job scheduler,
point-and-click cluster management
• Cluster sharing
• Multiple users can connect to the same cluster, saving cost
• Security features
• Access controls to the whole workspace, clusters
• Collaboration
• Multi-user access to the same notebook, revision control, and IDE
and GitHub integration
• Data management
• Support for connecting different data sources to Spark, caching
service to speed up queries
17. @joe_Caserta
Ingest Raw
Data
Organize, Define,
Complete
Munging, Blending
Machine Learning
Data Quality and Monitoring
Metadata, ILM , Security
Data Catalog
Data Integration
Fully Governed ( trusted)
Arbitrary/Ad-hoc Queries
and Reporting
Big
Data
Ware
house
Data Science Workspace
Data Lake – Integrated Sandbox
Landing Area – Source Data in “Full Fidelity”
Usage Pattern Data Governance
Metadata, ILM,
Security
Corporate Data Pyramid (CDP)
18. @joe_Caserta
Data Integration
Identity Resolution
Data Quality
Discovery / Exploration
Machine Learning
Models Development
Reports / Dashboards
Applications
APIs
Structured Data
Unstructured Data
SQL, NoSQL, Object Store
Find Share Collaborate
Data Engineer
Data Scientist
Business Analyst
App Developer
Analyze
Persist
DeployIngest
Data Lake on the Cloud
20. @joe_Caserta
Type
Comments
Single Touch Rules-Based Statistically Driven
Assign the credit
to the first or last
exposure
Assign the credit to
each interaction
based on business
rules
Assign the credit to
interactions based
on data-driven
model
Ad-Click Mailing MailingE-mail E-mailAd-Click Ad-Click
100% 33% 33% 33% 27% 49% 24%
- Last touch only
- Ignores bulk of
customer journey
- Undervalues
other interactions
and influencers
- Subjective
- Assigns arbitrary
values to each
interaction
- Lacks analytics rigor
to determine weights
ü Looks at full behavior
patterns
ü Consider all touch points
ü Can apply different
models for best results
ü Use data to find
correlations between
touch points (winning
combinations)
Path-to-Purchase Methods
21. @joe_Caserta
Unifying the Customer Across Channels
Customer Data Integration (CDI):
Match and manage customer information from all available sources
Marketing channels: DMP, Salesforce, Adobe, Social, Direct Mail, Call Center, CRM
In other words…
We need to figure out how to
LINK people across systems!
23. @joe_Caserta
Standardization and Matching
Cleanse and Parse:
• Names
• Resolve nicknames
• Create deterministic hash, phonetic
representation
• Addresses
• Emails
• Phone Numbers
Matching:
Join based on combinations of cleansed
and standardized data to create match
results:
Spark map operations:
• Data cleansing, transformation, and
standardization
– Address Parsing: usaddress, postal-
address, etc
– Name Hashing: fuzzy, etc
– Genderization: sexmachine, etc
24. @joe_Caserta
Mastering Unmanageable Source Data
Reveal
• Wait for the customer to “reveal” themselves
• Create link between anonymous self and known profile
Vector
• May need behavioral statistical profiling
• Compare use vectors
Rebuild
• Recluster all prior activities
• Rebuild the Graph
25. @joe_Caserta
The Matching Process
The matching process output gives us the relationships between customers:
Great, but it’s not very useable, you need to traverse the dataset to find out 1234 and 1235 are the
same person (and this is a trivial case)
And we need to cluster and identify our survivors (vertex)
xid yid match_type
1234 4849 phone
4849 5499 email
5499 1235 address
4849 7788 cookie
5499 7788 cookie
4849 1234 phone
26. @joe_Caserta
Graph to the Rescue
1234 4849
5499
7788
We just need to import our edges into a graph
and “dump” out communities
Don’t think table…
think Graph! These matches are
actually communities
1235
27. @joe_Caserta
Connected Components algorithm labels each connected component of the
graph with the ID of its lowest-numbered vertex
This lowest number vertex can serve as our “survivor” (not field survivorship)
Connected Components
xid yid
1234 4849
1234 5499
1234 1235
1234 7788
1234 7788
1234 1234
34. @joe_Caserta
• Data Lake: S3 / NoSQL or SQL as Needed
• Identity Resolution: Spark / Graph Frames
• ETL: Spark/Python
• Path-to-Purchase: S3 / Spark / MLlib
• Data Warehouse: RedShift
• Shared Interface: Notebooks
• Business Intelligence: Tableau
Recap
35. @joe_Caserta
Joe Caserta
President, Caserta Concepts
joe@casertaconcepts.com
@joe_Caserta
• Award-winning company
• Transformative Data Strategies
• Modern Data Engineering
• Advanced Architecture
• Innovation Partner
• Strategic Consulting
• Advanced Technical Design
• Build & Deploy Solutions
• BDW Meetup
• New York City
• 3,000+ members
• Knowledge sharing
Data is not important, it’s what you do with it that’s important!
Thank You