Saurabh K. Gupta discusses achieving data democratization through effective data integration. He outlines key considerations for data lake architecture and data ingestion frameworks to implement a data lake that empowers data democratization. The document provides an overview of data lake styles, principles for data ingestion, and tools and techniques for batched and streaming data integration such as Apache Sqoop, Apache Flume, and change data capture.
Achieve data democracy in data lake with data integration
1. Achieve Data Democratization with Effective Data Integration
Saurabh K. Gupta
Manager, Data & Analytics, GE
www.amazon.com/author/saurabhgupta
@saurabhkg
2. Disclaimer:
"This report has been prepared by the Authors who are part of GE. The opinions expressed herein by the Authors and the information contained herein are in good faith, and the Authors and GE disclaim any liability for the content in the report. The report is the property of GE, and GE is the holder of the copyright or any intellectual property over the report. No part of this document may be reproduced in any manner without the written permission of GE. This report also contains certain information available in the public domain, created and maintained by private and public organizations. GE does not control or guarantee the accuracy, relevance, timeliness, or completeness of such information. This report constitutes a view as on the date of publication and is subject to change. GE does not warrant or solicit any kind of act or omission based on this report."
3. AIOUG Sangam’17
Abstract
✓ Data lake is a relatively new term compared to all the fancy ones coined since the industry realized the potential of data.
✓ Enterprises are bending over backwards to build a stable data strategy and take a leap towards data democratization.
✓ Traditional approaches to data pipelines, data processing, and data security still hold true, but architects need to go the extra mile while designing a big data lake.
✓ This session will focus on data integration design considerations. We will discuss the relevance of data democratization in the organizational data strategy.
Learning objectives
✓ Data lake architectural styles
✓ Implement a data lake for democratization
✓ Data ingestion framework principles
About Me
✓ Manager, Data & Analytics at General Electric
✓ 11+ years of experience in data architecture, data engineering, analytics
✓ Books authored:
  ✓ Practical Enterprise Data Lake Insights | Apress | 2018
  ✓ Advanced Oracle PL/SQL Developer’s Guide | Packt | 2016
  ✓ Oracle Advanced PL/SQL Developer Professional Guide | Packt | 2012
✓ Speaker at AIOUG, IOUG, NASSCOM
✓ Twitter @saurabhkg
✓ Blog @ sbhoracle.wordpress.com
4. Practical Enterprise Data Lake Insights - Published
Handle Data-Driven Challenges in an Enterprise Big Data Lake
Use this practical guide to successfully handle the challenges encountered when designing an enterprise data lake and learn industry best practices to resolve issues.
What You'll Learn:
• Get to know data lake architecture and design principles
• Implement data capture and streaming strategies
• Implement data processing strategies in Hadoop
• Understand the data lake security framework and availability model
Apress – https://www.apress.com/in/book/9781484235218/
Amazon – https://www.amazon.com/Practical-Enterprise-Data-Lake-Insights/dp/1484235215/
5. Data Explosion
Future data trends: “Fast Data” and “Actionable Data”, Data as a Service, Cybersecurity, Augmented Analytics, and Machine Intelligence.
6. Evolution of Data Lake
• James Dixon’s “time machine” vision of data
• Leads the Data-as-an-Asset strategy
“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”
– James Dixon
9. Data Democracy
Provide the data in a manner in which it can be consumed – regardless of the people, process, or end-user technology.
The data vision spans Data Empowerment (complement business with data), data-backed decisions, Data Commercialization, and Data as a Service.
10. Enterprise Data Lake operational pillars
The data lake rests on five operational pillars: Data Management, Data Ingestion, Data Engineering, Data Consumption, and Data Integration.
12. Understand the data
• Structured data is an organized piece of information
  • Aligns strongly with the relational standards
  • Defined metadata
  • Easy ingestion, retrieval, and processing
• Unstructured data lacks structure and metadata
  • Not so easy to ingest
  • Complex retrieval
  • Complex processing
13. Understand the data sources
• OLTP and data warehouses – structured data from typical relational data stores
• Data management systems – documents and text files
• Legacy systems – essential for historical and regulatory analytics
• Sensors and IoT devices – devices installed on healthcare, home, and mobile appliances and large machines can upload logs to the data lake at periodic intervals or in a secure network region
• Web content – data from the web world (retail sites, blogs, social media)
• Geographical data – data flowing from location data, maps, and geo-positioning systems
14. Data Ingestion Framework
Design Considerations
• Data format – What format is the data to be ingested in?
• Data change rate – Critical for CDC design and streaming data. Performance is a derivative of throughput and latency.
• Data location and security
  • Whether data is located on-premise or on public cloud infrastructure. While fetching data from cloud instances, network bandwidth plays an important role.
  • If the data source is enclosed within a security layer, the ingestion framework should enable the establishment of a secure tunnel to collect data for ingestion.
• Transfer data size (file compression and file splitting) – What would be the average and maximum size of a block or object in a single ingestion operation?
• Target file format – Data from a source system needs to be ingested in a Hadoop-compatible file format.
15. ETL vs ELT for Data Lake
ETL
• Heavy transformation before load may restrict the data surface area available for data exploration
• Brings down data agility
• Transformation on huge volumes of data may introduce latency between the data source and the data lake
ELT
• Load raw data first; build a curated layer afterwards to empower analytical models (see the sketch below)
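To make the ELT pattern concrete, here is a minimal HiveQL sketch, assuming hypothetical raw_orders and curated_orders tables: the extract lands untouched in the raw zone, and transformation runs later inside the lake’s engine.

  -- Load first: land the source extract as-is in the raw zone (no upfront transformation).
  CREATE EXTERNAL TABLE raw_orders (
    order_id BIGINT,
    product  STRING,
    sales    DOUBLE
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/datalake/raw/orders';

  -- Transform later: derive the curated layer inside the lake, keeping the
  -- full raw data surface available for exploration.
  CREATE TABLE curated_orders STORED AS ORC AS
  SELECT order_id, product, SUM(sales) AS total_sales
  FROM raw_orders
  GROUP BY order_id, product;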
16. Batched data ingestion principles
Structured data
• The data collector fires a SELECT query (also known as a filter query) on the source to pull incremental records or a full extract
• Query performance and source workload determine how efficient the data collector is
• Robust and flexible
Standard ingestion techniques
• Change track flag – flag rows with the operation code
• Incremental extraction – pull all the changes after a certain timestamp
• Full extraction – refresh the target on every ingestion run
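A minimal sketch of an incremental filter query, assuming a hypothetical orders table with a last_updated audit column and a high-water mark persisted between runs:

  -- Incremental extraction: pull only rows changed since the last successful run.
  -- :last_watermark is the high-water mark saved by the previous ingestion cycle.
  SELECT order_id,
         product,
         sales,
         last_updated
  FROM   orders
  WHERE  last_updated > :last_watermark   -- the "filter query" predicate
  ORDER  BY last_updated;

  -- Full extraction, by contrast, simply refreshes the target every run:
  -- SELECT * FROM orders;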
17. Change Data Capture
• Log mining process
• Captures changed data from the source system’s transaction logs and integrates it with the target
• Eliminates the need to run SQL queries on the source system; incurs no load overhead on a transactional source system
• Achieves near real-time replication between source and target
18. Change Data Capture
Design Considerations
• The source database must be enabled for logging
• Commercial tools – Oracle GoldenGate, HVR, Talend CDC, custom replicators
• Keys are extremely important for replication
  • They help the capture job establish the uniqueness of a record in the changed data set
  • The source PK ensures the changes are applied to the correct record on the target
  • If a PK is not available, establish uniqueness based on composite columns
  • Establishing uniqueness based on a unique constraint – terrible design!!
• Trigger-based CDC
  • An event on a table triggers the change to be captured in a change-log table
  • The change-log table is merged with the target
  • Works when source transaction logs are not available
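As a rough illustration of the trigger-based approach, here is a minimal sketch in Oracle-style PL/SQL; the orders table, changelog table, and column names are hypothetical:

  -- Change-log table that records every modification to the source table.
  CREATE TABLE orders_changelog (
    order_id   NUMBER,
    product    VARCHAR2(100),
    sales      NUMBER(12,2),
    op_code    CHAR(1),        -- 'I'nsert, 'U'pdate, 'D'elete
    changed_at TIMESTAMP
  );

  -- Row-level trigger that captures each change as it happens.
  CREATE OR REPLACE TRIGGER trg_orders_cdc
  AFTER INSERT OR UPDATE OR DELETE ON orders
  FOR EACH ROW
  BEGIN
    IF DELETING THEN
      INSERT INTO orders_changelog VALUES (:OLD.order_id, :OLD.product, :OLD.sales, 'D', SYSTIMESTAMP);
    ELSIF UPDATING THEN
      INSERT INTO orders_changelog VALUES (:NEW.order_id, :NEW.product, :NEW.sales, 'U', SYSTIMESTAMP);
    ELSE
      INSERT INTO orders_changelog VALUES (:NEW.order_id, :NEW.product, :NEW.sales, 'I', SYSTIMESTAMP);
    END IF;
  END;

The change-log table is then periodically merged with the target, which is what makes this approach viable when transaction logs cannot be mined.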
19. LinkedIn Databus
CDC capture pipeline
• The Relay is responsible for pulling the most recent committed transactions from the source
• Relays are implemented through the Tungsten Replicator
• The Relay stores the changes in logs or a cache in compressed format
• The Consumer pulls the changes from the Relay
• Bootstrap component – a snapshot of the data source on a temporary instance; it is consistent with the changes captured by the Relay
• If any consumer falls behind and can’t find the changes in the Relay, the bootstrap component transforms and packages the changes for the consumer
• A new consumer, with the help of the client library, can apply all the changes from the bootstrap component up to a point in time; the client library then points the consumer to the Relay to continue pulling the most recent changes
Bootstrap flow: Relay → LogWriter → Log Storage → LogApplier → Snapshot Storage, yielding consolidated changes and a consistent snapshot.
20. Change merge techniques
Design Considerations
Flow
• Hive table partitioned on the time dimension (P1, P2, P3, P4, …)
• Changes are captured incrementally from the data source
• Changes are tagged by table name and the most recent partition
• Exchange partition process – the most recent partition is pulled and compared against the “change” data set; the final dataset with merged changes is prepared and exchanged back into the partitioned table
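A rough HiveQL sketch of this compare-and-merge flow, assuming hypothetical orders (partitioned by dt, with a changed_at audit column) and orders_changes tables:

  -- 1. Rebuild the most recent partition with the change set merged in;
  --    the latest version of each record wins.
  CREATE TABLE orders_stage LIKE orders;  -- same schema and partitioning as the target

  INSERT OVERWRITE TABLE orders_stage PARTITION (dt = '2017-12-01')
  SELECT order_id, product, sales, changed_at
  FROM (
    SELECT u.*, ROW_NUMBER() OVER (PARTITION BY order_id
                                   ORDER BY changed_at DESC) AS rn
    FROM (
      SELECT order_id, product, sales, changed_at
      FROM orders WHERE dt = '2017-12-01'
      UNION ALL
      SELECT order_id, product, sales, changed_at
      FROM orders_changes
    ) u
  ) ranked
  WHERE rn = 1;

  -- 2. Swap the rebuilt partition into the target as a metadata-only operation.
  ALTER TABLE orders DROP PARTITION (dt = '2017-12-01');
  ALTER TABLE orders EXCHANGE PARTITION (dt = '2017-12-01') WITH TABLE orders_stage;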
21. Apache Sqoop
• Native member of the Hadoop tech stack for data ingestion
• Batched ingestion, no CDC
• Java-based utility (web interface in Sqoop2) that spawns Map jobs from the MapReduce engine to store data in HDFS
• Provides full extract as well as incremental import mode support
• Runs on an HDFS cluster and can populate tables in Hive and HBase
• Can establish a data integration layer between NoSQL and HDFS
• Can be integrated with Oozie to schedule import/export tasks
• Supports connectors to multiple relational databases like Oracle, SQL Server, MySQL
22. Sqoop architecture
• Sqoop imports run as mapper jobs of the MapReduce processing layer in Hadoop
• By default, a Sqoop job has four mappers
• Rule of split
  • Values of the --split-by column must be equally distributed across the mappers
  • The --split-by column should ideally be a primary key
23. Sqoop - FYI
Design considerations - I
• Mappers
  • Set with the --num-mappers [n] argument
  • Mappers run in parallel within Hadoop; there is no formula, so the number needs to be judiciously set
• Cannot split?
  • Use --autoreset-to-one-mapper to perform an unsplit extraction
• Source has no PK
  • Split based on a natural or surrogate key
• Source has character keys
  • Divide and conquer! Create manual partitions and run one mapper per partition
  • If the key value is an integer, no worries
24. Sqoop - FYI
Design considerations - II
• If only a subset of columns is required from the source table, specify the column list in the --columns argument.
  • For example, --columns “orderId, product, sales”
• If limited rows are required to be “sqooped”, specify --where with the predicate clause.
  • For example, --where “sales > 1000”
• If the result of a structured query needs to be imported, use the --query clause.
  • For example, --query ‘select orderId, product, sales from orders where sales>1000’
• Use the --hive-partition-key and --hive-partition-value attributes to create partitions on a column key from the import
• Delimiters can be handled in either of the following ways:
  • Specify --hive-drop-import-delims to remove delimiters during the import process
  • Specify --hive-delims-replacement to replace delimiters with an alternate character
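Putting the preceding flags together, a representative import could look like the sketch below; the JDBC URL, credentials, and table/column names are illustrative only:

  # Import a filtered subset of the orders table into a Hive table,
  # splitting the work across four parallel mappers on the orderId key.
  sqoop import \
    --connect jdbc:mysql://dbhost:3306/sales \
    --username etl_user -P \
    --table orders \
    --columns "orderId,product,sales" \
    --where "sales > 1000" \
    --split-by orderId \
    --num-mappers 4 \
    --target-dir /datalake/raw/orders \
    --hive-import \
    --hive-table curated.orders \
    --hive-drop-import-delims

Note that --table/--columns/--where and --query are alternatives: a free-form --query replaces the first three.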
25. Oracle copyToBDA
• Licensed under Oracle Big Data SQL
• Runs on a stack of Oracle BDA, Exadata, and InfiniBand
• Helps in loading Oracle database tables into Hadoop by:
  • Dumping the table data in Data Pump format
  • Copying it into HDFS
• Full extract and load
• If the source data changes, rerun the utility to refresh the Hive tables
26. Greenplum’s GPHDFS
• Set up on all segment nodes of a Greenplum cluster
• All segments concurrently push their local copies of data splits to the Hadoop cluster
• Cluster segments yield the power of parallelism
• Each segment writes through its own writable external table, so the Greenplum cluster streams data to the Hadoop cluster in parallel
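As a sketch of what this looks like in practice (the table name, Hadoop NameNode host, and HDFS path are hypothetical, and the gphdfs protocol must be configured for your Greenplum version):

  -- Writable external table: each Greenplum segment writes its slice of the
  -- data directly to HDFS in parallel via the gphdfs protocol.
  CREATE WRITABLE EXTERNAL TABLE orders_to_hdfs (LIKE orders)
  LOCATION ('gphdfs://hadoop-nn:8020/datalake/raw/orders')
  FORMAT 'TEXT' (DELIMITER ',');

  -- Pushing the data is then a plain INSERT ... SELECT.
  INSERT INTO orders_to_hdfs SELECT * FROM orders;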
27. Stream unstructured data using Flume
• Distributed system to capture and load large volumes of log data from different source systems into the data lake
• Collection and aggregation of streaming data as events
Event flow: a client PUTs incoming events to the Flume agent’s source, which writes them to a channel within a source transaction; the sink TAKEs events from the channel within a sink transaction and delivers the outgoing data.
28. Apache Flume - FYI
Design considerations - I
• Channel type
  • MEMORY – events are read from the source into memory
    • Good performance, but volatile; not cost-effective
  • FILE – events are read from the source into the file system
    • Controllable performance; persistent with transactional guarantees
  • JDBC – events are read and stored in a Derby database
    • Slow performance
  • KAFKA – events are stored in a Kafka topic
• Event batch size – the maximum number of events that can be batched by a source or sink in a single transaction
  • The fatter, the better – but not for the FILE channel
  • A stable number ensures data consistency
29. Apache Flume - FYI
Design considerations - II
• Channel capacity and transaction capacity
  • For the MEMORY channel, channel capacity is limited by RAM size
  • For the FILE channel, channel capacity is limited by disk size
  • The batch size configured for the sinks should not exceed the channel’s transaction capacity
• Channel selector
  • An event can either be replicated or multiplexed
  • Preferable vs. conditional routing
• Handling high-throughput systems
  • Tiered architecture to handle event flow
  • Aggregate-and-push approach
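A minimal agent configuration sketching these knobs (the agent, source, channel, and sink names, the tailed log file, and the HDFS path are all illustrative):

  # One agent: tail a log file, buffer through a durable FILE channel, land in HDFS.
  agent1.sources  = src1
  agent1.channels = ch1
  agent1.sinks    = sink1

  # Source: stream events from a log file as lines are appended.
  agent1.sources.src1.type = exec
  agent1.sources.src1.command = tail -F /var/log/app/app.log
  agent1.sources.src1.channels = ch1

  # Channel: FILE for persistence and transactional guarantees.
  agent1.channels.ch1.type = file
  agent1.channels.ch1.capacity = 100000
  agent1.channels.ch1.transactionCapacity = 1000

  # Sink: write batches of events to the data lake; the batch size must not
  # exceed the channel's transaction capacity.
  agent1.sinks.sink1.type = hdfs
  agent1.sinks.sink1.channel = ch1
  agent1.sinks.sink1.hdfs.path = /datalake/raw/logs/%Y-%m-%d
  agent1.sinks.sink1.hdfs.batchSize = 1000
  agent1.sinks.sink1.hdfs.fileType = DataStream
  agent1.sinks.sink1.hdfs.useLocalTimeStamp = true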