Billions of Rows, Millions of Insights, Right Now - Rob Winters
Presentation from Tableau Customer Conference 2013 on building a real time reporting/analytics platform. Topics discussed include definitions of big data and real time, technology choices and rationale, use cases for real time big data, architecture, and pitfalls to avoid.
Towards Real-Time banking API's: Introducing Coral, a web api for realtime st... - Natalino Busa
Banks are innovating. The purpose of this innovation is to transform bank services into meaningful and frictionless customer experiences. A key element in achieving that ambitious goal is providing well-tailored, reactive APIs as the building blocks for greater and smoother customer journeys and experiences. For these APIs to work, internal processes have to evolve as well, from batch processing to real-time event processing.
In this talk, after providing a brief introduction to the streaming computing landscape, we describe a RESTful API called "Coral" meant to design and deploy customized and flexible data flows. The user can compose data flows for a number of data streaming goals such as on-the-fly data clustering and classification, streaming analytics, per-event predictive analysis, and real-time recommenders. Once the events are processed, Coral passes the resulting analysis as actionable events for alerting, messaging, or further processing to other systems. Coral is a flexible and generic event processing platform that transforms streaming data into actionable events via a RESTful API. Those data flows are defined via the Web API by connecting together basic streaming processing elements named "coral actors". The Coral framework manages those coral actors on a distributed and scalable architecture.
Streaming and real time data processing and analytics are the key elements to an improved customer experience. In this way, you can get the most targeted processing for your domain (marketing customization, personalized recommenders, fraud detection, real time security alerting, etc.). This streaming “data flow” model implies processing customers’ events as soon as they enter via web APIs. This approach borrows a lot from distributed “data flow” concepts developed for processor architectures back in the 80’s. The “Coral” streaming processing engine is generic and built on top of world class libraries such as Akka and Spark, and fully exposed via a RESTful web API.
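To make the actor-composition idea concrete, here is a hypothetical sketch of driving such a RESTful API from Python. The endpoint paths, actor types, and JSON fields below are invented for illustration; Coral defines its own actor vocabulary and routes.

```python
# Hypothetical sketch of composing a Coral-style data flow over a REST API.
# Endpoint paths and actor JSON are illustrative assumptions, not the actual
# Coral API, and a running server is assumed at the base URL.
import requests

CORAL = "http://localhost:8000/api"  # assumed base URL of a Coral instance

# Create two "coral actors": one that parses incoming events, one that
# flags per-event anomalies, then wire the first into the second.
parser = requests.post(f"{CORAL}/actors", json={
    "type": "json-parser", "params": {"field": "payload"}
}).json()

detector = requests.post(f"{CORAL}/actors", json={
    "type": "threshold-alert", "params": {"field": "amount", "max": 10000}
}).json()

requests.post(f"{CORAL}/actors/{parser['id']}/connect",
              json={"to": detector["id"]})
```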
How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than... - Databricks
Upwork has the biggest closed-loop online dataset of jobs and job seekers in labor history (>10M profiles; >100M job posts, job proposals, and hiring decisions; >10B messages; plus transaction and feedback data). Besides sheer quantity, our data is also contextually very rich. We have client and contractor data for the entire job funnel, from finding jobs to getting the job done.
For various machine learning applications, including search and recommendations and labor marketplace optimization (rate, supply, and demand), we heavily relied on a Greenplum-based data warehouse solution for data processing and ad-hoc ML pipelines (Weka, scikit-learn, R) for offline model development and online model scoring.
In this talk, we present our modernization efforts in moving towards 1) a holistic data processing infrastructure for batch and stream data processing using S3, Kinesis, Spark, and Spark Structured Streaming; 2) model development using Spark MLlib and other ML libraries for Spark; 3) model serving using Databricks Model Scoring, scoring over Structured Streams, and microservices; and 4) orchestrating and streamlining all these processes using Apache Airflow and a CI/CD workflow customized to our Data Science product engineering needs. The focus of this talk is on how we were able to leverage the Databricks service offering to reduce DevOps overhead and costs, complete the entire modernization with moderate effort, and adopt a collaborative notebook-based solution for all our data scientists to develop models, reuse features, and share results. We will share the core lessons learned and pitfalls we encountered during this journey.
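As a rough sketch of the streaming-scoring leg of such a pipeline, the following assumes a Databricks runtime (where the `kinesis` streaming source is available) and a previously saved Spark ML pipeline; stream names, schema, and S3 paths are placeholders, not Upwork's actual configuration.

```python
# Sketch of "scoring over Structured Streams": read events from Kinesis,
# apply a saved Spark ML pipeline, write scores out continuously.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType
from pyspark.ml import PipelineModel

spark = SparkSession.builder.getOrCreate()

schema = StructType([                     # placeholder event schema
    StructField("job_id", StringType()),
    StructField("rate", DoubleType()),
])

events = (spark.readStream.format("kinesis")   # Databricks Kinesis connector
          .option("streamName", "job-events")   # placeholder stream name
          .option("region", "us-east-1")
          .load()
          .select(from_json(col("data").cast("string"), schema).alias("e"))
          .select("e.*"))

model = PipelineModel.load("s3://models/job-ranker")   # placeholder path

(model.transform(events)
      .writeStream.format("delta")
      .option("checkpointLocation", "s3://chk/job-ranker")
      .start("s3://scores/job-ranker"))
```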
Zillow's favorite big data & machine learning tools - njstevens
This talk covers Zillow's favorite tools for keeping track of research, cluster computing, open source machine learning, workflow management, logging, deep learning, and data storage.
EAP - Accelerating behavioral analytics at PayPal using Hadoop - DataWorks Summit
PayPal today generates massive amounts of data, from clickstream logs to transactions and routine business events. Analyzing customer behavior across this data can be a daunting task. The Data Technology team at PayPal has built a configurable engine, Event Analytics Pipeline (EAP), using Hadoop to ingest and process massive amounts of customer interaction data, match business-defined behavioral patterns, and generate entities and interactions matching those patterns. The pipeline is an ecosystem of components built using HDFS, HBase, a data catalog, and seamless connectivity to enterprise data stores. EAP's data definition, data processing, and behavioral analysis can be adapted to many business needs. Leveraging Hadoop to address the problems of size and scale, EAP promotes agility by abstracting the complexities of big-data technologies using a set of tools and metadata that allow end users to control the behavioral-centric processing of data. EAP abstracts the massive data stored on HDFS as business objects, e.g., customer and page impression events, allowing analysts to easily extract patterns of events across billions of rows of data. The rules system built using HBase allows analysts to define relationships between entities and extrapolate them across disparate data sources to truly explore the universe of customer interaction and behaviors through a single lens.
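As a toy illustration of what a "business-defined behavioral pattern" over customer interaction events might look like, the sketch below matches an ordered sequence of event types. The pattern, event fields, and in-memory matching are invented for the example; EAP itself evaluates metadata-driven rules from HBase over data on HDFS.

```python
# Minimal illustration of ordered event-pattern matching, the kind of rule
# a behavioral analytics engine evaluates at scale.
def matches_pattern(events, pattern):
    """Return True if the event types occur in order (gaps allowed)."""
    it = iter(events)  # shared iterator enforces ordering across steps
    return all(any(e["type"] == step for e in it) for step in pattern)

session = [
    {"type": "page_impression", "page": "checkout"},
    {"type": "payment_attempt"},
    {"type": "payment_failure"},
]
# e.g. flag customers who attempted a payment that then failed
print(matches_pattern(session, ["payment_attempt", "payment_failure"]))  # True
```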
A series of tweets I posted about my 11-hour struggle to make a cup of tea with my WiFi kettle ended up going viral, got picked up by the national and then international press, and led to thousands of retweets, comments, and references in the media. In this session we’ll take the data I recorded on this Twitter activity over the period and use Oracle Big Data Graph and Spatial to understand what caused the breakout and the tweet going viral, who were the key influencers and connectors, and how the tweet spread over time and over geography from my original series of posts in Hove, England.
DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr... - DataStax Academy
In this in-depth workshop you will gain hands-on experience using Spark and Cassandra inside the DataStax Enterprise Platform. The focus of the workshop will be working through data analytics exercises to understand the major developer considerations. You will also gain an understanding of the internals behind the integration that allow for large-scale data loading and analysis. We will also review some of the major machine learning libraries in Spark as an example of data analysis.
The workshop will start with a review of the basics of how Spark and Cassandra are integrated. Then we will work through a series of exercises that show how to perform large-scale data analytics with Spark and Cassandra. A major part of the workshop will be understanding effective data modeling techniques in Cassandra that allow for fast parallel loading of the data into Spark to perform large-scale analytics on that data. The exercises will also look at how to use the open source Spark Notebook to run interactive data analytics with the DataStax Enterprise Platform.
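A minimal sketch of the integration the exercises build on, assuming DataStax Enterprise or the open source spark-cassandra-connector is on the Spark classpath; the keyspace, table, and column names are placeholders.

```python
# Read a Cassandra table into Spark and run an aggregate over it.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Each Spark partition maps to a range of Cassandra token ranges, which is
# what makes the parallel load fast when the table is modeled well.
readings = (spark.read.format("org.apache.spark.sql.cassandra")
            .options(keyspace="sensors", table="readings")  # placeholders
            .load())

readings.groupBy("sensor_id").avg("value").show()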
At Zillow, we calculate a Zestimate® home value for about 100 million homes nationwide daily. But between batch runs, users could update their home facts or even list their home on the market. Housing markets move fast, and we want Zestimates to reflect the latest state of our housing data. In this talk, I will present the architecture of the Zestimate and the infrastructure powering it. Inspired by Lambda Architecture, the Zestimate relies on both a near real-time and a batch component. I will highlight how the design allows us to be nimble in the face of data changes, while not sacrificing algorithmic accuracy during daily batch runs.
Lambda architecture for real time big data - Trieu Nguyen
Lambda Architecture in Real-time Big Data Project
Concepts & Techniques “Thinking with Lambda”
Case studies from some real projects
Why is Lambda Architecture the correct solution for big data?
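A toy serving-layer sketch of the core Lambda idea the two talks above describe: answer from the batch view unless the speed layer holds fresher data for the key. The dict-backed views and timestamp convention are illustrative stand-ins for real batch and real-time stores.

```python
# Toy Lambda serving layer: prefer the speed-layer value when it is newer
# than the last batch run. ISO-8601 strings compare correctly as text.
batch_view = {"home:42": {"value": 512_300, "as_of": "2019-06-01T00:00"}}
speed_view = {"home:42": {"value": 514_100, "as_of": "2019-06-01T09:15"}}

def lookup(key):
    batch = batch_view.get(key)
    fresh = speed_view.get(key)
    if batch and fresh:
        return fresh if fresh["as_of"] > batch["as_of"] else batch
    return fresh or batch

print(lookup("home:42"))  # the speed-layer estimate, until the next batch run
```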
Activate 2019 - Search and relevance at scale for online classifieds - Roger Rafanell Mas
A high-performing search service requires both an effective search infrastructure and high search relevance.
Seeking a fault-tolerant, self-healing, and cost-effective search infrastructure at scale, we built a platform based on the Apache Solr search engine with light in-memory indexes, avoiding sharding and reducing overall infrastructure needs.
To populate the indexes, we use flexible ETL processes, keeping our product catalog and search indexes updated in near real time and distributed across high-performance database engines.
We aim for high search-relevance precision and recall by applying query relaxation and boosting on top of the optimised platform.
https://www.activate-conf.com/speakers/detail/roger-rafanell
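As a hedged example of the "query relaxation and boost solutions" mentioned above, the request below uses Solr's standard edismax parser with field boosts, a boost query, and a relaxed minimum-match; the collection and field names are placeholders, not the production schema.

```python
# Relevance-tuned Solr query via the standard select endpoint.
import requests

params = {
    "q": "mountain bike",
    "defType": "edismax",
    "qf": "title^3 description",   # per-field boosts
    "bq": "category:bicycles^2",   # boost query for the relevant category
    "mm": "1",                     # relaxed: match at least one term
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/classifieds/select",
                    params=params)
for doc in resp.json()["response"]["docs"]:
    print(doc.get("title"))
```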
Slides from Michelle Ufford's Data Warehousing talk at Hadoop Summit 2015.
How can we take advantage of the veritable treasure trove of data stored in Hadoop to augment our traditional data warehouses? In this session, Michelle will share her experience with migrating GoDaddy’s data warehouse to Hadoop. She’ll explore how GoDaddy has adapted traditional data warehousing methodologies to work with Hadoop and will share example ETL patterns used by her team. Topics will also include how the integration of structured and unstructured data has exposed new insights, the resulting business impact, and tips for making your own Hadoop migration project more successful.
Recording available here: https://www.youtube.com/watch?v=0AxoB-wJcZc
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013) - Jeff Magnusson
Overview of the data platform as a service architecture at Netflix. We examine the tools and services built around the Netflix Hadoop platform that are designed to make access to big data at Netflix easy, efficient, and self-service for our users.
From the perspective of a user of the platform, we walk through how various services in the architecture can be used to build a recommendation engine. Sting, a tool for fast in-memory aggregation and data visualization, and Lipstick, our workflow visualization and monitoring tool for Apache Pig, are discussed in depth. Lipstick is now part of Netflix OSS - clone it on GitHub, or learn more from our tech blog post: http://techblog.netflix.com/2013/06/introducing-lipstick-on-apache-pig.html.
We present our solution for building an AI architecture that gives engineering teams the ability to leverage data to drive insight and help our customers solve their problems. We started with siloed data: entities described differently by each product, different formats, complicated security and access schemes, and data spread over numerous locations and systems.
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha... - Shirshanka Das
So, you finally have a data ecosystem with Kafka and Hadoop both deployed and operating correctly at scale. Congratulations. Are you done? Far from it.
As the birthplace of Kafka and an early adopter of Hadoop, LinkedIn has 13 years of combined experience using Kafka and Hadoop at scale to run a data-driven company. Both Kafka and Hadoop are flexible, scalable infrastructure pieces, but using these technologies without a clear idea of what the higher-level data ecosystem should be is perilous. Shirshanka Das and Yael Garten share best practices around data models and formats, choosing the right level of granularity of Kafka topics and Hadoop tables, and moving data efficiently and correctly between Kafka and Hadoop and explore a data abstraction layer, Dali, that can help you to process data seamlessly across Kafka and Hadoop.
Beyond pure technology, Shirshanka and Yael outline the three components of a great data culture and ecosystem and explain how to create maintainable data contracts between data producers and data consumers (like data scientists and data analysts) and how to standardize data effectively in a growing organization to enable (and not slow down) innovation and agility. They then look to the future, envisioning a world where you can successfully deploy a data abstraction of views on Hadoop data, like a data API as a protective and enabling shield. Along the way, Shirshanka and Yael discuss observations on how to enable teams to be good data citizens in producing, consuming, and owning datasets and offer an overview of LinkedIn’s governance model: the tools, process and teams that ensure that its data ecosystem can handle change and sustain #datasciencehappiness.
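Purely as an illustration of the Kafka-to-Hadoop granularity decision (and not LinkedIn's actual pipeline or the Dali API), this sketch lands a topic into hour-partitioned files using the kafka-python client; the topic name and paths are placeholders.

```python
# Land a Kafka topic into hour-partitioned JSONL files, one directory per
# hour: the partition granularity is the design decision that matters.
import json, os
from datetime import datetime, timezone
from kafka import KafkaConsumer

consumer = KafkaConsumer("page-views", bootstrap_servers="localhost:9092",
                         value_deserializer=lambda b: json.loads(b))

for msg in consumer:
    hour = datetime.fromtimestamp(msg.timestamp / 1000,  # Kafka ms timestamps
                                  tz=timezone.utc).strftime("%Y-%m-%d-%H")
    path = f"/data/page-views/dt={hour}"
    os.makedirs(path, exist_ok=True)
    with open(f"{path}/part-{msg.partition}.jsonl", "a") as f:
        f.write(json.dumps(msg.value) + "\n")
```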
Processing 19 billion messages in real time and NOT dying in the process - Jampp
Here is an introduction to the Jampp architecture for data processing. We walk through our journey of migrating to systems that allow us to process more data in real time.
Presentation by Skip Falatko & Thomas Hood to the XBRL US Conference in Nashville.
Maryland Association of CPAs case study on how we standardized our financial data down to the transaction level using XBRL GL. We were able to streamline our internal financial reporting, creating drill-down financial statements and automating export to our KPI analysis system across two different systems. The result? Faster data for management decision making, increased accuracy, and a finance team freed up for more analysis and less checking of data.
XBRL and the MACPA - Summit Presentation - Thomas Hood
Slides of a presentation given by Skip Falatko of the MACPA and me. This presentation covers what XBRL is in broad terms and takes an in-depth look at how the MACPA is using XBRL to improve internal efficiencies and extend the U.S. GAAP FR Taxonomy.
What is XBRL? Is it just another compliance burden forced on public companies or a powerful new technology that can change the way financial information is produced and distributed?
The Maryland Association of CPAs recently set out to see for ourselves, and this presentation includes our findings. We have seen efficiencies in accessing our financial data and moving it into KPI software, and we believe it can be used to populate tax filings, produce online financial statements, and much more.
We are also joined by Eric E. Cohen of PwC, one of the founders of XBRL, for a global XBRL update, including the SEC mandate and SBR projects around the world where XBRL is being used to dramatically reduce the reporting and compliance burden on businesses reporting to governments.
We do think it is a powerful new technology that can reshape the financial reporting supply chain, much like barcodes transformed the manufacturing and distribution supply chains.
This presentation was produced by Skip Falatko, Tom Hood IV (Student Intern), and Eric E Cohen (PWC)
Assessing New Databases – Translytical Use Cases - DATAVERSITY
Organizations run their day-in-and-day-out businesses with transactional applications and databases. On the other hand, organizations glean insights and make critical decisions using analytical databases and business intelligence tools.
The transactional workloads are relegated to database engines designed and tuned for high transactional throughput. Meanwhile, the big data generated by all the transactions requires analytics platforms to load, store, and analyze volumes of data at high speed, providing timely insights to businesses.
Thus, in conventional information architectures, this requires two different database architectures and platforms: online transactional processing (OLTP) platforms to handle transactional workloads and online analytical processing (OLAP) engines to perform analytics and reporting.
Today, a particular focus and interest of operational analytics is streaming data ingest and analysis in real time. Some refer to operational analytics as hybrid transaction/analytical processing (HTAP), translytical, or hybrid operational analytic processing (HOAP). We’ll address whether this model is a way to create efficiencies in our environments.
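To make the translytical idea concrete in miniature: one engine serving both a transactional write and an analytical read over the same, immediately current data. SQLite here is only a stand-in for the concept; real HTAP engines do this concurrently at scale.

```python
# Concept demo only: one store answering OLTP and OLAP work on one table.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")

# OLTP side: a business transaction (committed by the context manager)
with db:
    db.execute("INSERT INTO orders VALUES (?, ?, ?)", (1, 99.5, "EMEA"))

# OLAP side: an aggregate over the same, immediately current data
for row in db.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"):
    print(row)
```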
A three-hour lecture I gave at the Jyväskylä Summer School. The talk goes through important details about the use of data science in real businesses, including data deployment, data processing, practical issues with data solutions, and emerging trends in data science.
See also Part 1 of the lecture: Introduction to Data Science. You can find it in my profile (click the face).
Agile Data Rationalization for Operational Intelligence - Inside Analysis
The Briefing Room with Eric Kavanagh and Phasic Systems
Live Webcast Mar. 26, 2013
The complexity of today's information architectures creates a wide range of challenges for executives trying to get a strategic view of their current operations. The data and context locked in operational systems often get diluted during the normalization processes of data warehousing and other types of analytic solutions. And the ultimate goal of seeing the big picture gets derailed by a basic inability to reconcile disparate organizational views of key information assets and rules.
Register for this episode of The Briefing Room to learn from Bloor Group CEO Eric Kavanagh, who will explain how a tightly controlled methodology can be combined with modern NoSQL technology to resolve both process and system complexities, thus enabling a much richer, more interconnected information landscape. Kavanagh will be briefed by Geoffrey Malafsky of Phasic Systems who will share his company's tested methodology for capturing and managing the business and process logic that run today's data-driven organizations. He'll demonstrate how a “don't say no” approach to entity definitions can dissolve previously intractable disagreements, opening the door to clear, verifiable operational intelligence.
Visit: http://www.insideanalysis.com
Slides used for the keynote at the event Big Data & Data Science http://eventos.citius.usc.es/bigdata/
Some slides are borrowed from random Hadoop/big data presentations.
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution - Dmitry Anoshin
This session will cover building a modern Data Warehouse by migrating from a traditional DW platform to the cloud, using Amazon Redshift and the cloud ETL tool Matillion, in order to provide self-service BI for the business audience. It will cover the technical migration path from a DW with PL/SQL ETL to Amazon Redshift via Matillion ETL, with a detailed comparison of modern ETL tools. Moreover, this talk will focus on working backward through the process, i.e. starting from the business audience and the needs that drive changes in the old DW. Finally, this talk will cover the idea of self-service BI, and the author will share a step-by-step plan for building an efficient self-service environment using the modern BI platform Tableau.
Data Lake, Virtual Database, or Data Hub - How to Choose? - DATAVERSITY
Data integration is just plain hard and there is no magic bullet. That said, three new data integration techniques do ameliorate the misery, making silo-busting possible, if not trivial. The three approaches – data lakes, virtual databases (aka federated databases), and data hubs – are a boon to organizations big enough to have separate systems, separate lines of business, and redundant acquired or COTS data stores. Each approach has its place, but how do you make the right decision about which data silo integration approach to choose and when?
This webinar describes how you can use the key concepts of data Movement, Harmonization, and Indexing to determine what you are giving up or investing in, and make the best decision for your project.
Streamlining Your SAP S/4HANA Migration: Expert Tips and Best Practices - Precisely
What is the current status of your migration plans for implementing SAP S/4HANA? While many companies are eager to move to SAP S/4HANA, others struggle to complete an effective transition.
To help, this on-demand webinar shares expert tips and best practices for using process automation and data management solutions to streamline your migration, get the best initial data quality into SAP S/4HANA, and establish a solid foundation for future success.
We cover:
· The why and how behind automation and its role in significantly increasing the chance of a smooth transition
· The key questions to ask before, during, and after migration to help set up success
· How process automation and data management capabilities from Precisely streamline the overall migration process
· Real examples of companies who successfully automated their transition and all of the new benefits they uncovered along the way
Whether SAP S/4HANA is your first SAP implementation or you are migrating from an older SAP system, you’ll come away with valuable information to help set up your approach and identify the critical questions to ask and consider as you plot your migration strategy.
Similar to MACPA Case Study @ XBRL US - Falatko & Hood
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
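One practical guardrail implied above is to never accept generated markup blindly. A minimal sketch, assuming lxml is available: the LLM call itself is out of scope here, the schema and document are toy examples, and only the validation step is concrete.

```python
# Validate AI-generated XML against a schema before accepting it.
from lxml import etree

XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="article">
    <xs:complexType><xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="para" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence></xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD.encode()))

def accept(generated_xml: str) -> bool:
    """Parse the model's output and check it against the schema."""
    try:
        doc = etree.fromstring(generated_xml.encode())
    except etree.XMLSyntaxError:
        return False
    return schema.validate(doc)

print(accept("<article><title>t</title><para>p</para></article>"))  # True
```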
Communications Mining Series - Zero to Hero - Session 1 - DianaGray10
This session provides an introduction to UiPath Communications Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communications Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many features trade security for convenience and capability. This best-practices guide outlines steps users can take to better protect their personal devices and information.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf - Paige Cruz
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to ops, infra, and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
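For a quick taste of that Python binding (assuming `pip install pypowsybl`), the snippet below loads the bundled IEEE 14-bus test network and runs an AC power flow:

```python
# Build a test network and run an AC power flow with pypowsybl.
import pypowsybl as pp

network = pp.network.create_ieee14()      # bundled IEEE 14-bus test case
results = pp.loadflow.run_ac(network)     # run an AC power flow
print(results[0].status)                  # e.g. converged or not

# Bus voltages come back as a pandas DataFrame.
print(network.get_buses()[["v_mag", "v_angle"]].head())
```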
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
UiPath Test Automation using UiPath Test Suite series, part 5 - DianaGray10
Welcome to the UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Removing Uninteresting Bytes in Software Fuzzing - Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 2022.
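The core idea can be caricatured in a few lines. This is a conceptual sketch, not DIAR's actual algorithm: probe each seed byte and keep only the bytes whose mutation changes the observed coverage; `coverage_of` is a placeholder for running the instrumented target and collecting its coverage bitmap.

```python
# Conceptual byte-pruning sketch: drop seed bytes whose mutation never
# changes coverage, producing a leaner seed for the fuzzer.
def prune_seed(seed: bytes, coverage_of) -> bytes:
    baseline = coverage_of(seed)
    keep = bytearray()
    for i in range(len(seed)):
        flipped = seed[:i] + bytes([seed[i] ^ 0xFF]) + seed[i + 1:]
        if coverage_of(flipped) != baseline:
            keep.append(seed[i])  # this byte influences behavior; keep it
    return bytes(keep)
```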
Essentials of Automations: The Art of Triggers and Actions in FME - Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what Testing in DevOps is. Finally, we had a lovely workshop with the participants, trying to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... - Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 - Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
3. What Intrigued us?
• Global Ledger
• Bottom up approach
• Free the data
• Faster information
4. Why is this important to CPAs?
• It will change how they work!
• Less compiling
• More analysis and interpretation
5. Initial work on XBRL
• Mapping was a challenge
• Non-Profits need their own taxonomy
• XBRL would be beneficial in the non-profit community
6. Recent XBRL work…
• Tagged at the transaction level
• Tagged our membership database & Dynamics
• Pulled from both to populate reports
• Drill down capability
• Faster information
7. Global Ledger Savings?
• Difficult to quantify
• More detail
• Faster
• Automation
• Deeper information
8. Technical Overview
• Map accounting data to XBRL Global Ledger Taxonomy
• Use Dynamics and Am.Net
• Use Excel data import for KPI analysis, other apps
• Utilize Global Ledger data to automate internal financial reporting process – KPIs, Audit, Freedom of Data
9. MACPA's Accounting Systems
• Membership database
• Microsoft Dynamics – Accruals, Budgets, FRX Reports
• AM.Net – A/R, A/P, Event Data, no accruals accounted for in reports
10. Disconnects / Fix
• Disconnects: accounting systems don't talk to each other; staff require different data sets; are the numbers right? – manual process response
• Fix: XBRL Global Ledger! Alternative solution: give Dynamics and Am.Net an ultimatum – stop ignoring each other or we are switching to QuickBooks until they start behaving
11. Mapping
• Altova MapForce
• Identify correct information – tables in DB
• Use SQL to retrieve relevant data
• Accounting Data to XBRL GL Taxonomy
• Process XBRL GL Data using SQL
• Create batch files to update instance documents
23. Global Ledger Benefits?
• Faster movement of data
• Reduced manual effort
• Reduction of errors
• Comparability across other organizations
• Deeper analysis
– Transaction level detail availability
– Annual, Quarterly, Monthly for past 11 years of data
24. Where are we going?
• Non-profit taxonomy
• Form 990 population
• SBR in Maryland?
• Direct Cash Flow Statement – with drill down
• Salesforce.com
• Sharing data
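A much-simplified sketch of the "create instance documents" step from the mapping slide: rows retrieved with SQL become XBRL GL entries. The table, column, and file names are placeholders, and while the element names follow the gl-cor vocabulary, a filing-ready instance needs considerably more structure than shown here.

```python
# Turn SQL rows into a toy XBRL GL instance document.
import sqlite3
import xml.etree.ElementTree as ET

GL = "http://www.xbrl.org/int/gl/cor/2006-10-25"
ET.register_namespace("gl-cor", GL)

# Stand-in for the Dynamics/AM.Net data pulled with SQL in the real process.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE gl_transactions (account TEXT, amount REAL, posted TEXT)")
db.executemany("INSERT INTO gl_transactions VALUES (?, ?, ?)",
               [("4000-Dues", 250.00, "2010-06-01"),
                ("5100-Events", -75.50, "2010-06-02")])

root = ET.Element(f"{{{GL}}}accountingEntries")
for account, amount, posted in db.execute(
        "SELECT account, amount, posted FROM gl_transactions"):
    entry = ET.SubElement(root, f"{{{GL}}}entryDetail")
    ET.SubElement(entry, f"{{{GL}}}accountMainID").text = account
    ET.SubElement(entry, f"{{{GL}}}amount").text = f"{amount:.2f}"
    ET.SubElement(entry, f"{{{GL}}}postingDate").text = posted

ET.ElementTree(root).write("instance.xml", xml_declaration=True,
                           encoding="utf-8")
```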