This talk will provide an overview of big data software engineering and software engineering for big data, two fields that need to be integrated. The interplay between these two fields of research, and between the applications of data science and software engineering, will advance safe, secure, and sustainable approaches to data science, and the application of data science to the fifty years of software engineering data that already exist.
Agile Big Data Analytics Development: An Architecture-Centric Approach (SoftServe)
Presented at The Hawaii International Conference on System Sciences by Hong-Mei Chen and Rick Kazman (University of Hawaii), Serge Haziyev (SoftServe).
Denodo DataFest 2016: What’s New in Denodo Platform – Demo and Roadmap (Denodo)
Watch the full session: Denodo DataFest 2016 sessions: https://goo.gl/ptGwp7
Curious about product roadmap? In this session, we will review some of the new key features introduced this year in the Denodo Platform in areas such as performance, self-service, security and monitoring. We will also take a sneak peek at the most exciting features in the roadmap for Denodo 7.0.
In this session, you will learn:
• New performance-related features in big data scenarios
• New governance and self-service features
• New connectivity, data transformation, and enterprise-wide deployment features
This session is part of the Denodo DataFest 2016 event. You can also watch more Denodo DataFest sessions on demand here: https://goo.gl/VXb6M6
This is a Powerpoint Presentation based on the comparison of various available analytical tools. This includes various tools for business analytics and their detailed description.
Recently, in the fields of Business Intelligence and Data Management, everybody is talking about data science, machine learning, predictive analytics, and many other “clever” terms, with promises to turn your data into gold. In these slides, we present the big picture of data science and machine learning. First, we define the context for data mining from a BI perspective and try to clarify the various buzzwords in this field. Then we give an overview of the machine learning paradigms. After that, we discuss, at a high level, the various data mining tasks, techniques, and applications. Next, we take a quick tour through the Knowledge Discovery Process. Screenshots from demos are shown, and finally we conclude with some takeaway points.
Designing Fast Data Architecture for Big Data using Logical Data Warehouse a... (Denodo)
Companies such as Autodesk are fast replacing the once tried-and-true physical data warehouses with logical data warehouses/data lakes. Why? Because they can accomplish the same results in one-sixth of the time and with one-quarter of the resources.
In this webinar, Autodesk’s Platform Lead, Kurt Jackson, will describe how they designed a modern fast data architecture as a single unified logical data warehouse/data lake using data virtualization and contemporary big data analytics technologies such as Spark.
A logical data warehouse/data lake is a virtual abstraction layer over the physical data warehouse, big data repositories, the cloud, and other enterprise applications. It unifies both structured and unstructured data in real time to power analytical and operational use cases.
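The core idea of the virtual abstraction layer can be sketched in a few lines: one query interface federates several physical sources at query time instead of copying their data. This is a minimal illustrative sketch; the class, source names, and API below are invented for the example and are not Denodo's actual interface.

```python
# Toy data-virtualization layer: one unified view over several sources,
# with no data copied. All names here are illustrative assumptions.

class VirtualLayer:
    """Unifies heterogeneous sources behind a single query interface."""

    def __init__(self):
        self.sources = {}  # name -> callable returning rows (list of dicts)

    def register(self, name, fetch):
        self.sources[name] = fetch

    def query(self, predicate):
        # Federate: pull matching rows from every source at query time,
        # tagging each row with its origin instead of materializing a copy.
        results = []
        for name, fetch in self.sources.items():
            for row in fetch():
                if predicate(row):
                    results.append({**row, "_source": name})
        return results

# Two mock "physical" sources: a warehouse table and a data-lake extract.
warehouse_rows = lambda: [{"customer": "acme", "revenue": 1200}]
lake_rows = lambda: [{"customer": "acme", "clicks": 50},
                     {"customer": "globex", "clicks": 7}]

layer = VirtualLayer()
layer.register("warehouse", warehouse_rows)
layer.register("lake", lake_rows)

acme = layer.query(lambda r: r["customer"] == "acme")
print(acme)  # rows for "acme" drawn from both sources
```

A real platform adds query pushdown, caching, and security on top, but the shape is the same: consumers see one logical schema, while the layer resolves where the data physically lives.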
Extract business value by analyzing large volumes of multi-structured data from various sources such as databases, websites, blogs, social media, smart sensors...
SMAC - Social, Mobile, Analytics and Cloud - An overview (Rajesh Menon)
In this presentation, all aspects of SMAC are covered in as much detail as possible. You will find some ideas worth sharing and become attuned to Social, Mobile, Analytics and Cloud.
Creating a Next-Generation Big Data Architecture (Perficient, Inc.)
If you’ve spent time investigating Big Data, you quickly realize that the issues surrounding Big Data are often complex to analyze and solve. The sheer volume, velocity, and variety change the way we think about data – including how enterprises approach data architecture.
Significant reduction in costs for processing, managing, and storing data, combined with the need for business agility and analytics, requires CIOs and enterprise architects to rethink their enterprise data architecture and develop a next-generation approach to solve the complexities of Big Data.
Creating the data architecture while integrating Big Data into the heart of the enterprise data architecture is a challenge. This webinar covered:
-Why Big Data capabilities must be strategically integrated into an enterprise’s data architecture
-How a next-generation architecture can be conceptualized
-The key components to a robust next generation architecture
-How to incrementally transition to a next generation data architecture
This white paper presents the opportunities opened up by the data lake and advanced analytics, as well as the challenges in integrating, mining, and analyzing the data collected from these sources. It covers the important characteristics of the data lake architecture and the Data and Analytics as a Service (DAaaS) model. It also delves into the features of a successful data lake and its optimal design, and describes how data, applications, and analytics are strung together to speed up the insight-generation process, with the help of a powerful architecture for mining and analyzing unstructured data: the data lake.
This article is useful for anyone who wants an introduction to Big Data and to how Oracle architects Big Data solutions using the Oracle Big Data Cloud offerings.
Data Lakes are early in the Gartner hype cycle, but companies are getting value from their cloud-based data lake deployments. Break through the confusion between data lakes and data warehouses and seek out the most appropriate use cases for your big data lakes.
Big Data Day LA 2015 - Data Lake - Rebirth of Enterprise Data Thinking by Ra... (Data Con LA)
Why and how has the Big Data based Enterprise Data Lake solution, built on both NoSQL and SQL technologies, become significantly more effective at solving enterprise data challenges than its predecessor, the EDW, which tried and failed to solve the same problem based entirely on SQL databases?
Watch full webinar here: https://bit.ly/2xc6IO0
To solve these challenges, Gartner predicts that "through 2022, 60% of all organizations will implement data virtualization as one key delivery style in their data integration architecture". It is clear that data virtualization has become a driving force for companies implementing agile, real-time, and flexible enterprise data architectures.
In this session we will look at the data integration challenges solved by data virtualization and its main use cases, and examine why this technology is growing so fast. You will learn:
- What data virtualization really is
- How it differs from other enterprise data integration technologies
- Why data virtualization is finding enterprise-wide deployment inside some of the largest organizations
An overview of Hadoop and the data warehouse from both technology and business viewpoints. The presentation also includes some of my personal observations and suggestions for people who want to enter the Big Data field.
Incorporating the Data Lake into Your Analytic Architecture (Caserta)
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented Incorporating the Data Lake into Your Analytics Architecture.
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
Big data analytics is the process of examining large data sets containing a variety of data types (i.e., big data) to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information. The analytical findings can lead to more effective marketing, new revenue opportunities, better customer service, improved operational efficiency, competitive advantages over rival organizations, and other business benefits. Enterprises are increasingly looking for actionable insights in their data. Many big data projects originate from the need to answer specific business questions. With the right big data analytics platforms in place, an enterprise can boost sales, increase efficiency, and improve operations, customer service, and risk management. Notably, the business area getting the most attention relates to increasing efficiencies and optimizing operations. By using big data analytics you can extract only the relevant information from terabytes, petabytes, and exabytes, and analyse it to transform your business decisions for the future. Becoming proactive with big data analytics isn't a one-time endeavour; it is more of a culture change – a new way of gaining ground.
Keywords: business, analytics, exabytes, efficiency, data sets
Big Data: Architecture and Performance Considerations in Logical Data Lakes (Denodo)
This presentation explains in detail what a Data Lake architecture looks like, how data virtualization fits into the Logical Data Lake, and offers some performance tips. It also includes an example demonstrating this model's performance.
This presentation is part of the Fast Data Strategy Conference; you can watch the video here: goo.gl/9Jwfu6.
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re... (Revolution Analytics)
Hortonworks and Revolution Analytics have teamed up to bring the predictive analytics power of R to Hortonworks Data Platform.
Hadoop, a disruptive data processing framework, has made a large impact on today's data ecosystems. Enabling business users to translate existing skills to Hadoop is necessary to encourage adoption and to allow businesses to get value out of their Hadoop investment quickly. R, a prolific and rapidly growing data analysis language, now has a place in the Hadoop ecosystem.
This presentation covers:
- Trends and business drivers for Hadoop
- How Hortonworks and Revolution Analytics play a role in the modern data architecture
- How you can run R natively in Hortonworks Data Platform to simply move your R-powered analytics to Hadoop
Presentation replay at:
http://www.revolutionanalytics.com/news-events/free-webinars/2013/modern-data-architecture-revolution-hortonworks/
A very basic introduction to Big Data. Touches on what it is, its characteristics, and some examples of Big Data frameworks, with a Hadoop 2.0 example: YARN, HDFS, and MapReduce with ZooKeeper.
QuerySurge Slide Deck for Big Data Testing Webinar (RTTS)
This is a slide deck from QuerySurge's Big Data Testing webinar.
Learn why testing is pivotal to the success of your Big Data strategy.
Learn more at www.querysurge.com
The growing variety of new data sources is pushing organizations to look for streamlined ways to manage complexities and get the most out of their data-related investments. The companies that do this correctly are realizing the power of big data for business expansion and growth.
Learn why testing your enterprise's data is pivotal for success with big data, Hadoop and NoSQL. Learn how to increase your testing speed, boost your testing coverage (up to 100%), and improve the level of quality within your data warehouse - all with one ETL testing tool.
This information is geared towards:
- Big Data & Data Warehouse Architects,
- ETL Developers
- ETL Testers, Big Data Testers
- Data Analysts
- Operations teams
- Business Intelligence (BI) Architects
- Data Management Officers & Directors
You will learn how to:
- Improve your Data Quality
- Accelerate your data testing cycles
- Reduce your costs & risks
- Provide a huge ROI (as high as 1,300%)
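At its core, the ETL testing that tools like QuerySurge automate is a source-to-target comparison: did every row survive the load, and did any values change in transit? A hand-rolled sketch of that idea follows; the function, its column names, and the sample rows are all illustrative and are not QuerySurge's actual API.

```python
# Toy source-vs-target reconciliation for ETL testing.

def diff_datasets(source, target, key):
    """Compare two lists of dict rows on a key column; return discrepancies."""
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    missing = sorted(set(src) - set(tgt))        # rows dropped during the load
    unexpected = sorted(set(tgt) - set(src))     # rows that appeared from nowhere
    mismatched = sorted(k for k in src.keys() & tgt.keys()
                        if src[k] != tgt[k])     # values changed in transit
    return {"missing": missing, "unexpected": unexpected,
            "mismatched": mismatched}

# Example: one row lost and one value corrupted between source and target.
source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
target = [{"id": 1, "amount": 10}, {"id": 2, "amount": 99}]

report = diff_datasets(source, target, key="id")
print(report)
```

A commercial tool adds scale (millions of rows, Hadoop and NoSQL sources) and reporting, but "100% coverage" ultimately means running exactly this kind of comparison over every row rather than a sample.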
This presentation contains a broad introduction to big data and its technologies.
Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis.
Big Data is a phrase used to mean a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques. In most enterprise scenarios the volume of data is too big or it moves too fast or it exceeds current processing capacity.
Choosing technologies for a big data solution in the cloud (James Serra)
Has your company been building data warehouses for years using SQL Server? And are you now tasked with creating or moving your data warehouse to the cloud and modernizing it to support “Big Data”? What technologies and tools should you use? That is what this presentation will help you answer. First we will cover what questions to ask concerning data (type, size, frequency), reporting, performance needs, on-prem vs cloud, staff technology skills, OSS requirements, cost, and MDM needs. Then we will show you common big data architecture solutions and help you to answer questions such as: Where do I store the data? Should I use a data lake? Do I still need a cube? What about Hadoop/NoSQL? Do I need the power of MPP? Should I build a "logical data warehouse"? What is this lambda architecture? Can I use Hadoop for my DW? Finally, we’ll show some architectures of real-world customer big data solutions. Come to this session to get started down the path to making the proper technology choices in moving to the cloud.
The Data World Distilled
Understanding how the data world works in the Big Data era
I created this slide deck as a learning tool for new employees, I figured I would post it in case it can help others understand the data space.
This slide deck covers:
- Big Data
- Data Warehouses
- ETL/Data Integration
- Business Intelligence and Analytics
- Data Quality
- Data Testing
- Data Governance
It provides a brief description along with key vendors in the space.
So you have a handle on what Big Data is and how you can use it to find business value in your data. Now you need an understanding of the Microsoft products that can be used to create a Big Data solution. Microsoft has many pieces of the puzzle, and in this presentation I will show how they fit together. How does Microsoft enhance and add value to Big Data? From collecting data, transforming it, and storing it, to visualizing it, I will show you Microsoft’s solutions for every step of the way.
Sharing a presentation highlighting some key aspects to consider while harnessing your Digital Transformation projects as a Digital Intelligence enabler for your enterprise.
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios (kcmallu)
What's the origin of Big Data? What are the real life usage scenarios where Hadoop has been successfully adopted? How do you get started within your organizations?
Real-time analytics is the use of, or the capacity to use, all available enterprise data and resources when they are needed. Big data with real-time analytics consists of 3Vs: the extreme volume of data, the wide variety of types of data and the velocity at which the data must be processed.
Real-time analytics is the use of, or the capacity to use, all available enterprise data and resources when they are needed. When combined with Big Data, it delivers the 3Vs: the extreme volume of data, the wide variety of types of data, and the velocity at which the data must be processed.
Differentiate Big Data vs Data Warehouse use cases for a cloud solution (James Serra)
It can be quite challenging keeping up with the frequent updates to the Microsoft products and understanding all their use cases and how all the products fit together. In this session we will differentiate the use cases for each of the Microsoft services, explaining and demonstrating what is good and what isn't, in order for you to position, design and deliver the proper adoption use cases for each with your customers. We will cover a wide range of products such as Databricks, SQL Data Warehouse, HDInsight, Azure Data Lake Analytics, Azure Data Lake Store, Blob storage, and AAS as well as high-level concepts such as when to use a data lake. We will also review the most common reference architectures (“patterns”) witnessed in customer adoption.
Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap others. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.
Filip Panjevic is a Co-Founder and CTO at ydrive.ai, a startup working on self-driving cars, and one of the founders of the Petnica Machine Learning School.
Filip's talk will focus on the story of Petnica School: how it started, what has changed since the beginning, what the school's concept looks like today, and why that concept is good for developing new data scientists. This talk will be perfect for people considering a career in the data science field!
The talk will be a broad overview of, and thoughts about, building one of the biggest data science communities in India. I will talk about how an ecosystem is created and how value is delivered to each stakeholder. I will share my experience of building MachineHack, AIMinds, and other platforms. One of the core agendas of the talk will be how these platforms have enabled a unique data science education and learning experience in India. They help students and engineers imagine and work towards a career in data science.
In Drazen's talk, you will get a chance to hear how the Data Science Master 4.0 program at Belgrade University was created, and what the benefits of the program are.
PwC's recently released Responsible AI Diagnostic surveyed around 250 senior business executives from May to June 2019. The survey found that 84% of CEOs agree that AI-based decisions need to be explainable in order to be trusted. In the past few years, deep learning has shown remarkable results in various applications, which makes it one of the first choices for many AI use cases. However, deep learning models are hard to explain, and since the majority of CEOs expect AI solutions to be explainable, deep learning faces a serious challenge. Daniel Kahneman, in his book Thinking, Fast and Slow, presented two different systems the human brain uses to form thoughts and decisions: System 1, which is fast, intuitive, and hard to explain, and System 2, which is slow, conscious, and easy to explain. In this talk I will present: A) the PwC Responsible AI survey; B) a proposed deep learning framework that mimics the two systems of thinking; C) the recent advances in the neural-symbolic learning field.
Challenges in building a churn prediction model in different industries, presented by Jelena Pekez from Comtrade System Integration. The talk is focused on real-life use-case experience.
In my talk I am going to share with the audience practical experience of using BI solutions for steering bank credit portfolios, making data actionable, and communicating and collaborating on that data with relevant stakeholders. In our case, we aimed for a solution that can use data models based both in the cloud and on-premise, easily communicate and share information within the organization, and keep track of that information flow. In addition, we wanted our solution to support various datasets and to integrate the most popular data science languages, R and Python, for the convenience and flexibility of our data science team. Our solution is based on Power BI plus the use of Azure Analysis Services and R.
The talk will have three parts: an overview of the practical applications of AI and ML in the FinTech industry, with a short explanation of the PSD2 directive and the disruption it caused; the application of AI/ML from the perspective of the end user, including personal financial health and a financial coach; and an overview of the architecture, technologies, and frameworks used, with practical examples from the Zuper company.
We present a recommender system for personalized financial advice, which we designed for a large Swiss private bank. The final recommendations produced by the system were delivered to the end clients through a mobile banking platform. The recommender system is based on a collaborative filtering technique and can work with changing asset features, operate with implicit ratings, and react to explicit feedback that clients can give using the mobile app. Moreover, we developed and implemented an approach to provide an explanation for each recommendation in the form "As you bought A, you might like B".
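The "As you bought A, you might like B" pattern falls naturally out of item-based collaborative filtering: score each unseen asset by its similarity to the assets the client already holds, and let the most similar held asset anchor the explanation. The sketch below shows that generic CF mechanism on a toy implicit-ratings matrix; it is not the bank's actual system, and all client and asset names are invented.

```python
# Toy item-based collaborative filtering with an explanation per pick.
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Implicit ratings: rows = clients, columns = assets A, B, C, D
# (1 = client holds/bought the asset, 0 = not).
ratings = {
    "client1": [1, 1, 0, 0],
    "client2": [1, 1, 1, 0],
    "client3": [0, 1, 1, 1],
}
assets = ["A", "B", "C", "D"]

def recommend(client):
    held = [i for i, r in enumerate(ratings[client]) if r]
    columns = list(zip(*ratings.values()))  # asset -> vector over clients
    best = None
    for j, name in enumerate(assets):
        if j in held:
            continue
        # The most similar held asset drives both the score and the
        # human-readable explanation for this candidate.
        sim, anchor = max((cosine(columns[j], columns[i]), assets[i])
                          for i in held)
        if best is None or sim > best[0]:
            best = (sim, name, anchor)
    _, name, anchor = best
    return name, f"As you bought {anchor}, you might like {name}"

print(recommend("client1"))
```

Explicit feedback from the app would feed back into the ratings matrix, and changing asset features would call for a hybrid model on top of this, but the explanation mechanism stays the same: name the held item that contributed most to the score.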
This talk focuses on building real-time pipelines using cutting-edge Big Data technologies and applying ML to the gathered data. The first part of the presentation covers the importance of and necessity for streaming data processing, and proposes tools that can be used to build a streaming pipeline. The second part focuses on building machine learning models for customer support, introducing success stories that cover the need for more efficient customer support, problem resolution, and the benefits gained.
Presentation of the first complete AI investment platform. It is based on the most innovative AI methods: advanced neural networks (ResNet/DenseNet, LSTM, GAN autoencoders) and reinforcement learning for risk control and position sizing using the AlphaZero approach. It shows how a complex AI system covering both supervised and reinforcement learning can be successfully applied to investment portfolio optimization in real time. The architecture of the platform and the algorithms used will be presented, together with the machine learning workflow. A real demo of the platform will also be shown.
A lot of companies make the mistake of thinking that just hiring Data Scientists will lead to increased revenue or increased profit. For a company’s investment in Data Science to be successful the Data Scientists need to work on the right problems, with the right people, and with the right tools. In this presentation, I will talk about the lessons I have learned, and mistakes made in applying Data Science in commercial settings over the last 10 years. I will highlight what processes can increase the chances of Data Science investment being successful.
The talk focuses on the reasons and methods for creating models that maximize the sales-price gross margin while retaining high confidence that the quote will be accepted by the client. Price changes are dynamic and are impacted by many different elements: the cost of input material, labor cost, transportation cost, scrap material due to different ordered quantities, etc. Besides input cost segments, the output price is also impacted by different marketing campaigns (our own and others'), seasonality, and past and future customer behavior, as well as the behavior of the product we are selling.
Andjela will share the best practices that Things Solver brings when it comes to data monetization. Things Solver clients sell more customized offerings and end up with happier customers. Andjela will share the machine learning modules that do just that within Coeus, the Things Solver platform.
In the past few years, many businesses started to understand the potential of real-time data analytics. Many of them invested time, energy, and money to make it happen, with weaker outcomes than expected. There are several reasons for this: overly ambitious plans by leadership regarding leveraging data, not enough discipline in defining goals and an MVP for initial use cases, and a plethora of tools and vendors claiming they can solve all the problems. So, how can we get the most value out of fast (real-time) data at a reasonable cost? We will try to answer this question and give actionable advice.
University of Nottingham Ningbo China. The advances of 5G, sensor, and information technologies have enabled the proliferation of smart pervasive sensor networks. Rapid progress in the design of biomedical sensors, advances in the management of medical knowledge, and improvements in algorithms for decision support are fueling a technological disruption in health monitoring. Current technologies enable personalized A3 (anyplace, anytime, anywhere) health monitoring. Continuous health monitoring extends health care into the home and workplace, changing the modes of traditional health care delivery. Medical-grade systems require innovative solutions for system dependability, medical decision support, data management, and interpretation, beyond current fitness and wellbeing applications. We will present innovative solutions for A3 health monitoring and discuss the use of blockchain technologies and artificial intelligence to address the technical, medical, and ethical requirements of personalized health monitoring systems.
Data quality is essential for e-commerce, and automating it can reduce a business's daily bottlenecks and promote its competitiveness. Product similarity can help reduce duplicate content by leveraging all types of product information. But dealing with mixed-type data such as product data is a rather untypical yet real business case, and it can be challenging.
Uroš Valant has almost 20 years of experience in planning, managing, and delivering various IT projects. His richest experience is in the fields of business analytics, project planning and implementation, database design, and the management of development teams. In recent years his focus has been on predictive analytics, machine learning, and applying AI solutions to practical use in different fields of work.
In his talk he will present an interactive case study of image recognition and AI-assisted design techniques in the textile industry.
The presentation will start as an engaging lecture in which I present the motivation behind the project, based on my academic research (my Oxford PhD, among others). I will tell the audience just how rampant corruption is in local governance and why it is so persistent. Then I will present our remedy: full budget transparency. I will show our search engine and how it works, and will invite participants to download the APIs and play with the data themselves.
The talk will be divided into two parts. The first one is about geospatial open data and several Copernicus services where those data can be downloaded. The second one is about Forest and Climate project, as an example of geospatial analysis. The aim of the project was to identify the most suitable area for afforestation in Serbia by using satellite and Earth observation data. The results can be found at https://sumeiklima.org/.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round-table discussion of vector databases, unstructured data, AI, big data, real-time systems, robots, and Milvus.
A lively discussion with NJ Gen AI Meetup Lead Prasad and Procure.FYI's Co-Founder.
Enhanced Enterprise Intelligence with your personal AI Data Copilot (GetInData)
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is growing interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLM context and prompt augmentation when building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... (sameer shah)
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today's world, where data privacy and compliance are a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) they are auto-generated from declarative data annotations; (2) they respect user-level consent and preferences; (3) they are context-aware, encoding a different set of transformations for different use cases; (4) they are portable: while the SQL logic is implemented in only one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
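The core routing idea (queries against a table resolve to a compliance-enforcing view that masks sensitive columns) can be demonstrated in miniature with SQLite. This is an illustrative sketch, not ViewShift's implementation; the table, columns, and masking rule are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO members VALUES (?, ?)",
                 [(1, "a@example.com"), (2, "b@example.com")])
# The "catalog" resolves the table name to a compliance-enforcing view,
# so downstream queries only ever see the masked email column.
conn.execute("""
    CREATE VIEW members_compliant AS
    SELECT id, substr(email, 1, 1) || '***' AS email
    FROM members
""")
rows = conn.execute(
    "SELECT email FROM members_compliant ORDER BY id").fetchall()
# rows == [('a***',), ('b***',)]
```

In a real engine the view body would be auto-generated from data annotations and chosen per use case; the sketch only shows the resolution-to-view mechanism.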
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It comes, however, with the precondition that the input graph contain no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Unleashing the Power of Data: Choosing a Trusted Analytics Platform (Enterprise Wired)
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Adjusting OpenMP PageRank: SHORT REPORT / NOTES (Subhajit Sahu)
For massive graphs that fit in RAM but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments were conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). The hybrid approach, on the other hand, runs certain primitives (i.e., sumAt, multiply) in sequential mode.
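Both reports above benchmark variants of the same underlying computation. For readers unfamiliar with it, a minimal single-threaded PageRank power iteration, with the uniform dead-end handling that the Levelwise method requires to be absent, can be sketched in plain Python (the example graph and parameters are invented):

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=100):
    """Basic power-iteration PageRank on an adjacency-list graph.

    Dead ends (vertices with no out-links) have their rank spread
    uniformly over all vertices -- the very case the Levelwise method
    preconditions away.
    """
    n = len(graph)
    ranks = {v: 1.0 / n for v in graph}
    for _ in range(max_iter):
        dead_mass = sum(r for v, r in ranks.items() if not graph[v])
        new = {v: (1 - damping) / n + damping * dead_mass / n for v in graph}
        for v, outs in graph.items():
            if outs:
                share = damping * ranks[v] / len(outs)
                for u in outs:
                    new[u] += share
        if sum(abs(new[v] - ranks[v]) for v in graph) < tol:
            ranks = new
            break
        ranks = new
    return ranks

# Invented example graph; vertex "d" is a dead end.
g = {"a": ["b", "c"], "b": ["c"], "c": ["a", "d"], "d": []}
r = pagerank(g)
```

The ranks always sum to 1, and a vertex with more (and better-ranked) in-links, such as "c" here, ends up with a higher rank than one with fewer, such as "b".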
1. Big Data SE vs. SE of BD Systems
Process: Business Process Driven Service Development Lifecycle (BPD-SDL)
Methods and Design Principles: service components with soaML
Reference Architecture for big data (REF4BD)
Tools (SAS, Visual Paradigm, BonitaSoft, Bizagi Studio, Tableau, Mathematica, Azure/ML)
SOSE4BD as a Service (SOSEaaS), BDaaS, Big Data Adoption Framework as a Service (BAaaS), Software Engineering Analytics as a Service (SEAaaS), SE Prediction Model as a Service (SEPaaS), Bug Prediction as a Service with Azure/ML (MLaaS), BD Metrics as a Service (BDMaaS)
Adoption Models
Evaluation & Applications
Dr Muthu Ramachandran PhD FBCS Fellow of Advance HE, MIEEE, MACM
Principal Lecturer
Software Engineering Technologies and Emerging Practices (SETEP) Research
Group Lead
School of Computing, Creative Technologies, and Engineering (CCTE)
Leeds Beckett University
Email: M.Ramachandran@leedsbeckett.ac.uk
2. Research Motivation
• In this changing era of integrated IoT, cloud, and real-time big data streams (social media, smart cities, smart living, etc.), services need to be robust, agile, accessible, and available to their clients. For secure and guaranteed delivery of services, every big organization is shifting its service delivery model to an Enterprise Service Bus (ESB). The following research questions are posed:
• How do we achieve Data Reuse, Reliability, Resiliency (3Rs) and Security, Accuracy
and Availability (SAA)?
• How do we engineer a systematic approach to using and reusing software
repositories, 50 years of software engineering knowledge and experience data for
developing software and software as a service paradigm (Big Data Software
Engineering)?
• How do we engineer a systematic approach that applies 50 years of software engineering experience to data science and to developing big data systems and services (Software Engineering for Big Data)?
• What are the design principles for a SOA-driven reference architecture?
• What services comprise a reference architecture for big data systems?
• How do we classify technologies and products/services of big data systems?
• How should software development teams integrate Data Analytics (Data
Collection, Transformation, Analytics, and Improvement) into their software
development process?
• What new roles, artifacts, and activities of the BD process come into play?
• How do the new roles, artifacts, and activities of the BD process tie into existing agile or DevOps processes?
3. BD Concepts
• Big data is characterized by the 8 V's: volume (large amounts of data), velocity (continuously processed data in real time), variety (unstructured, semi-structured or structured data in different formats and from multiple and diverse sources), veracity (uncertainty and trustworthiness of data), validity (relevance of data to the problem to solve), volatility (constant change of input data), and value (how data and its analysis adds value).
• Big data systems are software applications that process and potentially generate big data. Such applications receive and process data from various diverse (usually distributed) sources, such as sensors, devices, whole networks, social networks, mobile devices or devices in an Internet of Things. They process high workloads of data and handle high rates of requests for data. The idea is to use large amounts of data strategically and efficiently to provide additional intelligence.
4. 8Vs of Big Data
http://www.visualcapitalist.com/internet-minute-2018/
Velocity: BD requires real-time processing at varying intervals and may include stream as well as batch processing.
Volume: BD provides massive historical data over several time periods (years, months, weeks, days, etc.).
Variety: The BD captured may come in a variety of formats (multiple file formats and multi-modal data) and may be structured or unstructured.
Veracity: The BD captured may contain unwanted data which requires extraction, transformation, and cleaning.
Value: BD may contain highly valuable data as well as data of no use; it requires a skilled data scientist to identify what to consider for analytical processing and what to discard.
5. Characteristics of BD Systems Requirements
BD systems exhibit:
• The 8 Vs of basic BD systems characteristics
• Data security across the data lifecycle (stored, used, transferred, live, & real-time streaming data), integrating the data security attributes (confidentiality, integrity and availability)
• A flexible DB architecture framework (requiring new NoSQL, not only SQL-based relational DB systems) able to link multiple formats; BD technologies include SQL-based (Hadoop & Spark) and NoSQL-based (HBase, BigTable, Cassandra, MongoDB)
• Data cleaning processing (ETL process: Extract, Clean, Transform, & Load)
• BD analytics for decision making & business intelligence
• Multiple data formats
• Multi-channel data sources (real-time stream and batch processing)
Spark is a cluster-computing framework, which means that it competes more with MapReduce than with the entire Hadoop ecosystem. For example, Spark doesn't have its own distributed filesystem, but can use HDFS. Spark SQL integrates relational processing with the functional programming API of Spark; querying data through SQL or the Hive query language is possible through Spark SQL. Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as a distributed SQL query engine.
MongoDB is a NoSQL database, whereas Hadoop is a framework for storing & processing big data in a distributed environment. MongoDB is a document-oriented NoSQL database that stores data in a flexible, JSON-like document format. You can easily map the documents to your applications.
Apache Cassandra is a NoSQL database ideal for high-speed, online transactional data, while Hadoop is a big data analytics system that focuses on data warehousing and data lake use cases. MapReduce is a programming paradigm for processing and handling large data sets.
Apache HBase is a column-oriented NoSQL database built on top of Hadoop (HDFS, to be exact). It is an open-source implementation of Google's Bigtable.
List of ETL Tools, https://www.etltool.com/list-of-etl-tools/
BD analytics is the application of analysis, data, and systematic reasoning to make decisions, learn recurring patterns, and extract knowledge for improving process and business efficiency and cost. Analytics allows for summarising, filtering, modelling, and experimenting. Tools and techniques include A/B testing, statistical modelling, machine learning, deep learning, natural language processing, and multilinear subspace learning.
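As a concrete instance of one technique in the list above, here is a minimal two-proportion z-test for an A/B experiment using only the Python standard library (the conversion counts are invented for illustration):

```python
import math

def ab_test_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is variant B's conversion rate different
    from variant A's?  Returns the z statistic and two-sided p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value via the standard normal CDF (expressed with erf).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Invented experiment: 120/2400 conversions for A vs 165/2400 for B.
z, p = ab_test_z(120, 2400, 165, 2400)
```

With these invented numbers the uplift comes out significant at the usual 5% level; in a big data setting the same test would typically run over streaming aggregates rather than raw rows.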
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity
hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle
virtually limitless concurrent tasks or jobs. The core of Apache Hadoop consists of a storage part, known as Hadoop
Distributed File System (HDFS), and a processing part which is a MapReduce programming model. Hadoop splits files
into large blocks and distributes them across nodes in a cluster.
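The MapReduce model at the core of Hadoop can be illustrated in miniature with plain Python: a single-process sketch of the map, shuffle, and reduce phases (Hadoop, of course, distributes these across cluster nodes):

```python
from collections import defaultdict

def map_phase(docs):
    """Map: emit (word, 1) pairs, as a Hadoop mapper would."""
    for doc in docs:
        for word in doc.lower().split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key across mapper outputs."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs big storage", "hadoop splits big files"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts["big"] == 3
```

The map and reduce functions are stateless per key, which is what lets a framework like Hadoop run them in parallel over file blocks.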
6. Several research challenges of software engineering for developing big data systems
• Surveying the existing software engineering literature on applying software
engineering principles into developing and supporting big data systems
• Identifying the fields of application for big data software systems
• Investigating the software engineering knowledge areas that have seen
research related to big data systems
• Revealing the gaps in the knowledge areas that require more focus for big
data systems development
• Determining the open research challenges in each software engineering
knowledge area that need to be met
• To be sustainable, we need an approach which is systematic, business-driven (supporting business-process- and value-driven development), and based on established Software Engineering practices
7. 50 Years of SE
• 60 software development methodologies
• 50 static analysis tools
• 40 software design methods
• 37 benchmark organizations
• 25 size metrics
• 20 kinds of project management tools
• 22 kinds of testing and dozens of other tool variations.
• A minimum of 3,000 programming languages have existed, even though only about 100 are frequently used. A new programming language is announced every 2 weeks, new tools appear more than once a month, and new methodologies emerge every 10 months.
• Newly emerged service computing paradigms (SOA, Web Services, Micro Services, Cloud SE, etc.) and established SE abstractions (objects, classes, components, service components, virtualisation techniques, resource-oriented computing, containers, etc.)
8. Key Areas of Research Challenges
• Software Engineering for Big Data can provide a systematic process for improving the development of big data systems. The process includes requirements gathering for BD; software architecture for BD; testing and debugging BD systems (performance, reliability, and security), where logs analysing the V characteristics should be included; an SE process for BD, which could include CMMI; and finally managing BD projects.
• Big Data Software Engineering is an area of research which should focus on utilising BD to improve SE practices and software production. Typical activities include analytics for software engineering, mining software repositories, visual analytics for software engineering, and self-adaptive systems which utilise generated data and self-learn.
9. Four Pillars of SE4BD Principles
Pillar 1 – Data Security (Building Security In (BSI)):
• Data security across the data lifecycle (stored, used, transferred, live, & real-time streaming data); integrate the data security attributes (confidentiality, integrity and availability) and apply software security engineering principles while data modelling for BD
• Misuse and abuse cases for service requirements
• Threat modelling
• Secure SDLC

Pillar 2 – Integrated Services (IoT-Cloud-BD):
• Service composition
• Autonomic computing
• Data processing architecture for integrating IoT-Cloud-BD
• Support both stream & batch data processing
• Design for data complexity

Pillar 3 – Reuse-Oriented Knowledge Discovery & Patterns:
• Knowledge discovery, extraction, & patterns for reuse
• Reuse by analytics patterns & predictive analytics patterns
• BDSE metrics with data points

Pillar 4 – Service-Oriented SE for BD Applications:
• SOSE4BD
• Agile for BDSE
• Requirements engineering for BD with BPMN simulation
• Design for data service reuse
• Design for data security (achieving BSI)
• SOSE4BD Framework: design methods based on soaML and service components, process, applications, and evaluation
10. Software Engineering Framework for Big Data Systems (SE4BD)
Framework: methods, process, reference architecture (REF4BD), applications, adoption and evaluation
11. SE4BD Framework
Process: Business Process Driven Service Development Lifecycle (BPD-SDL)
Methods and Design Principles: service components with soaML
Reference Architecture for big data (REF4BD)
Tools (SAS, Visual Paradigm, BonitaSoft, Bizagi Studio, Tableau, Mathematica, Azure/ML)
SOSE4BD as a Service (SOSEaaS), BDaaS, Big Data Adoption Framework as a Service (BAaaS), Software Engineering Analytics as a Service (SEAaaS), SE Prediction Model as a Service (SEPaaS), Bug Prediction as a Service with Azure/ML (MLaaS), BD Metrics as a Service (BDMaaS)
Adoption Models
Evaluation & Applications
SE4BD is a process-centric approach which provides a process, methods and design principles based on service components with soaML, a reference architecture, tools, applications, adoption models, and evaluation.
12. Benefits of SE Approach to BD Systems
SE4BD benefits:
• Improved productivity & reuse (reuse of 50 years of software development best practices, knowledge and experience): RE for BDS; design methods for BDS; reference architectures for BDS; cloud-based implementation and automation of BDS services; test & deployment of new BDS services
• Learning from mistakes (repositories of software engineering data, Menzies and Zimmermann, 2013): from software reuse to data reuse: knowledge extraction, reusable patterns, a BD adoption framework, validating data with simulation tools, SPI approaches to business intelligence & improvement
• Software engineering of BD analytics with SOA: visual analytics (data on bugs, vulnerabilities, software quality, usability, reusability, faults & failures, process, project management, etc.); software predictive modelling & analytics with cloud-based machine learning tools (Azure/ML)
• Improved business intelligence of BDS
• Secure new BDS as a Service
Menzies, T and Zimmermann, T (2013) Software Analytics: So What?, IEEE Software, vol. 30, no. 4, 2013
13. Benefits of Big Data SE
BDSE benefits:
• Improved productivity & reuse (reuse of 50 years of software development best practices, knowledge and experience)
• Learning from mistakes (repositories of software engineering data, Menzies and Zimmermann, 2013)
• Software engineering analytics: visual analytics (data on bugs, vulnerabilities, software quality, usability, reusability, faults & failures, process, project management, etc.); software predictive modelling & analytics with cloud-based machine learning tools (Azure/ML)
• Improved product line, productivity, quality, QoS, process & reuse
• Secure software & software as a service
Menzies, T and Zimmermann, T (2013) Software Analytics: So What?, IEEE Software, vol. 30, no. 4, 2013
15. SE Analytics & Big Data SE
• Big data analytics, visualisation, and prediction have been useful and very popular areas of research and application in recent years. However, in applying big data practices to Software Engineering Analytics, some key questions need to be addressed:
• How do we apply big data practices to software engineering analytics?
• How do we collect and access SE experience data?
• How do we apply them to decision making for the business as well as for SE practices: process, methods, and technology?
• How do we apply them to Software Process Improvement and Software Practices Improvement (SPI)?
• How do we apply them to software business process improvement?
16. SE for BD Analytics
Organisation best-practices data, business data, software usage experience data, process experience data, and SE practices data feed into Big Data (prevailing experience), which drives analytics & visualisation, then predictions, then decision making & smart marketing, and finally improvement of business & software: process, methods, practices, team spirit, test practices, products, and quality.
17. SE4BD Lifecycle (SEF4CC): BD Adoption Framework
• Big Data Source RE
• Big Data Transformation RE
• Big Data Consumer RE
• Big Data Capability RE
• Big Data Security & Privacy RE
• Big Data Visualisation and Analytics RE
• Business Intelligence RE (business improvements)
• Use BPM modelling and simulation to validate new data and business improvement services
18. Actionable SE Analytics Tools with SE Data Repositories
• Code coverage and visualisation tools
• Project Management & Monitoring Tools
• Other tools support management, such as PROM and Hackystat, which can monitor a software project and generate statistical reports. However, even after years of development, there has been no major upgrade that can help predict management decisions (Zimmermann, 2013). Most of these tools focused on data collection rather than critical analysis.
• Later, more tools were developed that recognised these management problems and focused on data presentation rather than just collecting data. Tools like Microsoft's Team Foundation Server and IBM's keep developers updated with modifications, bugs and build results (Zimmermann, 2013). Other project management tools, such as Automated Project Office (APO) (Jones, 2017), also help managers reach a probable decision for the organization before or during the project.
• Software Repositories: As of late 2012, our Web searches show that Mozilla Firefox had 800,000 bug reports, and
platforms such as Sourceforge.net and GitHub hosted 324,000 and 11.2 million projects, respectively.
• The PROMISE repository of software engineering data has grown to more than 100 projects and is just one of more
than a dozen open source repositories that are readily available to industrial practitioners and researchers
• Jones, C 2018, Software Methodologies. [Electronic Resource] : A Quantitative Guide, n.p.: Boca Raton : CRC
Press/Taylor & Francis
• Yang, Y. et al (2018) Actionable Analytics for Software Engineering, Actionable Analytics, Guest editors Introduction to
Special Issue on Actionable Analytics for SE, IEEE Software, Jan/Feb 2018
• Menzies, T and Zimmermann, T (2013) Software Analytics: So What?, IEEE Software, vol. 30, no. 4, 2013
• Applications: Bug Data Prediction Models, ML for SPI, and Agile Methods
19. Repositories of software engineering data
Menzies, T and Zimmermann, T (2013) Software Analytics: So What?, IEEE Software, vol. 30, no. 4, 2013
22. Data Science Application 1: Bug Data Prediction Models using cloud-based machine learning
https://app.box.com/s/b1g4jsp4k7f9cof90an6dg1qju0vv9uh
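The model itself is behind the link above; as a rough, self-contained illustration of the general idea (not the actual Azure/ML model), a bug-prediction classifier over simple code metrics can be sketched as logistic regression trained by gradient descent on invented data:

```python
import math

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Logistic regression via stochastic gradient descent: predict
    bug-proneness from code metrics (e.g. normalised LOC and churn)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1 / (1 + math.exp(-z))
            err = p - yi            # gradient of the log-loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    """Probability that the module described by metrics xi is buggy."""
    return 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))

# Invented training data: (normalised LOC, normalised churn) -> had a bug?
X = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
```

A cloud ML service such as Azure/ML wraps this kind of model behind a training pipeline and a prediction web endpoint; the mechanics of fitting a classifier to defect data are the same.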
23. Application 2: Cloud-Based Machine Learning Tool for Agile Method Decision Making
This tool is useful when making a decision on choosing an Agile method based on project size and constraints.
https://app.box.com/s/7q8sjo0zf36qpsahoqvv5ai28forv0vn
24. Application 3: Cloud-Based Machine Learning for Software Process Improvement
https://app.box.com/s/6gd4zgimtz11014eavu3wv2c8cc1do6h
26. Process Mining and Business Intelligence with Machine Learning
Discover the business process → modelling & simulation → data analytics → observe performance and other key data → data preparation & cleaning with Extract, Transform, and Load (ETL process) → cloud-based machine learning (Azure/ML) → predictive modelling → introduce and improve business changes.
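The data preparation stage of the pipeline above can be sketched as a few composable Python functions; the event-log fields and cleaning rules here are invented for illustration:

```python
def extract(raw_lines):
    """Extract: parse raw CSV-style event-log lines into records."""
    for line in raw_lines:
        case_id, activity, duration = line.split(",")
        yield {"case": case_id.strip(), "activity": activity.strip(),
               "duration": duration.strip()}

def transform(records):
    """Transform/clean: drop records with bad durations, normalise types."""
    for rec in records:
        try:
            rec["duration"] = float(rec["duration"])
        except ValueError:
            continue  # unwanted data is filtered out at this step
        rec["activity"] = rec["activity"].lower()
        yield rec

def load(records):
    """Load: collect cleaned records into an in-memory 'warehouse',
    grouped by process case, ready for analytics or an ML model."""
    warehouse = {}
    for rec in records:
        warehouse.setdefault(rec["case"], []).append(rec)
    return warehouse

log = ["C1, Review, 3.5", "C1, Approve, n/a", "C2, Review, 2.0"]
wh = load(transform(extract(log)))
# The malformed "n/a" record is dropped, so case C1 keeps one event.
```

Using generators keeps each stage streaming-friendly, which matches the real-time character of the event logs the slide describes.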
27. Benefits of Process Mining with Machine Learning
• Live data from the business services (event logs and performance
data) to study the impact patterns
• Useful to study business and service patterns
• Useful to study business and service improvements
• Useful to improve change management process
• Reuse of services and data
• Improved QoS
• Predictive modelling for efficiency and QoS
29. Predictive Modelling for Service Desk Application
Occurrence of Interactions (blue) and Incidents (orange) in time, with marked Changes (black), for Configuration Item “SBA000607” related to Service Component “WBS000263”.
Suchy, J and Suchy, M (2012) Predictive Model for supporting ITIL Business Change Management Processes, BI 2012
30. Software Engineering Framework for Service and Cloud Computing (SEF-SCC) for Developing Data Science Applications
SEF-SCC Framework Poster, https://app.box.com/s/u5fcktx687fy6qv2nhgzp93e5n9i7r8p
31. SEF-SCC: Service-Security-Reuse – A Three-Way Integrated Service Engineering Framework

Service Development
• Accuracy, correctness & QoS design principles
• Service RE with BPMN and simulation
• Service design with service components (SoaML)
• Service development (any platform)
• Service testing & deployment & continuous delivery

Software Security Engineering
• Building Security In (BSI), resiliency, fault-tolerance design principles
• Service security RE with misuse & abuse cases for all identified services
• Threat modelling
• Design for security
• Building security in & resiliency, fault-tolerance
• Software security testing

Service Reuse Engineering
• Design for reuse & design with reuse; composable, scalable design principles
• Reuse RE (commonality & variability analysis of secured requirements on selected BPMN & secured use cases)
• Design-for-reuse approaches
• Reuse development (implementing composable services)
• Testing for reuse, composition & integration testing strategies
32. Software
Engineering
Lifecycle for
Service and
Cloud Computing
(SEL4SCC)
32
Service Requirements
• Initial process models: actors/roles/workflows
• Detailed workflows
• Service task modelling
• UI prototyping
• Process simulation:
  • Configure the resources needed for tasks
  • Load profiles in sec/min/days/no. of instances
  • Start the Process Simulation as a Service (PSSaaS)
SOA Requirements with use-case modelling, story cards (Agile), storyboards, CRC cards, and feature-oriented modelling
SOA Design with service component models
SOA Implementation with SOAP/RESTful services
SOA Test & Deliver
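The process-simulation step above (configuring resources and load profiles before starting PSSaaS) can be sketched as a minimal discrete-event simulation. The task counts, arrival interval, service time, and resource pool below are illustrative assumptions, not part of the framework:

```python
import heapq

def simulate(num_tasks, interarrival, service_time, num_resources):
    """Minimal discrete-event simulation of a BPMN-style task queue.

    Tasks arrive at fixed intervals and are served by a pool of
    resources; returns average waiting time and resource utilisation."""
    free_at = [0.0] * num_resources           # when each resource is next free
    heapq.heapify(free_at)
    total_wait, busy_time = 0.0, 0.0
    for i in range(num_tasks):
        arrival = i * interarrival
        start = max(arrival, heapq.heappop(free_at))  # wait for a free resource
        total_wait += start - arrival
        busy_time += service_time
        heapq.heappush(free_at, start + service_time)
    makespan = max(free_at)
    return {
        "avg_wait": total_wait / num_tasks,
        "utilisation": busy_time / (makespan * num_resources),
    }

# Example load profile: 100 task instances, one every 2 s,
# each taking 3 s, served by 2 resources.
result = simulate(num_tasks=100, interarrival=2.0, service_time=3.0, num_resources=2)
print(result)
```

Varying `num_resources` against the load profile is the kind of what-if question the simulation step is meant to answer before service design begins.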
33. Cloud Service Security Development Process with Building Security In (BSI) – Our Systematic Approach to Developing Secure Services
Process steps: Requirements engineering for cloud applications (BPMN, use cases, RE document) → Conduct BPM → Identify SLAs → Design service components → Test & deploy
Security artefacts produced at each step:
• Cloud service security requirements (misuse & abuse cases, threat modelling, software security RE document)
• Cloud business process security vulnerabilities
• SLA security rules and risks
• Cloud service security-specific components & interfaces
• Cloud service security testing
Building Security In (BSI) – cloud service development with built-in security
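One way the security artefacts above (misuse & abuse cases, SLA security rules) could be recorded and traced to a service is sketched below. The `MisuseCase` and `SecureService` classes, their fields, and the `PaymentService` example are hypothetical illustrations, not part of the BSI process itself:

```python
from dataclasses import dataclass, field

@dataclass
class MisuseCase:
    """A misuse/abuse case captured during security RE (illustrative fields)."""
    name: str
    threat: str                 # e.g. a STRIDE category
    risk: int                   # 1 (low) .. 5 (high)
    mitigations: list = field(default_factory=list)

@dataclass
class SecureService:
    """Ties a cloud service to its SLA security rules and misuse cases."""
    service: str
    sla_rules: list
    misuse_cases: list = field(default_factory=list)

    def open_risks(self, threshold=3):
        """Misuse cases at or above the risk threshold with no mitigation yet."""
        return [m.name for m in self.misuse_cases
                if m.risk >= threshold and not m.mitigations]

payments = SecureService(
    service="PaymentService",   # hypothetical service name
    sla_rules=["encrypt in transit", "99.9% availability"],
    misuse_cases=[
        MisuseCase("replay payment request", "Spoofing", risk=4),
        MisuseCase("log tampering", "Tampering", risk=5,
                   mitigations=["append-only audit log"]),
    ],
)
print(payments.open_risks())    # → ['replay payment request']
```

Keeping the artefacts in one traceable record is what lets each process step (BPM, SLAs, design, testing) consume the security outputs of the previous one.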
34. RE for BD (Requirements Engineering for Big Data)
Arruda, D. and Madhavji, N.H. (2017) Towards a Big Data Requirements Engineering Artefact Model in the Context of Big Data Software Development Projects, 2017 IEEE International Conference on Big Data (BigData)
36. SEF4BD Reference Architecture for Big Data: Secure Service-Oriented SE for BD
Layers:
• BD Source & Storage Layer: data integration & storage (IoT–Cloud–Data)
• BD Processing Layer: BD extraction, processing, analysis, knowledge discovery, analytics, and visualisation services
• BD Applications & Prediction Layer: BD service and knowledge & reuse integration and patterns; machine learning services
BD Application Services:
• Data Extraction as a Service (DEaaS)
• Data Processing as a Service (DPaaS)
• Data Analysis as a Service (DAaaS)
• Knowledge Discovery as a Service (KDaaS)
• Knowledge Patterns as a Service (KPaaS)
• Reuse Patterns as a Service (RPaaS)
• Data Streaming as a Service (DSaaS): multiple sources – IoT, sensors, mobile, cloud, web services, etc.
• Data Batch Processing as a Service (DBaaS) – depending on the nature of the application
• Data Clustering, Allocation, Cataloguing & Pruning as a Service
• Multi-Cloud as a Service (MCaaS)
• Data Transformation as a Service (DTaaS)
• Data Analytics as a Service (DAtaaS)
• Data Visualisation as a Service (DVaaS)
• Data Integration, Data API, and Data Analysis as a Service (DIAAaaS)
• Machine & Deep Learning as a Service (MDLaaS)
• Data Processing Security as a Service (DPSaaS)
• Data Storage Security as a Service (DSSaaS)
• Data Application Security as a Service (DASaaS)
The layers and services are connected through a BD Enterprise Service Bus (BD-ESB).
Pääkkönen, P. and Pakkala, D. (2015) Reference Architecture and Classification of Technologies, Products and Services for Big Data Systems, Big Data Research, Elsevier, http://dx.doi.org/10.1016/j.bdr.2015.01.001
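A toy sketch of the BD-ESB idea: named services register with a bus and are composed into a processing pipeline. The `BigDataESB` class and the lambda stand-ins for DEaaS, DPaaS, and DAaaS are hypothetical, not the reference architecture's actual interfaces:

```python
class BigDataESB:
    """Toy enterprise service bus: registers named services and
    composes them into a processing pipeline (illustrative only)."""
    def __init__(self):
        self.services = {}

    def register(self, name, fn):
        self.services[name] = fn

    def pipeline(self, names, payload):
        """Pass the payload through each registered service in order."""
        for name in names:
            payload = self.services[name](payload)
        return payload

esb = BigDataESB()
# Hypothetical stand-ins for three *-as-a-Service components
esb.register("DEaaS", lambda events: [e.strip() for e in events])          # extraction
esb.register("DPaaS", lambda events: [e.lower() for e in events if e])     # processing
esb.register("DAaaS", lambda events: {e: events.count(e) for e in set(events)})  # analysis

counts = esb.pipeline(["DEaaS", "DPaaS", "DAaaS"], [" Login ", "login", "Error "])
print(counts)
```

Routing every service call through the bus is what lets the security services (DPSaaS, DSSaaS, DASaaS) be interposed uniformly across the layers.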
37. Mapping Services to SOA Design
• Task-oriented services: utility, business, coordination
• Entity-oriented services: utility, business, coordination
• Enterprise services (B2B): utility, business, coordination
As an architect, you will need to categorise services so that you can place them in the appropriate architecture layers.
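The categorisation above could be captured in a simple service catalogue, as sketched here; the three service names in the catalogue are hypothetical examples:

```python
from enum import Enum

class ServiceType(Enum):
    TASK = "task-oriented"
    ENTITY = "entity-oriented"
    ENTERPRISE = "enterprise (B2B)"

class Layer(Enum):
    UTILITY = "utility"
    BUSINESS = "business"
    COORDINATION = "coordination"

# Hypothetical catalogue mapping each service to its (type, layer) pair
catalogue = {
    "DataExtractionService": (ServiceType.TASK, Layer.UTILITY),
    "CustomerRecordService": (ServiceType.ENTITY, Layer.BUSINESS),
    "OrderOrchestrationService": (ServiceType.ENTERPRISE, Layer.COORDINATION),
}

def services_in_layer(layer):
    """Group services into one architecture layer for placement."""
    return [name for name, (_, l) in catalogue.items() if l is layer]

print(services_in_layer(Layer.UTILITY))   # → ['DataExtractionService']
```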
40. Facebook Real-Time Streaming Services
Chen, G.J. et al. (2016) Real-time Data Processing at Facebook, ACM SIGMOD 2016, San Francisco, CA, USA
Many companies have developed their own systems: examples include Twitter's Storm [28] and Heron [20], Google's MillWheel [9], and LinkedIn's Samza [4]. Facebook built its own stream processing systems, known as Puma, Swift, and Stylus.
Facebook identified the important design decisions: performance, fault tolerance, scalability, and correctness.
An example streaming application with four nodes: this application computes "trending" events.
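The paper's trending example uses a Filterer → Joiner → Scorer → Ranker style topology. The four toy nodes below loosely mirror it (with simple counting in place of joining and scoring) and are in no way the Puma/Swift/Stylus implementation:

```python
from collections import Counter

def filterer(events):
    """Node 1: drop malformed events (no topic)."""
    return (e for e in events if e.get("topic"))

def normaliser(events):
    """Node 2: canonicalise topic names."""
    return (e["topic"].lower() for e in events)

def counter(topics):
    """Node 3: aggregate counts per topic over one window."""
    return Counter(topics)

def ranker(counts, k=2):
    """Node 4: emit the top-k trending topics."""
    return [t for t, _ in counts.most_common(k)]

# One window of toy events flowing through the four nodes
window = [
    {"topic": "Eclipse"}, {"topic": "eclipse"}, {"topic": "Tax Day"},
    {"topic": None}, {"topic": "eclipse"},
]
trending = ranker(counter(normaliser(filterer(window))))
print(trending)   # → ['eclipse', 'tax day']
```

Even in this toy form, the node boundaries are where the paper's design decisions (performance, fault tolerance, scalability, correctness) have to be made, e.g. what state each node keeps and how a window is replayed after a failure.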
44. Results, Analysis, and Conclusion
• The results show the number of times a particular business service was used to process the data, and the time taken.
• In addition, BPMN 2.0 also shows the number of times each resource has been used, such as APIs, data scientists (human tasks in BPMN), data repositories, servers, firewalls, data storage, etc.
• The results show that implementing Facebook-style big data processing in REF4BD is more secure and uses resources more efficiently than using non-standard architectures. The efficiency results show about 95% use of automated processing by the API and data application (service component) services.
• Compared with the Facebook streaming application, which uses more filters and therefore incurs extra overheads and resource requirements, REF4BD is more predictable and can achieve correctness, fault tolerance, and scalability, since it is standardised across all data-processing applications and services.
45. Summary and Questions
• SOA has emerged from established software design principles based on the find–request–service paradigm, which suits service-oriented applications such as big data processing and analytics. It is therefore time to consider a systematic engineering approach to developing and deploying big data services, as data-driven applications and devices are increasing rapidly.
• In this context, this paper proposed a software engineering framework and an SOA-based reference architecture for big data application development. The paper concluded with a simulation of a complex Facebook-style big data application with real-time streaming, using BPMN simulation to study its characteristics before big data service design, development, and deployment. The simulation results demonstrated the efficiency and effectiveness of developing big data applications using the reference architecture framework for big data.
• To be sustainable, we need an approach that is systematic, business-driven (supporting business processes and value-driven development), and based on established software engineering practices.
46. References
Gorton, I., Bener, A., and Mockus, A. (2016) Software Engineering for Big Data Systems, Special Issue, IEEE Software, March/April 2016
Chen, G.J. et al. (2016) Real-time Data Processing at Facebook, ACM SIGMOD 2016, San Francisco, CA, USA
Navlakha, S. and Bar-Joseph, Z. (2015) Distributed Information Processing in Biological and Computational Systems, Communications of the ACM, 58(1), January 2015
Dehmer, M., Emmert-Streib, F., Pickl, S., and Holzinger, A. (eds) Big Data of Complex Networks
Fontana, A. and Wrobel, B. (2013) Evolution and Development of Complex Computational Systems Using the Paradigm of Metabolic Computing in Epigenetic Tracking, WIVACE 2013 – Italian Workshop on Artificial Life and Evolutionary Computation
Sommerville, I. (2016) Software Engineering, 10th edition, Pearson
BCS (2004) The Challenges of Complex IT Projects, report of a working group from The Royal Academy of Engineering and The British Computer Society
Ramachandran, M. (2008) Software Components: Guidelines and Applications, Nova Science Publishers
Ramachandran, M. (2012) Software Security Engineering, Nova Science Publishers
Ng, I. et al. (eds) (2011) Complex Engineering Service Systems: Concepts and Research, Springer, London
Zanetti, S.M. (2013) A Complex Systems Approach to Software Engineering, DSc thesis, ETH Zürich
Jin, X. et al. (2015) Significance and Challenges of Big Data Research, Big Data Research, 59–64, http://dx.doi.org/10.1016/j.bdr.2015.01.006
Caldarelli, G. and Vespignani, A. (eds) (2007) Large Scale Structure and Dynamics of Complex Networks: From Information Technology to Finance and Natural Science, World Scientific
Cao, L.B. (2017) Data Science: Challenges and Directions, Communications of the ACM, 60(8), August
Cao, L.B. (2015) Metasynthetic Computing and Engineering of Complex Systems, Springer-Verlag, London, UK
Cady, F. (2017) The Data Science Handbook, Wiley
Härdle, W.K., Lu, H.H., and Shen, X. (eds) (2017) Handbook of Big Data Analytics, Springer
Bühlmann, P. et al. (eds) (2017) Handbook of Big Data, CRC/Chapman & Hall
Zomaya, A.Y. and Sakr, S. (eds) (2017) Handbook of Big Data Technologies, Springer
Marr, B. (2017) Data Strategy: How to Profit from a World of Big Data, Analytics and the Internet of Things, Kogan Page
Alani, M.M., Tawfik, H., Saeed, M., and Anya, O. (2018) Applications of Big Data Analytics: Trends, Issues, and Challenges, Springer
Deng, J. and Savas, O. (eds) (2017) Big Data Analytics in Cybersecurity, CRC Press
Tajunisha, N. and Sruthika, P. (2016) Handbook on Big Data Analytics, Amazon Digital Services LLC