On July 20, Java Community held a webinar, “Zaloni’s Architecture for Data-Driven Design”, by Maksym Demianovskyi (Максим Дем’яновський), Software Engineer, GlobalLogic.
The talk gives an overview of Data-Driven Design, its main advantages and practical benefits, and shows how it can be implemented in practice.
3. Main stages of information evolution
1. The first revolution is associated with the invention of writing, which led to a giant qualitative and quantitative leap. It became possible to transfer knowledge from generation to generation.
2. The second (mid-16th century) was caused by the invention of printing, which radically changed industrial society, culture, and the organization of activities.
3. The third (end of the 19th century) was caused by the invention of electricity, thanks to which the telegraph, the telephone, and the radio appeared, allowing the rapid transmission and accumulation of information in any volume.
4. The fourth, the information explosion (1970s), is associated with the invention of microprocessor technology and the appearance of the personal computer. Computers, computer networks, and data transmission systems (information communications) are built on microprocessors and integrated circuits.
4. The amount of data produced by humanity
To get a sense of scale: the amount of information produced by humanity before 2003 is said to be less than the amount of data produced in a single day in 2023.
Estimates of how much data existed in the world by the end of 2022 range from 94 zettabytes (source: Bernard Marr & Co.) to 97 zettabytes. 1 ZB is the equivalent of 1,000 exabytes.
Do you know how much 181 zettabytes is? Let’s put it this way: if you ever tried downloading it by yourself, it would take you about two billion years!
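The "two billion years" figure can be sanity-checked with a quick calculation. The sketch below uses decimal units (1 ZB = 1,000 EB) and an assumed connection speed of 23 Mbit/s. That speed is a hypothetical value picked for illustration; the slides do not say which speed their estimate is based on.

```python
# Decimal (SI) units: each step is a factor of 1,000.
BYTES_PER_EB = 10**18
BYTES_PER_ZB = 1_000 * BYTES_PER_EB  # 1 ZB = 1,000 EB

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def download_years(zettabytes: float, mbit_per_s: float) -> float:
    """Years needed to download `zettabytes` of data at `mbit_per_s`."""
    total_bits = zettabytes * BYTES_PER_ZB * 8
    seconds = total_bits / (mbit_per_s * 1_000_000)
    return seconds / SECONDS_PER_YEAR


# Assumed speed: 23 Mbit/s (hypothetical, not taken from the slides).
years = download_years(181, 23)
print(f"{years / 1e9:.1f} billion years")  # 2.0 billion years
```

At that assumed speed the result is close to two billion years, so the estimate is plausible for an ordinary home connection.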
5. Data usage facts
● A single person generates 1.7 MB of data every second
● Facebook generates 4 PB of data daily
● One person generates 49.8 GB of IP traffic every month
● YouTube users upload 500 hours of video per minute, which means 30,000 hours of content every hour
● Video traffic makes up 82% of all consumer internet traffic
● 50% of all data will be in the cloud by 2025
● At least 2.5 quintillion bytes of data are created every day (that is two exabytes plus 500 petabytes)
● An AWS Snowmobile has a capacity of up to 100 petabytes
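Two of the figures above are plain unit conversions and can be checked directly. This is a small sketch using decimal (SI) prefixes; the helper name `split_exa_peta` is introduced here only for illustration.

```python
PB = 10**15  # petabyte in bytes (decimal)
EB = 10**18  # exabyte in bytes (decimal)

# 500 hours of video uploaded per minute -> hours uploaded per hour.
hours_per_hour = 500 * 60


def split_exa_peta(num_bytes: int) -> tuple[int, int]:
    """Split a byte count into whole exabytes and remaining whole petabytes."""
    exabytes, rest = divmod(num_bytes, EB)
    return exabytes, rest // PB


daily_bytes = 25 * 10**17  # 2.5 quintillion bytes
print(hours_per_hour)               # 30000
print(split_exa_peta(daily_bytes))  # (2, 500)
```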
6. Data is not only numbers
We have a lot of data, and a lot of garbage in that data. By itself, it does not mean anything.
To turn it into useful information, we have to clean the data (fix or remove incorrect, corrupted, incorrectly formatted, duplicate, or incomplete records within a dataset) and compute statistics on the cleaned data.
Once we have structured information, we can draw conclusions and decide on measures. Running this process continuously helps to reach incredible goals.
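The cleaning and statistics steps described above could look roughly like this. It is a minimal sketch using only the Python standard library; the field names (`order_id`, `amount`) and the sample rows are made up for illustration.

```python
from statistics import mean, median

# Hypothetical raw input with typical data quality problems.
raw_rows = [
    {"order_id": "1", "amount": "19.90"},
    {"order_id": "2", "amount": " 5.10 "},  # incorrectly formatted: stray spaces
    {"order_id": "2", "amount": "5.10"},    # duplicate of order 2
    {"order_id": "3", "amount": "n/a"},     # corrupted value
    {"order_id": "4", "amount": None},      # incomplete row
    {"order_id": "5", "amount": "40.00"},
]


def clean(rows):
    """Parse amounts; drop incomplete, corrupted, and duplicate rows."""
    seen, cleaned = set(), []
    for row in rows:
        order_id, amount = row.get("order_id"), row.get("amount")
        if not order_id or amount is None:
            continue  # incomplete
        try:
            value = float(amount.strip())  # fix formatting, then parse
        except ValueError:
            continue  # corrupted
        if order_id in seen:
            continue  # duplicate
        seen.add(order_id)
        cleaned.append({"order_id": order_id, "amount": value})
    return cleaned


rows = clean(raw_rows)
amounts = [r["amount"] for r in rows]
print(len(rows), round(mean(amounts), 2), median(amounts))  # 3 21.67 19.9
```

Only three of the six rows survive cleaning, which is why statistics computed on the raw data would be misleading.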
7. Why is data so important?
● Helps you make better and smarter decisions
● Keeps your business up to date
● Improves financial management
● Improves performance and makes internal operations more efficient
● Creates a data-driven culture
● Enables better customer service
8. How companies use data to make decisions
Using Data To Create New Blockbuster Hit Series
They used their data to run predictive analyses and learn exactly what their customers would be receptive to and interested in watching.
Providing Faster & More Efficient Ride With Data
The company analyzes historical data and key metrics, including the number of ride requests and fulfilled trips in different parts of a city and the time when they occur. This gives insight into areas with a supply crunch, so drivers can be told ahead of time to move to those areas and capitalize on the expected rise in demand.
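A supply crunch of this kind can be found by comparing ride requests with fulfilled trips per area and time slot. The sketch below is an assumption about how such an analysis might be structured, not the company's actual method; the records, the `supply_crunch` function, and the 80% threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical historical records: (area, hour of day, requests, fulfilled trips).
history = [
    ("center", 8, 120, 115),
    ("center", 18, 200, 130),
    ("airport", 8, 80, 78),
    ("airport", 18, 150, 90),
    ("suburb", 18, 40, 39),
]


def supply_crunch(records, min_fulfilment=0.8):
    """Return (area, hour) slots whose fulfilment rate is below the threshold."""
    totals = defaultdict(lambda: [0, 0])  # (area, hour) -> [requests, fulfilled]
    for area, hour, requests, fulfilled in records:
        totals[(area, hour)][0] += requests
        totals[(area, hour)][1] += fulfilled
    return sorted(
        slot
        for slot, (requests, fulfilled) in totals.items()
        if requests and fulfilled / requests < min_fulfilment
    )


print(supply_crunch(history))  # [('airport', 18), ('center', 18)]
```

Slots returned by the function are the places and times where drivers could be directed in advance.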
Another company uses geographic information systems to analyze factors such as demographic and traffic flow information in order to choose the best locations to expand into. This helps not only with choosing locations but also with determining which products would sell best in a given area.
9. Who makes decisions?
● Medical diagnosis
● Legal matters
● Human resources
● Ethical decision-making
● Creative industries
● Fraud detection
● Customer service
● Trading and investment
● Route management systems
● Advertising decisions
11. High-level component diagram
Producer
● Web and mobile apps
● Services
● Devices and IoT
● Logs and metrics
Storage
● Data Lake
● Data Warehouse
● Databases
● Files
Data Processing
● Apache Spark
● Google BigQuery
● AWS Athena
● Azure Data Factory
Analyze
● Tableau
● Power BI
● Analysts
● 3rd-party services
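The four stages of the diagram (Producer, Storage, Data Processing, Analyze) can be sketched as a chain of small functions. This is a toy, in-memory illustration with hypothetical function names and sample events; a real stack would use the tools listed above, whose APIs are not shown here.

```python
import json
from collections import Counter


def produce():
    """Producer: emit raw events, e.g. from apps, services, or devices."""
    yield {"source": "web", "event": "login"}
    yield {"source": "mobile", "event": "login"}
    yield {"source": "iot", "event": "temperature", "value": 21.5}


def store(events, storage):
    """Storage: append events unchanged, as JSON lines, to a raw store."""
    for event in events:
        storage.append(json.dumps(event))


def process(storage):
    """Data processing: parse the raw lines and count events per type."""
    return Counter(json.loads(line)["event"] for line in storage)


def analyze(counts):
    """Analyze: return the most frequent event type for a report."""
    return counts.most_common(1)[0]


raw_store = []  # in-memory stand-in for a data lake or files
store(produce(), raw_store)
print(analyze(process(raw_store)))  # ('login', 2)
```

Keeping each stage behind its own function mirrors the diagram: any stage can be swapped for a different technology without touching the others.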
12. Future-proofing data lake stack
● Data collection and integration: data lakes allow for the collection and integration of various types of data from different sources.
● Real-time data processing: data lakes enable real-time data processing.
● Data analysis: data lakes allow for the analysis of large amounts of data.
● Scalability: data lakes can scale to meet the needs of the business.
● Efficiency: data lakes allow for the efficient use of existing resources, reducing costs associated with data processing and storage.
● Ease of use: data lakes provide quick and easy access to data, allowing users to retrieve information easily and quickly.
13. Zaloni Data Lake architecture
● Understanding industry best practices
● Providing a template for solutioning
● Tracking a process
● Understanding structures and elements
15. Pros and cons of Zaloni architecture
Pros
● Intuitively clear
● Access to raw and formatted data
● Flexible and scalable architecture that can accommodate different data types, formats, and sources
● Offers a modular and extensible architecture that can be customized to meet specific needs
Cons
● Can be complex to implement and may require specialized expertise
● Architecture may be overkill for smaller organizations or those with limited data needs
● May not be well-suited for organizations that require real-time or near-real-time data processing
● Architecture may not be easily customizable to fit specific business needs or use cases
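The "access to raw and formatted data" advantage can be illustrated with a toy example in which raw records are kept exactly as received and a formatted copy is derived from them. The zone names (`raw`, `formatted`), the record layout, and the functions are assumptions made for this sketch; they are not taken from the slides or from Zaloni's documentation.

```python
from datetime import date

# Hypothetical zone names; the slides do not specify how zones are called.
lake = {"raw": [], "formatted": []}


def ingest(line: str) -> None:
    """Keep the record exactly as received so it can be reprocessed later."""
    lake["raw"].append(line)


def format_zone() -> None:
    """Rebuild the formatted zone from raw data without modifying the raw zone."""
    lake["formatted"] = []
    for line in lake["raw"]:
        user, day, amount = (part.strip() for part in line.split(";"))
        lake["formatted"].append(
            {"user": user, "day": date.fromisoformat(day), "amount": float(amount)}
        )


ingest("alice; 2023-07-20; 12.50")
ingest("bob;2023-07-21;7")
format_zone()
print(lake["raw"][0])                                               # alice; 2023-07-20; 12.50
print(lake["formatted"][0]["day"].year, lake["formatted"][1]["amount"])  # 2023 7.0
```

Because the raw zone is never modified, the formatted zone can be rebuilt at any time if the parsing rules change.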
16. Alternative approaches
● Lambda Architecture
● Kappa Architecture
● Data Mesh Architecture
● Virtualized Data Architecture
17. Summary
● Data is important for businesses because it can help inform decision-making, improve operational efficiency, and identify new business opportunities
● Real-life examples of data-driven decisions include optimizing website design, improving app usability, and informing product development
● Data storage options vary, and a data lake is a suitable choice when dealing with diverse and unstructured data from multiple sources. It provides flexibility and agility for storing and analyzing data
● The Zaloni Data Lake architecture helps to build flexible and scalable solutions