A 3-day interactive workshop for startups involved in Big Data & Analytics in Asia. An introduction to Big Data & Analytics concepts, with case studies in R programming, Excel, Web APIs, and more.
DOI: 10.13140/RG.2.2.10638.36162
A Seminar Presentation on Big Data for Students.
Big data refers to a processing approach used when traditional data mining and handling techniques cannot uncover the insights and meaning of the underlying data. Data that is unstructured, time-sensitive, or simply very large cannot be processed by relational database engines. This type of data requires a different approach, called big data, which uses massive parallelism on readily available hardware.
Content:
Introduction
What is Big Data?
Big Data Facts
Three Characteristics of Big Data
Storing Big Data
The Structure of Big Data
Why Big Data?
How Is Big Data Different?
Big Data Sources
Big Data Analytics
Types of Tools Used in Big Data
Applications of Big Data Analytics
How Big Data Impacts IT
Risks of Big Data
Benefits of Big Data
Future of Big Data
This presentation shows the difference between data and Big Data: how Big Data is generated, opportunities with Big Data, problems that occur with Big Data and their solutions, Big Data tools, what Data Science is and how it relates to Big Data, and Data Scientist vs. Data Analyst. It closes with a real-life scenario where Big Data, data scientists, and data analysts work together.
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners... (Simplilearn)
This presentation about Big Data will help you understand how Big Data evolved over the years, what Big Data is, applications of Big Data, a case study on Big Data, three important challenges of Big Data, and how Hadoop solved those challenges. The case study covers the Google File System (GFS), where you’ll learn how Google solved its problem of storing rapidly growing user data in the early 2000s. We’ll also look at the history of Hadoop and its ecosystem, with a brief introduction to HDFS, a distributed file system designed to store large volumes of data, and MapReduce, which allows parallel processing of data. In the end, we’ll run through some basic HDFS commands and see how to perform wordcount using MapReduce. Now, let us get started and understand Big Data in detail.
Below topics are explained in this Big Data presentation for beginners:
1. Evolution of Big Data
2. Why Big Data?
3. What is Big Data?
4. Challenges of Big Data
5. Hadoop as a solution
6. MapReduce algorithm
7. Demo on HDFS and MapReduce
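The wordcount demo in step 7 can be previewed with a minimal in-memory sketch of the three MapReduce phases; plain Python stands in for Hadoop here, and the function names are illustrative, not Hadoop APIs:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: group all counts by word, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reduce_phase(groups):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

def wordcount(lines):
    return reduce_phase(shuffle_phase(map_phase(lines)))

counts = wordcount(["big data is big", "data needs hadoop"])
print(counts["big"])   # 2
print(counts["data"])  # 2
```

In real Hadoop the map and reduce functions run on many machines in parallel and the shuffle happens over the network, but the data flow is the same.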
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem such as Hadoop 2.7, Yarn, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Avro with Hive and Sqoop, and schema evolution
7. Understand Flume, Flume architecture, sources, flume sinks, channels, and flume configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distributed datasets (RDDs) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL: creating, transforming, and querying DataFrames
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
Disclaimer:
The images, company, product, and service names used in this presentation are for illustration purposes only. All trademarks and registered trademarks are the property of their respective owners.
Data and images were collected from various sources on the Internet.
The intention was to present the big picture of Big Data & Hadoop.
Big Data Ppt PowerPoint Presentation Slides SlideTeam
Big data has brought about a revolution in the field of information technology. Our content-ready big data PPT PowerPoint presentation slides shed light on the importance and relevance of large volumes of data. The data management presentation covers a myriad of topics such as big data sources, market forecast, the 3 Vs, technologies, workflow, the data analytics process, impact, benefits, future, and opportunities and challenges, with many additional slides containing graphs and charts. The biggest benefit this big data analytics presentation template offers is that it enables you to unearth information that can be used to shape the future of your business. Moreover, these designs can also be utilized to craft your own presentation on predictive analytics, data processing applications, databases, cloud computing, business intelligence, and user behavior analytics. Download big data PPT visuals which will help you make accurate business decisions. Enlighten folks on fraud with our Big Data PPT PowerPoint Presentation Slides. Convince them to be highly alert.
Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.
Very basic Introduction to Big Data. Touches on what it is, characteristics, some examples of Big Data frameworks. Hadoop 2.0 example - Yarn, HDFS and Map-Reduce with Zookeeper.
What is Big Data?
Big Data Laws
Why Big Data?
Industries using Big Data
Current process/SW in SCM
Challenges in SCM industry
How Big data can solve the problems?
Migration to Big data for an SCM industry
This presentation, by big data guru Bernard Marr, outlines in simple terms what Big Data is and how it is used today. It covers the 5 V's of Big Data as well as a number of high value use cases.
What is the impact of Big Data on analytics, from a data science perspective?
Presented at the Big Data and Analytics Summit 2014, Nasscom by Mamatha Upadhyaya.
Big Data is one of the most prominent disruptive technologies available today. The potential it offers for business is truly astounding.
But what is it? Time for a crash course!
For the supply chain leader, Big Data is a new concept. It is not one that is currently well understood. It will be overhyped and overpromised before the concepts reach mainstream adoption. However, it is here to stay. The goal of this report is to better educate and prepare the supply chain leader for this change. In this report, we define the concepts and share insights to help leaders better understand how Big Data concepts can help solve problems in today’s supply chain.
DevOps is a set of principles, methods, and technologies for tackling the challenge of rapidly releasing high-quality software evolution from development to production, where everything becomes programmable: the application, the tests, and the infrastructure.
Lecture given at the University of Catania on December 2nd, 2014.
Starts from Big Data definitions, continues with real-life examples of successful Big Data projects, goes a little deeper with sentiment analysis, and concludes with a brief overview of Big Data tools and Big Data with Microsoft.
Summary:
1. What is Big Data? (includes the 5Vs of Big Data)
2. Big Data Examples (includes 6 Real Life Examples and comments on Privacy concerns)
3. How to Tackle a Big Data Problem (my 4 Universal Steps to follow)
4. Sentiment Analysis (what is sentiment analysis? Why do we care? A Technique and a plan)
5. Big Data tools (Hadoop, Hadoop Ecosystem, Hive, Pig, Sqoop, Oozie; Azure HDInsight, Excel Power Query, Power Pivot, Power View, Power Map)
Benefits and challenges that Big Data & Analytics brings to companies on the jour... (Flávio Secchieri Mariotti)
APICON 2017 in Brazil. Thank you for the invitation and the opportunity to share some of CSC's experience on the digital transformation journey, with an emphasis on Big Data & Analytics.
This presentation introduces the concepts of Big Data in layman's language. The author does not claim originality of the content; the presentation was compiled from various sources. The author claims no copyright, and no privacy violation is intended.
Big data is rising exponentially in today's age of information. This presentation clears up the concept and the hype revolving around it.
Big data refers to the vast amount of structured and unstructured data that inundates organizations on a daily basis. This data comes from various sources such as social media, sensors, digital transactions, mobile devices, and more.
Data analytics is important because it helps businesses optimize their performance. Implementing it into the business model means companies can reduce costs by identifying more efficient ways of doing business and by storing large amounts of data. A company can also use data analytics to make better business decisions and to analyze customer trends and satisfaction, which can lead to new and better products and services.
NIIT and Denodo: Business Continuity Planning in the times of the Covid-19 Pa... (Denodo)
Watch: https://bit.ly/349QjYr
Currently, the most common Analytical Solutions are implemented on large scalable ecosystems which involve massive Data Lakes and Data Warehouses. These solutions take time to build and incur substantial TCO. In today’s environment we need rapid technologies, and NIIT has developed a compelling solution powered by Denodo’s Data Virtualization and Data Catalog.
Why Everything You Know About bigdata Is A LieSunil Ranka
As a big data technologist, you can bet that you have heard it all: every crazy claim, myth, and outright lie about what big data is and what it isn't that you can imagine, and probably a few that you can't. If your company has a big data initiative or is considering one, you should be aware of these false statements and the reasons why they are wrong.
Big Data Update - MTI Future Tense 2014 (Hawyee Auyong)
The Futures Group first wrote about the emerging phenomenon of Big Data in 2010 as it was about to enter the mainstream. It was envisaged that Big Data would create a demand for new skills (Google has identified statisticians as the “sexy job of the decade”) and generate new industries. This report updates on the industry value chain and business models for the data analytics industry, latest developments as well as the opportunities for Singapore.
How to optimize the usage of data to drive innovation and efficiency, focused on the Brazilian banking market landscape: main trends, key challenges, leveraging managed data lakes, and sample use cases.
Big Data is the latest cash cow, and data analytics now plays a crucial role for industries. This article describes what Big Data and analytics are, and how a Chartered Accountant can provide value in this field.
Forecast to contribute £216 billion to the UK economy via business creation, efficiency and innovation, and generate 360,000 new jobs by 2020, big data is a key area for recruiters.
In this QuickView:
- Big data in numbers
- Top 10 industries hiring big data professionals
- Top 10 qualifications sought by hirers
- Top 10 database and BI skills sought by hirers
- Getting started in big data: popular big data techniques and vendors
Big Data: Smart Technologies Provide Big Opportunities (NAED_Org)
Big data has garnered big-time buzz as an effective means to optimize business and measure success. This concise report provides an introduction to the elements of big data and how smart technologies are playing a big role in the information game.
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, or reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps reduce duplicate computations and thus could reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated; this could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which could reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
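As a rough illustration of the first optimization (skipping computation on vertices that have already converged), here is a minimal power-iteration PageRank sketch in plain Python; the graph representation and tolerance are illustrative assumptions, and per-vertex skipping is a heuristic rather than an exact method:

```python
def pagerank(adj, n, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank that skips vertices whose rank has
    already converged. `adj` maps each vertex to the list of vertices
    that link TO it. Assumes no dangling nodes (every vertex has
    out-degree > 0)."""
    out = [0] * n                       # out-degree of each vertex
    for v in range(n):
        for u in adj.get(v, []):
            out[u] += 1
    rank = [1.0 / n] * n
    converged = [False] * n
    for _ in range(max_iter):
        new = rank[:]
        for v in range(n):
            if converged[v]:
                continue                # skip already-converged vertices
            s = sum(rank[u] / out[u] for u in adj.get(v, []))
            new[v] = (1 - damping) / n + damping * s
            if abs(new[v] - rank[v]) < tol:
                converged[v] = True
        rank = new
        if all(converged):
            break
    return rank
```

On a 3-vertex cycle (`{0: [2], 1: [0], 2: [1]}`) all vertices end up with equal rank summing to 1, as expected by symmetry.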
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Introduction: What is data?
Data is a set of values of qualitative or quantitative variables. In computing, data is any sequence of one or more symbols given meaning by specific act(s) of interpretation.
Data requires interpretation to become information.
Yaman Hajja | Big Data & Analytics
Data is the new oil of the digital economy
"Data in the 21st century is like oil in the 18th century. Data is the new oil of the digital economy."
Data infrastructure should become a profit center.
Types of data
Types of data: translation of a document hosted by João Neto.
Open Data
Open Data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents, or other mechanisms of control.
Example: Linked Datasets as of August 2014 (Tungsten Tide).
Datasets for data science projects
Examples: analyticsvidhya, kaggle, drivendata, opendatasoft, opendatainception.
What is data analysis?
Data analysis, also known as data analytics, is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.
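The four activities named in the definition (inspecting, cleansing, transforming, and modeling) can be sketched on toy data; the readings and unit conversion below are invented for illustration:

```python
# Raw sensor readings, some of them unusable.
readings = ["12.5", "11.0", "n/a", "14.5", ""]

# Inspect: how many raw records do we have?
total = len(readings)

# Cleanse: drop records that cannot be parsed as numbers.
clean = []
for r in readings:
    try:
        clean.append(float(r))
    except ValueError:
        pass

# Transform: convert units (here, Celsius to Fahrenheit).
fahrenheit = [c * 9 / 5 + 32 for c in clean]

# Model/summarize: a simple mean that could support a decision.
mean_f = sum(fahrenheit) / len(fahrenheit)
print(total, len(clean), round(mean_f, 2))
```

Real pipelines use richer tooling for each step, but the loop of inspect, cleanse, transform, and summarize is the same.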
Statistical analysis
Statistical analysis is a component of data analytics. In the context of business intelligence (BI), statistical analysis involves collecting and scrutinizing every data sample in a set of items from which samples can be drawn. A sample, in statistics, is a representative selection drawn from a total population.
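A toy illustration of sampling, with an invented "population" of purchase amounts; the point is that a representative sample estimates a population statistic without scrutinizing every item:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# A "population" of 10,000 purchase amounts, roughly centered at 50.
population = [random.gauss(50, 10) for _ in range(10_000)]

# A representative sample drawn from the total population.
sample = random.sample(population, k=500)

pop_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

# The sample mean is a cheap estimate of the population mean.
print(round(pop_mean, 2), round(sample_mean, 2))
```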
Understanding Big Data
Big Data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation (the organization and integration of data collected from various sources), search, sharing, transfer, visualization, querying, updating, and information privacy.
Big Data Characteristics: the 3 Vs
1. Volume: big data doesn’t sample; it just observes and tracks what happens.
2. Velocity: big data is often available in real time.
3. Variety: big data draws from text, images, audio, and video, and completes missing pieces through data fusion.
Who can deal with Big Data?
Some Big Data facts
According to International Data Corporation (IDC), Big Data and business analytics revenues are forecast to reach $150.8 billion this year, an increase of 12.4% over 2016, led by banking and manufacturing investments.
Twenty-five years ago, data was growing at a rate of 100 GB a day. Now, data grows at a rate of almost 50,000 GB a second.
The world today is awash in data. In 2015, mankind produced as much information as was created in all previous years of human civilization. Every time we send a message, make a call, or complete a transaction, we leave digital traces.
Data Visualization
Data visualization is a general term that describes any effort to help people understand the significance of data by placing it in a visual context. Patterns, trends, and correlations that might go undetected in text-based data can be exposed and recognized more easily with data visualization software.
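As a minimal stand-in for data visualization software, the sketch below renders a series as a text bar chart so a pattern hidden in raw numbers becomes visible at a glance; the labels and values are invented:

```python
def bar_chart(labels, values, width=40):
    """Render values as horizontal text bars scaled to `width`."""
    peak = max(values)
    lines = []
    for label, value in zip(labels, values):
        bar = "#" * round(value / peak * width)
        lines.append(f"{label:>8} | {bar} {value}")
    return "\n".join(lines)

print(bar_chart(["2002m1", "2002m7", "2003m1"], [17.5, 16.0, 15.2]))
```

A real tool (Excel, matplotlib, Power BI) does the same scaling and mapping to a visual channel, just with far richer output.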
Example: Data Visualized
[Chart: NPLs of the Malaysia banking system over M1, exchange rate, and charter value (2002 M1 - 2015 M8). Series: NPLs %, money supply M1 % p.a., exchange rate, charter value %.]
Example #2: Data Visualized
[Chart: NPLs of the Malaysia banking system over the business cycle (GDP) (1998 M1 - 2015 M3), with capital ratio. Series: NPLs %, GDP growth %, capital ratio %.]
Social Network Analysis
Social network analysis (SNA) is the process of investigating social structures through the use of network and graph theories. It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties, edges, or links (relationships or interactions) that connect them. Examples of social structures commonly visualized through social network analysis include social media networks.
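The nodes-and-ties idea can be sketched in a few lines of plain Python; the actors and edges below are invented, and degree centrality (one of several SNA measures) simply counts each actor's ties, normalized by the maximum possible:

```python
# An undirected social graph as a list of ties (edges) between actors.
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
         ("carol", "dave")]

def degree_centrality(edges):
    """Ties per actor, normalized by the maximum possible degree (n - 1)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    return {node: d / (n - 1) for node, d in degree.items()}

central = degree_centrality(edges)
print(max(central, key=central.get))  # carol has the most ties
```

Libraries such as NetworkX provide this measure and many others (betweenness, closeness, eigenvector centrality) out of the box.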
Example of Social Network Analysis
Data visualization of Facebook relationships.
33. 28
What exactly is the meaning of an API?
Application Programming Interface (API)
An API is a particular set of rules ('code') and specifications that software programs can follow to communicate with each other.
It serves as an interface between different software programs and facilitates their interaction, similar to the way a user interface facilitates interaction between humans and computers.
34. 29
What exactly is the meaning of an API?
Application Programming Interface (API)
An API is a set of subroutine definitions, protocols, and tools for building application software.
It is a set of clearly defined methods of communication between various software components. A good API makes it easier to develop a computer program by providing all the building blocks, which are then put together by the programmer.
An API may be for a web-based system, operating system, database system, computer hardware, or software library. An API specification can take many forms, but often includes specifications for routines, data structures, object classes, variables, or remote calls.
The Microsoft Windows API, the C++ Standard Template Library, and the Java APIs are examples of different forms of APIs.
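The "set of rules programs follow to communicate" can be seen in the API of Python's standard json module: callers rely only on the documented functions, never on the library's internals.

```python
import json

# json.dumps and json.loads form a small, stable API contract:
# object -> JSON text, and JSON text -> object.
payload = json.dumps({"sensor": "t1", "value": 21.5})
record = json.loads(payload)
print(record["value"])  # 21.5
```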
36. 31
Example of web API
Shiny Weather Data
A web API is an application programming interface (API) for
either a web server or a web browser.
Shiny Weather Data is a web service that makes different sources of European gridded climate data available in the hourly time-series formats used by common building performance modeling tools.
The service has been around for a while and has a steadily growing user group of professional building modelers as well as students and researchers. It also provides satellite-based time series of solar irradiation for actual weather conditions as well as for clear-sky conditions.
Another example of a web API: Portfolio Visualizer.
37. 32
Predictive Analytics
Predictive analytics is the branch of advanced analytics used to make predictions about unknown future events.
It draws on many techniques from data mining, statistics, modeling, machine learning, and artificial intelligence to analyze current data and make predictions about the future.
39. 34
Probability and Statistics
Probability is the measure of the likelihood that an event will occur. It is quantified as a number between 0 and 1, where 0 indicates impossibility and 1 indicates certainty. The higher the probability of an event, the more certain it is that the event will occur.
A simple example is the toss of a coin. If the coin is unbiased, the two outcomes ("head" and "tail") are equally probable. Since no other outcomes are possible, the probability of either "head" or "tail" is 1/2 (or 50%).
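The coin example can be checked empirically with a short simulation; a minimal Python sketch:

```python
import random

random.seed(42)  # fixed seed so the demo is reproducible
n = 100_000
heads = sum(random.random() < 0.5 for _ in range(n))

# For an unbiased coin, the empirical frequency of heads approaches 1/2.
p_head = heads / n
print(p_head)  # close to 0.5
```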
40. 35
Probability Theory
Probability Theory is the branch of mathematics concerned
with probability, the analysis of random phenomena.
The central objects of probability theory are random variables,
stochastic processes, and events: mathematical abstractions of
non-deterministic events or measured quantities that may either
be single occurrences or evolve over time in an apparently
random fashion.
Example
41. 36
Statistics
The Merriam-Webster dictionary defines statistics as "a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data".
In applying statistics to, e.g., a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model of the process to be studied.
Populations can be diverse topics such as "all people living in a country" or "every atom composing a crystal".
Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments.
42. 37
Normal Distribution
Normal (or Gaussian) distribution is a very common continuous
probability distribution. Normal distributions are important in
statistics and are often used in the natural and social sciences to
represent real-valued random variables whose distributions are
not known.
LINK (Normal Distribution).
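One well-known property worth remembering alongside the definition: roughly 68% of draws from a standard normal fall within one standard deviation of the mean. A quick simulation sketch:

```python
import random

random.seed(0)
samples = [random.gauss(0, 1) for _ in range(100_000)]

# The "68-95-99.7 rule": about 68% of N(0, 1) draws lie in [-1, 1].
within_1sd = sum(-1 <= x <= 1 for x in samples) / len(samples)
print(round(within_1sd, 2))  # roughly 0.68
```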
45. 40
p-value
The p-value, or calculated probability, is the probability of finding the observed, or more extreme, results when the null hypothesis (H0) of a study question is true; the definition of 'extreme' depends on how the hypothesis is being tested.
- LINK.
- Seeing Theory website.
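A worked example under an assumed fair-coin null hypothesis (the numbers are illustrative): the exact one-sided p-value for observing 60 or more heads in 100 flips.

```python
from math import comb

# H0: the coin is fair (p = 0.5). Observation: 60 heads in 100 flips.
n, k = 100, 60

# One-sided p-value: probability under H0 of a result at least this extreme,
# i.e. P(X >= 60) for X ~ Binomial(100, 0.5).
p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
print(round(p_value, 3))  # about 0.028, surprising at the 5% level
```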
46. 41
What is Regression Analysis?
Regression analysis is a predictive modelling technique that investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables.
The technique is used for forecasting, time-series modelling, and finding causal relationships between variables. For example, the relationship between rash driving and the number of road accidents a driver has is best studied through regression.
Regression analysis is an important tool for modelling and analyzing data.
There are multiple benefits of using regression analysis.
They are as follows:
*** It indicates the significant relationships between the dependent variable and the independent variables.
*** It indicates the strength of the impact of multiple independent variables on a dependent variable.
47. 42
Linear Regression
Linear regression is one of the most widely known modeling techniques, and it is usually among the first few topics people pick up when learning predictive modeling.
Linear regression establishes a relationship between a dependent variable (Y) and one or more independent variables (X) using a best-fit straight line (also known as the regression line).
48. 43
Linear Regression. Cont.
It is represented by the equation Y = α + βX + e, where α is the intercept, β is the slope of the line, and e is the error term. The equation can be used to predict the value of the target variable from given predictor variable(s).
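For a single predictor, α and β have closed-form least-squares estimates: β = cov(X, Y) / var(X) and α = mean(Y) − β·mean(X). A minimal sketch with made-up data:

```python
# Made-up data roughly following Y = 2X (plus noise).
X = [1, 2, 3, 4, 5]
Y = [2.1, 4.0, 6.2, 7.9, 10.1]

mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)

# Least-squares slope and intercept for Y = alpha + beta * X + e.
beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / \
       sum((x - mean_x) ** 2 for x in X)
alpha = mean_y - beta * mean_x

def predict(x):
    return alpha + beta * x

print(round(beta, 2), round(alpha, 2))  # 1.99 0.09
```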
50. 45
Back to R Programming
Example: How to fetch stock data?
Financial time series forecasting – an easy approach
Yahoo Finance
51. 46
Back to R Programming
R - Linear Regression
Example
Linear Regression in R.
52. 47
Back to R Programming
R - Linear Regression
Example
Advanced R
53. 48
Artificial intelligence (AI)
Definition
AI is intelligence exhibited by machines. In computer science,
the field of AI research defines itself as the study of "intelligent
agents": any device that perceives its environment and takes
actions that maximize its chance of success at some goal.
The term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with human minds, such as "learning" and "problem solving" (a capability known as machine learning).
In August 2001, robots beat humans in a simulated financial
trading competition.
54. 49
Artificial intelligence (AI)
List of programming languages for artificial intelligence
Python is widely used for artificial intelligence, with packages for many areas: general AI, machine learning, natural language processing, and neural networks. Companies like Narrative Science use Python to create artificial intelligence for narrative language processing.
Other languages used for AI include MATLAB and C++.
55. 50
Machine learning
Definition
Machine learning is the subfield of computer science that gives computers the ability to learn without being explicitly programmed. Evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms overcome strictly static program instructions by making data-driven predictions or decisions through building a model from sample inputs.
Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is difficult or infeasible; example applications include spam filtering, optical character recognition (OCR), search engines, and computer vision.
56. 51
Machine learning
Definition +
Machine learning is a branch of computer science that studies the design of algorithms that can learn. Typical machine learning tasks are concept learning, function learning or "predictive modeling", clustering, and finding predictive patterns.
These tasks are learned from available data that were observed through experiences or instructions, for example. The hope is that incorporating experience into its tasks will eventually improve the learning, ultimately to the point where it becomes automatic, so that humans no longer need to interfere.
57. 52
Machine learning
Figure: The machine learning process starts with raw data and ends up with
a model derived from that data.
58. 53
Common Machine Learning Algorithms
Naïve Bayes Classifier Algorithm
K Means Clustering Algorithm
Support Vector Machine Algorithm
Apriori Algorithm
Linear Regression
Logistic Regression
Artificial Neural Networks
Random Forests
Decision Trees
Nearest Neighbours (k-nearest neighbours "KNN" )
59. 54
The Role of [R] in machine learning
Much of the work done by a data scientist involves statistics. For
example, machine learning algorithms commonly apply some
kind of statistical technique to prepared data.
But doing this kind of work can sometimes require programming.
What programming language is best for statistical computing?
The answer is clear: It’s the open-source language called R.
Created in New Zealand more than 20 years ago, R has
become the lingua franca for writing code in this area. In
fact, it’s hard to find a data scientist who doesn’t know R.
Example: Machine Learning in R using (k-nearest neighbours)
algorithm.
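The linked example uses R; the same idea, classifying a point by majority vote among its k closest labelled neighbours, fits in a few lines of dependency-free Python (the points and labels below are invented):

```python
from collections import Counter

# Tiny labelled training set: (x, y) points with a class label (invented).
train = [((1.0, 1.1), "A"), ((1.2, 0.9), "A"),
         ((3.0, 3.2), "B"), ((3.1, 2.9), "B")]

def knn(point, k=3):
    # Rank training points by squared Euclidean distance to `point`,
    # then take a majority vote over the k nearest labels.
    def sq_dist(p):
        return (p[0] - point[0]) ** 2 + (p[1] - point[1]) ** 2
    nearest = sorted(train, key=lambda item: sq_dist(item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn((1.1, 1.0)), knn((3.0, 3.0)))  # A B
```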
61. 56
Data mining
Definition
Data mining is the computational process of discovering
patterns in large data sets involving methods at the intersection
of artificial intelligence, machine learning, statistics, and
database systems.
It is an interdisciplinary subfield of computer science.
62. 57
Data mining
Definition 2
Data in digital form are available everywhere and can be used to predict the future, usually with a statistical approach. Data mining is an extension of traditional data analysis and statistical approaches in that it incorporates analytical techniques drawn from a range of disciplines.
Data mining covers the entire process of data analysis, including data cleaning and preparation, visualization of the results, and producing predictions in real time so that specific goals are met.
Source
63. 58
Data mining process and concept
Figure: Data mining is actually part of the knowledge discovery process (KDD: knowledge discovery from data). Data mining can be considered a step in an iterative knowledge discovery process, as shown in the figure above (Fayyad, Piatetsky-Shapiro & Smyth, 1996).
64. 59
Data mining in "Risk Management"
Data mining creates models through data analysis and prediction to help solve problems involving both project feasibility and risk management.
For example, data mining has been used to analyze a database containing information on a person's history, achievements, and expertise. The goal was to develop a maturity profile of a certain project involving resource capacity, especially human capital.
66. 61
Data mining Cont.
Why Data Mining?
It helps to discover the reasons for success and failure.
It helps you understand your customers, products, etc.
It improves your organization through mining large databases.
SQL Data Mining Algorithms
A set of clusters illustrates how the cases in a dataset relate to one another.
A decision tree forecasts an outcome and its after-effects.
A set of rules explains how to group the products in a transaction.
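The clustering idea in the first bullet can be sketched with a one-dimensional k-means pass (the data values are invented):

```python
# Split six cases into k = 2 clusters by iteratively reassigning each case
# to its nearest centre and recomputing the centres.
data = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centers = [0.0, 10.0]  # rough initial guesses

for _ in range(10):  # a handful of passes converges on this toy data
    groups = [[], []]
    for x in data:
        nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        groups[nearest].append(x)
    centers = [sum(g) / len(g) for g in groups]

print([round(c, 1) for c in centers])  # [1.0, 8.1]
```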