In the modern world, data is produced with ever-increasing volume, velocity, and variety of formats. This data can be extremely valuable: it can be used to understand and track application or service behavior so that we can find errors or suboptimal user experiences, and we can mine it for patterns and correlations to generate recommendations. For example, e-commerce sites analyze user access logs to provide product recommendations, and social networking or dating sites suggest new friends or help find compatible matches, and so forth.
Consumers and businesses today are also demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing of historical data.
We are also finding that as data creation becomes more real-time and continuous, there is a need to manage it at high speed.
To simplify big data processing, we present it as a data bus comprising several stages: ingest, store (collect), process, and finally analyze and present the data for visualization. The right technology for each stage depends on criteria such as data volume and structure, as well as query latency, request rate, and item size.
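These stages can be sketched as a simple pipeline. The stage functions and sample records below are illustrative placeholders only, not any specific AWS API; in a real system each stage would be a managed service (e.g. Kinesis for ingest, S3 for storage, EMR for processing):

```python
# Minimal sketch of the ingest -> collect -> process -> analyze bus.

def ingest(raw_events):
    # Parse raw log lines into (user, action) records.
    return [line.split(",") for line in raw_events]

def collect(records):
    # Persist records; here a plain list stands in for the data store.
    return list(records)

def process(store):
    # Aggregate: count events per user.
    counts = {}
    for user, _action in store:
        counts[user] = counts.get(user, 0) + 1
    return counts

def analyze(counts):
    # Produce a sorted view ready for visualization.
    return sorted(counts.items(), key=lambda kv: -kv[1])

raw = ["alice,view", "bob,click", "alice,click"]
print(analyze(process(collect(ingest(raw)))))  # [('alice', 2), ('bob', 1)]
```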
AWS delivers technologies to accommodate all of these processing stages. Here you can see the extensive portfolio offered to deal with various aspects of big data. But which services should you use, why, when, and how?
The first question is usually: how can I move my data to AWS?
Once data starts moving into AWS, we can persist it in a number of storage services for further analysis. From the Relational Database Service (RDS), S3 object storage, and the Kinesis streaming storage solution, to the DynamoDB key-value store, the Hadoop file system on Elastic MapReduce, and the Redshift warehouse, you have a wide range of options to persist data.
But what data store should you use, and when?
Here at Amazon, we don't believe there is one tool that can do everything; rather, if you use the right tools, you can build a highly configurable big data architecture to meet your specific needs.
AWS offers a variety of services that provide customers with the right tool, optimized for data structure, query complexity, and other characteristics such as the data access frequency pattern.
Here you can see AWS services grouped into four classes based on data structure and query complexity.
The top two quadrants represent structured data, and the bottom two represent unstructured data. The left column groups services that are well optimized for simple query patterns, while the two groups on the right present services optimized for complex queries.
Another way to think about big data design for an optimal solution is the access frequency pattern, which can be visualized as data temperature.
Data is labeled hot if it is accessed very frequently by customers, typically within a second or a few seconds. On the opposite side of the temperature scale we have cold data, which is typically archived data with a rare chance of being accessed, or which can tolerate an hour's delay in access. Warm usually refers to data whose access pattern ranges from a few seconds to a few minutes.
Other parameters, such as total data volume, item size, request rate, and query latency, as well as durability and cost, play an equally important role in building a highly configurable big data architecture that meets customer-specific needs.
Usually, for hot data we are talking about small data objects within a few KB, with a total volume of a few GB at most, but we expect low query latency and a high request rate. When we are talking about cold data, it is usually large data volumes with a low request rate and processing response times within minutes, if not hours.
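These rules of thumb can be expressed as a rough classifier. The thresholds below are illustrative guesses derived from the heuristics in the text, not AWS-published limits:

```python
def data_temperature(item_size_kb, total_volume_gb, request_rate_per_sec,
                     acceptable_latency_sec):
    """Rough hot/warm/cold classification from the heuristics above."""
    if acceptable_latency_sec <= 0.01 and request_rate_per_sec >= 1000:
        return "hot"    # ms latency, very high request rate
    if acceptable_latency_sec >= 60 or total_volume_gb >= 100_000:
        return "cold"   # minutes-to-hours latency, PB-scale archives
    return "warm"

print(data_temperature(1, 10, 5000, 0.005))     # hot
print(data_temperature(512, 500_000, 1, 3600))  # cold
print(data_temperature(100, 500, 50, 2))        # warm
```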
This heat map combines the notion of data temperature with query latency and summarizes the available AWS solutions in the context of data temperature, data volume, durability, request rate, processing latency, and pricing requirements.
ElastiCache and DynamoDB are a good fit for hot data, while RDS, CloudSearch, EMR/HDFS, and S3 provide options for warm data. Finally, Glacier is the offering for cold data.
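This mapping can be summarized as a simple lookup table. The service names come from the text; the grouping is the speaker's, not an official AWS taxonomy:

```python
# Temperature -> candidate AWS data stores, per the heat map above.
SERVICES_BY_TEMPERATURE = {
    "hot":  ["ElastiCache", "DynamoDB"],
    "warm": ["RDS", "CloudSearch", "EMR/HDFS", "S3"],
    "cold": ["Glacier"],
}

def candidate_stores(temperature):
    # Return the shortlist for a given data temperature.
    return SERVICES_BY_TEMPERATURE.get(temperature, [])

print(candidate_stores("warm"))  # ['RDS', 'CloudSearch', 'EMR/HDFS', 'S3']
```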
There are certain intersections in terms of latency, request rate, and data volume among ElastiCache, DynamoDB, and RDS, or among DynamoDB, RDS, and HDFS.
Thus our customers always have a few options for implementing their solution.
Finally, to provide a complete toolset for big data problems, AWS provides processing applications called connectors, which can write to multiple data stores, and processing frameworks such as Storm, Hive, and Spark, which can read from multiple data stores.
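The connector pattern (one processing application fanning records out to several stores) can be sketched abstractly. The `Store` interface here is a hypothetical stand-in, not an actual AWS SDK class:

```python
class Store:
    """Hypothetical minimal data-store interface."""
    def __init__(self, name):
        self.name = name
        self.items = []
    def write(self, record):
        self.items.append(record)
    def read(self):
        return list(self.items)

class Connector:
    """Fans each processed record out to multiple stores."""
    def __init__(self, stores):
        self.stores = stores
    def emit(self, record):
        for store in self.stores:
            store.write(record)

s3, dynamo = Store("S3"), Store("DynamoDB")
connector = Connector([s3, dynamo])
connector.emit({"user": "alice", "clicks": 2})
print(s3.read() == dynamo.read())  # True
```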
For the visualization tier, AWS works with many partners providing business intelligence platforms that can connect to AWS big data services through standard APIs.
This is the end of my presentation, and I thank you for your time.
20 years ago, IT/OPS managed as much of the application delivery chain as possible. Content was aggregated at the web server. Experiences were optimized using Application Delivery Controllers, hardware appliances in your datacenter. Threats were mitigated by hardware-based firewalls, and load balancers ensured scalability.
All of these components were under your control and had ample opportunity to accelerate and secure applications and data.
But today, content is aggregated in the browser. Consider some of the standard 3rd party components that together make up engaging, personalized experiences.
DISCO: What are some of the 3rd party components that you folks include in your apps today? Have you had challenges either adding the components you want, or ensuring an optimal experience with the components you have? What are some of the things you’ve tried to fix that? Have they worked? What has it cost you?
Not only has the aggregation point moved out to the browser, but web architectures have evolved to include more aaS solutions for your infrastructure, platform and software needs.
DISCO: What ‘aaS’ components do you use today or are you planning to include? What were your goals for using ‘aaS’ components? How has that change impacted your business?
[GETTING TO THE YOTTAA POINT] The industry is moving to a services-based model. If you've heard of SOA (service-oriented architecture), it's the way developers prefer to build modern applications, because it makes them far more efficient and capable of achieving far more. However, it also changes things: applications connect directly to the internet; they're not managing connections and data via application delivery controllers and your firewall. Moving content aggregation to the browser also means that ADCs have no access to optimize the application, and neither do CDNs, because the BROWSER is requesting and rendering all of the content. ADCs and CDNs do not extend to the browser: the ADC stops at your datacenter, and the CDN stops at the edge.
So Yottaa has built an app optimization platform that extends from your datacenter all the way to the user's browser. It was designed from the ground up to work with legacy and modern cloud architectures. This means that we are completely platform, infrastructure, and software agnostic; we have to be able to work with any networked solutions you have in place today. And, to enable developers, IT professionals, marketers, and the businesses they support to remain agile and focused on the customer, we require no code change. Every Yottaa optimization is configuration-based and delivered in real time via our cloud service.
SEGUE: the net effect is significant
We've been certified as a NetSuite "Built for NetSuite" (SuiteApp) technology partner and proven to accelerate eCommerce sites with NO modification to NetSuite, which means we don't require a cartridge; NO limitations on other components you might use on the NetSuite platform; and NO slowdowns or other dependencies, because we require no code change.
Rapid Prototyping for Big Data with AWS
Tuesday, March 15, 2016
8 AM PST/4 PM BST/5 PM CEST
VP of Technology Services,
Amazon Web Services
VP of Marketing and Business
AWS as a case study
Questions
TYPICAL BIG DATA CHALLENGES
Archives · Docs · Business
Velocity · Variety · Volume · Complexity
• Data Quality
• Fault-Tolerance and
• Skills Availability
WHY IS PROTOTYPING IMPORTANT?
Typical signs to start prototyping:
• Requirements are uncertain
• Technologies are new
• No comparable system has been previously developed
• No full buy-in from the business
They said they didn’t
need a prototype
WHEN AND WHY TO PROTOTYPE?
Find more info at: “Strategic Prototyping for Developing Big Data Systems”,
IEEE Software, March-April, 2016
• Identification of missing, conflicting or ambiguous architectural requirements
• Creation of initial architecture design and selection of candidate technologies
• Confirmation of user interface requirements and system scope
• Demonstration version of the system to obtain buy-in from the business
• Integration of selected technologies
• Clarification of complex requirements
• Testing critical functionality and quality attribute scenarios
• Validation of technologies and scenarios that pose risks
• Getting early feedback from end users and updating the product accordingly
• Presentation of a working version to a trade show or customer event
• Evaluation of team progress and alignment
AWS as a case study
Questions
BIG DATA CHALLENGES
Big Data → Real-time Big Data
SIMPLIFY BIG DATA PROCESSING
Ingest → Collect → Process → Analyze
AWS BIG DATA TECHNOLOGIES
Amazon S3 · Amazon Kinesis · Glacier · DynamoDB
AWS Direct Connect · AWS Import/Export
AWS Data Pipeline
Collect → Process → Analyze
BIG DATA PROCESSING
DATA CHARACTERISTICS: HOT, WARM, COLD
              Hot         Warm       Cold
Volume        MB–GB       GB–TB      PB
Item size     B–KB        KB–MB      KB–TB
Latency       ms          ms, sec    min, hrs
Durability    Low–High    High       Very High
Request rate  Very High   High       Low
Cost/GB       $$–$        $–¢¢       ¢
WHAT DATA STORE SHOULD I USE?
Hot · Warm · Cold
YOUR BIG DATA APPLICATION ON AWS
parallel COPY from
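The "parallel COPY" note refers to Amazon Redshift's COPY command, which loads files under an S3 prefix in parallel across the cluster's slices. A minimal sketch of building such a statement follows; the table name, bucket, and IAM role ARN are placeholders, and the CSV/GZIP options are one plausible configuration among many:

```python
def build_copy_statement(table, s3_prefix, iam_role):
    """Build a Redshift COPY statement that loads, in parallel,
    all files under an S3 prefix. All identifiers are
    caller-supplied placeholders."""
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV GZIP;"
    )

sql = build_copy_statement(
    "events",
    "s3://my-bucket/logs/2016/03/",
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
print(sql)
```

Pointing COPY at a prefix rather than a single file is what enables the parallel load: Redshift splits the file set across slices automatically.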
AWS as a case study
Questions
YOTTAA CREATES AN ABSTRACTION LAYER ON TOP OF
INFRASTRUCTURE, APP & VISITOR BROWSER
YOTTAA’S PROXY-BASED SOLUTION SEES EVERY VISITOR
REQUEST & INFRASTRUCTURE RESPONSE
REAL-TIME WEB ANALYTICS – LOB & IT USE CASES TO
DRIVE YOTTAA'S BUSINESS FORWARD
• User experience
• Visitor Targeting
• Vendor Attribution
• Business Agility
IT & Operations
• Centralized log delivery & analytics
• Role-based Access Control
• Dual-factor authentication
• Account lockout
• Real-time traffic & threat analysis
• Event management
• In-line actions via Yottaa Portal
THE SOLUTION: IMPACTANALYTICS™ BIG DATA
ANALYTICS FOR ACTIONABLE INSIGHT
▪ Volume (> 100 TB scale)
▪ Throughput (> 20K/sec)
▪ Performance (low latency)
▪ Exploratory analytics
▪ Near Real-time (5 sec latency)
▪ Historical view (5 years data)
Combine different techniques
Stream (recent data) – hot data
Batch (all data) – cold and warm
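Combining the two layers can be sketched as a minimal lambda-style merge: a batch view recomputed over all historical (cold and warm) data, overlaid with increments from the recent (hot) stream. The function and variable names here are illustrative only:

```python
def batch_view(all_events):
    """Batch layer: recompute totals over the full history."""
    totals = {}
    for user, n in all_events:
        totals[user] = totals.get(user, 0) + n
    return totals

def merged_view(batch, recent_events):
    """Serving layer: overlay hot (recent) increments on the batch view."""
    view = dict(batch)
    for user, n in recent_events:
        view[user] = view.get(user, 0) + n
    return view

historical = [("alice", 10), ("bob", 4)]   # cold/warm data
recent = [("alice", 1), ("carol", 2)]      # hot data from the stream
print(merged_view(batch_view(historical), recent))
# {'alice': 11, 'bob': 4, 'carol': 2}
```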