The document discusses mPulse's Data Science Workbench product. mPulse collects over 85 billion beacons per week from customers and loads the data into Amazon Redshift, so data scientists no longer need to spend time preparing and wrangling data. The Data Science Workbench provides an interactive interface based on the Julia programming language for exploring and analyzing the data. It comes with functions and models that help customers generate insights from their real user data.
3. Big Data Challenges
Data Scientists spend too much time ‘data wrangling’
“Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.”
NY Times – August 17th, 2014
4. Big Data Challenges
Building a data science platform is very difficult
Infrastructure
• Choosing big data technologies and setting up a cluster can easily take 9 months or more
Data Pipeline
• Building a high-performing big data schema requires specialized skills
• Extracting, transforming, and loading data (data wrangling) is an enormous time sink and a poor use of data scientists' time
Analysis and Workflow
• Figuring out how you can ask questions of the data and how to visualize the results takes time that data scientists should be using to generate actionable insights from their studies
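To make the "data wrangling" step concrete, here is a minimal sketch in Python of an extract, transform, and load pass over beacon data. The field names `page` and `load_ms`, the sample records, and all function names are assumptions made for illustration; they are not mPulse's actual beacon schema or pipeline.

```python
import json
from collections import defaultdict
from statistics import median

# Hypothetical raw beacons, one JSON object per line (field names are assumptions).
RAW_BEACONS = """\
{"page": "/home", "load_ms": "1200"}
{"page": "/home", "load_ms": "800"}
{"page": "/cart", "load_ms": "n/a"}
not valid json
{"page": "/cart", "load_ms": "1500"}
{"page": "/home", "load_ms": "1000"}
"""


def extract(raw):
    """Parse each line as JSON, skipping lines that are not JSON objects."""
    for line in raw.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def transform(records):
    """Keep only records with a page and a numeric, non-negative load time."""
    for record in records:
        try:
            load_ms = float(record["load_ms"])
        except (KeyError, TypeError, ValueError):
            continue
        page = record.get("page")
        if page and load_ms >= 0:
            yield page, load_ms


def load(rows):
    """Group load times by page (an in-memory stand-in for a warehouse table)."""
    table = defaultdict(list)
    for page, load_ms in rows:
        table[page].append(load_ms)
    return dict(table)


def median_load_by_page(raw):
    """Run the full pipeline and return the median load time per page."""
    table = load(transform(extract(raw)))
    return {page: median(times) for page, times in table.items()}


result = median_load_by_page(RAW_BEACONS)
print(result)
```

Even in this toy version, most of the code deals with malformed or unusable records rather than with the analysis itself, which is the time sink the slide describes.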
6. Julia Language & iJulia Notebook UI
Julia is a rising star in scientific programming
processing speed
support for parallel processing
compatibility with 400+ prebuilt statistical packages
a large and growing number of visualization libraries
Trade-Offs
7. Why Julia?
R vs Python vs Julia
Modern compiler technology
Data Connectivity
Package Ecosystem
Functional Programming Constructs
Integration with Python, C, C++, R, …
11. Data Science Workbench
Data Science without the data wrangling, and much more
Infrastructure
Data Pipeline
Analysis and Workflow
• Data Science Workbench comes with the state-of-the-art technology you need to analyze your customer experiences
• All of the real user beacon data is loaded into Data Science Workbench in a highly optimized schema, ready for analysis
• Data science is done with Julia, a remarkably fast, in-memory solution for analyzing huge data sets
• Access to an ever-growing library of analysis functions and visualizations based on SOASTA’s and our customers’ expertise
16. Trivia Time!
procedure Traffic is
   type Airplane_ID is range 1..10;             -- 10 airplanes
   task type Airplane (ID: Airplane_ID);        -- task representing airplanes, with ID as initialisation parameter
   type Airplane_Access is access Airplane;     -- reference type to Airplane
   protected type Runway is                     -- the shared runway (protected to allow concurrent access)
      entry Assign_Aircraft (ID: Airplane_ID);  -- all entries are guaranteed mutually exclusive
      entry Cleared_Runway (ID: Airplane_ID);
      entry Wait_For_Clear;
   private
      Clear: Boolean := True;                   -- protected private data - generally more than just a flag...
   end Runway;
   type Runway_Access is access all Runway;
@DanBoutinSOASTA
1983
1995