I got to make the trip to Redmond for the Cortana Analytics Workshop on September 9th–10th. While it was labeled a workshop, it felt like a full-on conference without all the vendors. There were probably several hundred attendees, four fantastic keynotes, and about 40 breakout sessions and tutorials spread across the two-day event. Attendees seemed to include a good mix of data scientists and solution developers from all over the world (mostly US, but I heard one in five were international). There were great opportunities to network with both workshop attendees and Microsoft product team members.
All four keynotes were really good. Joseph Sirosh led things off. I love watching him speak, as he is such a good storyteller. Most of his content was familiar from previous presentations and webinars, but it still set the tone with engaging demos and videos (I still get excited about the land-speed record video, and I've probably seen it 20 times in the last couple of months). While I didn't attend all of James Phillips's high-energy Power BI keynote, the highlight for me was definitely the announcement of R integration with Power BI Desktop. I knew this was coming, but I didn't know it would be available as soon as next month (more to come on this once it's released)!
My favorite keynotes were actually the two given on day two. The morning keynote was by Raghu Ramakrishnan, who heads engineering for Big Data at Microsoft. He painted a great picture of how big data is done at Microsoft, both internally with Cosmos and Scope, and with platform services like HDInsight and Azure Data Lake. The HDInsight with Spark demo at the end was a great way to cap it off. Finally, to close out the conference, Marcus Ash, Group Program Manager for Cortana on the Windows phones and tablets team, gave an awesome presentation on artificial intelligence, especially as it relates to the digital assistant, Cortana. I was amazed at the amount of detail that goes into the psychological components of Cortana. Everything from the location of Cortana on the Windows 10 taskbar to the non-intrusive questions Cortana asks during initial setup has a very specific purpose. She (Cortana) is built so that, over time, users continue to trust her more and more.
There was a wide variety of break-out sessions at the workshop, including hands-on tutorials, architecture/platform overviews, organizational leadership discussions, and real-world implementation examples. While there were a couple of duds, the sessions I attended were, overall, very engaging and spot-on for the target audience. The messaging was very consistent across all of the sessions, with most kicking off with the image below and a note about which services were in scope for that session.
I made an effort to attend break-out sessions that touched on newer platform services, real-world operationalized implementations, and a couple of machine learning/data science topics. I won't go deep on the specific sessions I attended (I share some key take-aways below), but I did want to call out one thing I really liked about the machine learning presentations put on by Microsoft's data science team. These presentations walked through very specific machine learning problems, as well as the data science approach and Azure Machine Learning models that can be used to solve them. This is what I expected from the session abstracts, but the nice surprise was that in both sessions I attended, the presenting data scientist also provided operational context for the model, with a diagram showing the different services that could be combined to productionize it.
My Key Take-Aways
I have several OneNote pages with enough content for several future posts, but here are the cliff-notes versions of my key take-aways from the sessions and keynotes I attended:
- Thinking in terms of Storage and Compute: The keynotes and sessions reiterated the importance of thinking in terms of storage and compute when developing with Cortana Analytics Suite services. The platform provides a collection of different storage services (BLOB, Data Lake, SQL Data Warehouse, etc.) and different compute services (HDInsight, SQL Stored Procedures, Machine Learning APIs, etc.) that are all held together by the same Azure fabric and can be orchestrated with tools such as Azure Data Factory. This means you can combine storage and compute functions across a wide range of services to best satisfy the specific task at hand. Fully understanding and buying into this concept opens up a whole world of possibility.
- Elasticity (always a fun word to say): Elasticity was another recurring message throughout the workshop. This should be no surprise, as a key selling point of cloud platforms in general is the ability to use and pay for resources only as you need them. In Cortana Analytics Suite you have different storage tiers within Azure Data Lake, BLOB, and SQL Data Warehouse that correspond to different use cases and costs. You also have many ways to adjust compute resources, such as spinning up on-demand HDInsight clusters to process data and then blowing them away, or scaling up SQL Data Warehouse resources inside a stored procedure and scaling back down when the procedure finishes. This lets you budget for worst-case storage and compute scenarios while building solutions that consume resources only as needed.
- Interest has been "Spark"ed: The buzz around Spark has been building for a while, but until now I had sort of pushed it aside as something I'd catch up on later. After watching a presentation and demo of Spark on HDInsight, my level of interest has definitely shifted. From what I saw in the demo, there is tremendous opportunity around highly efficient, interactive data exploration and visualization with front-end tools like Zeppelin and Power BI sitting on top of Spark. I'm excited to dig into those use cases, as well as more operational ones.
- Use Azure Data Catalog Now?: The Azure Data Catalog is a service that almost any company with a BI and/or analytics program can start using immediately. It provides an easy-to-use, comprehensive (and still growing) set of features for cataloging, documenting, searching, and managing data assets throughout the organization. Microsoft has gone deep on functionality and integration with its own data assets (SQL Server, Reporting Services, Analysis Services, etc.), but the service also has connectors for plenty of third-party assets, with more planned. This service can (and probably should) replace today's manual work of building reports to surface metadata about different data sources.
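To make the storage-plus-compute take-away above a little more concrete, here's a simplified, illustrative sketch of the shape of an orchestrated pipeline: one storage activity (landing data in BLOB) chained to one compute activity (an HDInsight job). This is not a complete Azure Data Factory schema; all the names and property values are placeholders I made up for illustration.

```python
# Illustrative sketch (not a complete Azure Data Factory schema): a pipeline that
# pairs one storage step (copy into BLOB) with one compute step (an HDInsight job).
# All names and property values here are hypothetical placeholders.

pipeline = {
    "name": "DailyScoringPipeline",
    "activities": [
        {
            "name": "CopyRawToBlob",        # storage: land raw data in BLOB storage
            "type": "Copy",
            "inputs": ["OnPremSqlTable"],
            "outputs": ["RawEventsBlob"],
        },
        {
            "name": "AggregateWithHive",    # compute: process the data on HDInsight
            "type": "HDInsightHive",
            "inputs": ["RawEventsBlob"],
            "outputs": ["AggregatedBlob"],
            "dependsOn": ["CopyRawToBlob"], # orchestration: compute waits on storage
        },
    ],
}

# Each activity can target a different storage or compute service; the pipeline
# definition is what ties them together on the same Azure fabric.
for activity in pipeline["activities"]:
    print(activity["name"], "->", activity["type"])
```

The point of the sketch is the separation of concerns: each activity picks whichever storage or compute service best fits its step, and the orchestration layer handles the dependencies between them.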
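The scale-up/scale-down pattern from the elasticity take-away boils down to a single T-SQL statement that changes a SQL Data Warehouse's service objective. Below is a minimal Python sketch that builds that statement; the database name and the list of DWU tiers are placeholders, and in practice you would execute the statement over a live connection (e.g., with pyodbc or sqlcmd) before and after the heavy work.

```python
# Minimal sketch of the scale-up / scale-down pattern for Azure SQL Data Warehouse.
# The database name and the DWU tiers below are illustrative placeholders.

VALID_TIERS = {"DW100", "DW200", "DW400", "DW1000", "DW2000"}  # subset, for illustration

def build_scale_statement(database: str, tier: str) -> str:
    """Build the T-SQL that changes a SQL Data Warehouse's service objective."""
    if tier not in VALID_TIERS:
        raise ValueError(f"Unknown DWU tier: {tier}")
    return f"ALTER DATABASE [{database}] MODIFY (SERVICE_OBJECTIVE = '{tier}');"

# Scale up before the expensive procedure runs, then scale back down afterward,
# so you only pay for the larger tier while the work is actually running.
scale_up = build_scale_statement("MyDW", "DW1000")
scale_down = build_scale_statement("MyDW", "DW100")
print(scale_up)
print(scale_down)
```

With a real warehouse, each statement would be executed through a connection wrapped around the expensive workload, which is exactly the "budget for worst case, pay for actual use" idea from the session.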
Overall, I was pretty amped about the whole workshop, as it really reaffirmed my passion and excitement about the platform and where it's going. To paraphrase what Joseph Sirosh said in closing the conference, "There's huge opportunity here to build some really awesome, meaningful, and impactful stuff!"