Data Science

A Review of the Data Science Blog Series: Lessons Learned

In June 2019, I started my journey as a newly minted blogger on the Topcoder platform. It has been a wild ride as I developed a series of posts on data science and how to do the basics with Python. In this post, I want to share some lessons I learned while writing the series.

Here are those lessons learned:

  • First, blogging about Python and data science sharpened my programming skills. Because Topcoder members and contestants rely on the blog to help them win their competitions, I was careful not only to code my examples but also to document them extensively in the Jupyter notebooks that accompany the series. I had finished a Python programming certificate at Isothermal Community College and needed a way to keep my Python skills current, so I made an inquiry to Topcoder and became a blogger on the platform. The best way to apply what you have learned in your Python courses is to build applications and post them on GitHub. That is what I did when I developed those blog posts in 2019.

  • Second, in coding Jupyter notebooks for my series, I learned to debug my code frequently. Debugging is a key skill not just for data science but for software development in general. Buggy code can cause models to predict the wrong results or fail to execute at all. The worst error a programmer can make is a logic error, where the code runs without raising any error but outputs the wrong result. A wrong predictive model can cost an organization money, so it is in your best interest to think through the code, check its output against cases where you know the answer, and periodically tweak it for maximum effectiveness.

  • Third, I learned that Jupyter notebooks are sometimes superior to programming IDEs for executing data science code. These notebooks, if carefully designed, can mimic a real-life scientific notebook. Data science programming is often done in Jupyter because it is a practical environment for exploration, coding, and development of predictive models. One popular Python distribution is Anaconda, which includes Jupyter as well as the Spyder IDE. By using Anaconda, programmers do not have to set up their data science environments by hand, as the distribution ships with most of the libraries needed for analysis.

  • Fourth, I learned that blogging is hard work and requires an understanding of my audience. Doing the two blogs on TensorFlow 2.0 brought this home to me as a developer/blogger. To present effectively how to work with TensorFlow, I had to read the documentation and play around with sample code. Those posts were both successful and unsuccessful at the same time. They were unsuccessful because the examples I wanted to include did not work and I was working against a deadline, so I ended up writing posts on the steps for TensorFlow analysis without being able to include code examples. The posts were a success, however, because they introduced the library to the Topcoder community. Anything that makes Topcoder a better platform for members, bloggers, and programmers is always appreciated.

  • A fifth lesson came from participating in the December 2019 THRIVE challenge. While I got a post published, I struggled with other posts because Grammarly flagged them for plagiarism, even though no content was plagiarized. I gained a real appreciation for all of the members responsible for checking our posts and ensuring we put out the best content on THRIVE. That challenge showed me that I can crank out well-developed blog posts, no matter the outcome.
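To make the second lesson concrete, here is a minimal sketch of a logic error. The function names and the scores are made up for illustration and are not taken from the series notebooks. Both functions run without raising an error, but only one returns the right average, which is why a quick check against a known answer is worth the effort.

```python
def mean_buggy(values):
    """Intended to return the average of values, but contains a logic error."""
    total = 0
    # Logic error: the slice starts at index 1, so the first value is skipped.
    for v in values[1:]:
        total += v
    return total / len(values)


def mean_fixed(values):
    """Return the average of values by summing every element."""
    return sum(values) / len(values)


# Hypothetical test scores; the correct average is 90.
scores = [80, 90, 100]

print(mean_buggy(scores))  # runs fine, but prints roughly 63.3 instead of 90
print(mean_fixed(scores))  # prints the correct average

# A simple assertion on a case with a known answer exposes the bug.
assert mean_fixed(scores) == 90
assert mean_buggy(scores) != 90
```

Neither call raises an exception, so the bug in the first function would go unnoticed without comparing its output to a value worked out by hand.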

I enjoyed my blogging journey with Topcoder in 2019 and am looking forward to developing content in 2020. As the 2020 blog information suggests, I will be developing content for both the general blog and THRIVE. The new approach to blogging will yield new and exciting content. In my next THRIVE post, I want to introduce exploratory data analysis using Python, a step-by-step approach to analyzing data before applying data science algorithms to develop models.